github.com/hashicorp/hcl/v2@v2.20.0/guide/language_design.rst

Configuration Language Design
=============================

In this section we will cover some conventions for HCL-based configuration
languages that can help make them feel consistent with other HCL-based
languages, and make the best use of HCL's building blocks.

HCL's native and JSON syntaxes both define a mapping from input bytes to a
higher-level information model. In designing a configuration language based on
HCL, your building blocks are the components in that information model:
blocks, arguments, and expressions.

Each calling application of HCL, then, effectively defines its own language.
Just as Atom and RSS are higher-level languages built on XML, HashiCorp
Terraform has a higher-level language built on HCL, while HashiCorp Nomad has
its own distinct language that is *also* built on HCL.

From an end-user perspective, these are distinct languages but have a common
underlying texture. Users of both are therefore likely to bring some
expectations from one to the other, and so this section is an attempt to
codify some of these shared expectations to reduce user surprise.

These are subjective guidelines, however, so applications may choose to
ignore them entirely or ignore them in certain specialized cases. An
application providing a configuration language for a pre-existing system, for
example, may choose to eschew the identifier naming conventions in this section
in order to exactly match the existing names in that underlying system.

Language Keywords and Identifiers
---------------------------------

Much of the work in defining an HCL-based language is in selecting good names
for arguments, block types, variables, and functions.

The standard for naming in HCL is to use all-lowercase identifiers with
underscores separating words, like ``service`` or ``io_mode``. HCL identifiers
do allow uppercase letters and dashes, but this is primarily for natural
interfacing with external systems that may have other identifier conventions,
and so these should generally be avoided for the identifiers native to your
own language.
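
As an illustration, here is how those conventions might look in a hypothetical
application's configuration. The ``service`` block type and ``io_mode``
argument come from the text above; the ``"web"`` and ``"legacy"`` labels and
the alternative spellings are invented for contrast:

.. code-block:: hcl

  # Conventional: all-lowercase names with underscores between words
  service "web" {
    io_mode = "async"
  }

  # Permitted by HCL, but best reserved for names that must match an
  # external system's own conventions (hypothetical example)
  service "legacy" {
    ioMode  = "async" # mixed case
    io-mode = "async" # dashes
  }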

The distinction between "keywords" and other identifiers is really just a
convention. In your own language documentation, you may use the word "keyword"
to refer to names that are presented as an intrinsic part of your language,
such as important top-level block type names.

Block type names are usually singular, since each block defines a single
object. Use a plural block name only if the block is serving only as a
namespacing container for a number of other objects. A block with a plural
type name will generally contain only nested blocks, and no arguments of its
own.
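
A minimal sketch of the two alternatives, using hypothetical block types
``listener`` and ``listeners``. Each ``listener`` block declares one object,
while a plural ``listeners`` block serves only as a container for nested
blocks and carries no arguments of its own:

.. code-block:: hcl

  # Singular: each block defines a single object
  listener "http" {
    port = 80
  }

  # Plural (an alternative design): a namespacing container that holds
  # only nested blocks, with no arguments of its own
  listeners {
    listener "http" {
      port = 80
    }
    listener "https" {
      port = 443
    }
  }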

Argument names are also singular unless they expect a collection value, in
which case they should be plural. For example, ``name = "foo"`` but
``subnet_ids = ["abc", "123"]``.

Function names will generally *not* use underscores and will instead just run
words together, as is common in the C standard library. This is a result of
the fact that several of the standard library functions offered in ``cty``
(covered in a later section) have names that follow C library function names
like ``substr``. This is not a strong rule, and applications that use longer
names may choose to use underscores for them to improve readability.
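
For example, expressions in a hypothetical language might combine standard
function names, which run words together, with a longer application-defined
function whose name uses underscores for readability. ``lookup_subnet_cidr``
is an invented name, not a function any particular application provides:

.. code-block:: hcl

  # Short, C-style names from the cty standard library
  short_name = upper(substr("webserver", 0, 3)) # "WEB"

  # Hypothetical application-specific function with a longer name
  subnet_cidr = lookup_subnet_cidr("us-east-1a")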

Blocks vs. Object Values
------------------------

HCL blocks and argument values of object type have quite a similar appearance
in the native syntax, and are identical in JSON syntax:

.. code-block:: hcl

  block {
    foo = bar
  }

  # argument with object constructor expression
  argument = {
    foo = bar
  }

In spite of this superficial similarity, there are some important differences
between these two forms.

The most significant difference is that a child block can contain nested blocks
of its own, while an object constructor expression can define only attributes
of the object it is creating.
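
To make the difference concrete, the sketch below extends the earlier
``block``/``argument`` example. Inside a block, a child block (here called
``nested``, an illustrative name) is allowed. Inside an object constructor,
the same structure has to be written as an attribute whose value is another
object constructor:

.. code-block:: hcl

  block {
    foo = bar

    # A block may contain nested blocks of its own
    nested {
      baz = qux
    }
  }

  argument = {
    foo = bar

    # An object constructor can only define attributes, so nested
    # structure must be an attribute with an object value
    nested = {
      baz = qux
    }
  }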

The user-facing model for blocks is that they generally form the more "rigid"
structure of the language itself, while argument values can be more free-form.
An application will generally define in its schema and documentation all of
the arguments that are valid for a particular block type, while arguments
accepting object constructors are more appropriate for situations where the
attribute names themselves are freely selected by the user, such as when the
expression will be converted by the application to a map type.

As a less contrived example, consider the ``resource`` block type in Terraform
and its use with a particular resource type ``aws_instance``:

.. code-block:: hcl

  resource "aws_instance" "example" {
    ami           = "ami-abc123"
    instance_type = "t2.micro"

    tags = {
      Name = "example instance"
    }

    ebs_block_device {
      device_name = "hda1"
      volume_size = 8
      volume_type = "standard"
    }
  }

The top-level block type ``resource`` is fundamental to Terraform itself and
so an obvious candidate for block syntax: it maps directly onto an object in
Terraform's own domain model.

Within this block we see a mixture of arguments and nested blocks, all defined
as part of the schema of the ``aws_instance`` resource type. The ``tags``
map here is specified as an argument because its keys are free-form, chosen
by the user and mapped directly onto a map in the underlying system.
``ebs_block_device`` is specified as a nested block, because it is a separate
domain object within the remote system and has a rigid schema of its own.
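
Continuing the Terraform example, the difference shows up when a user wants
more of each. Extra tags are simply more free-form keys in the ``tags`` map,
while a second device is another ``ebs_block_device`` block following the same
fixed schema. The ``Environment`` tag and the second device are made up for
illustration, and the resource's other arguments are omitted:

.. code-block:: hcl

  resource "aws_instance" "example" {
    # (other arguments omitted)

    # Free-form keys chosen by the user
    tags = {
      Name        = "example instance"
      Environment = "staging" # illustrative tag
    }

    # One block per device, each with the same rigid schema
    ebs_block_device {
      device_name = "hda1"
      volume_size = 8
    }
    ebs_block_device {
      device_name = "hdb1" # illustrative second device
      volume_size = 16
    }
  }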

As a special case, block syntax may sometimes be used with free-form keys if
those keys each serve as a separate declaration of some first-class object
in the language. For example, Terraform has a top-level block type ``locals``
which behaves in this way:

.. code-block:: hcl

  locals {
    instance_type = "t2.micro"
    instance_id   = aws_instance.example.id
  }

Although the argument names in this block are arbitrarily selected by the
user, each one defines a distinct top-level object. In other words, this
approach is used to create a more ergonomic syntax for defining these simple
single-expression objects, as a pragmatic alternative to more verbose and
redundant declarations using blocks:

.. code-block:: hcl

  local "instance_type" {
    value = "t2.micro"
  }
  local "instance_id" {
    value = aws_instance.example.id
  }

The distinction between domain objects, language constructs and user data will
always be subjective, so the final decision is up to you as the language
designer.

Standard Functions
------------------

HCL itself does not define a common set of functions available in all HCL-based
languages; the built-in language operators give a baseline of functionality
that is always available, but applications are free to define functions as they
see fit.
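
As a rough illustration, the first three expressions below rely only on HCL's
built-in arithmetic, comparison, logical, and conditional operators. They should
therefore evaluate in any HCL-based language that supplies the variables they
reference. The final function call works only if the application has chosen
to define ``upper``. The argument names and the variables ``base_port`` and
``debug`` are hypothetical:

.. code-block:: hcl

  # Built-in operators: always available (base_port and debug are
  # hypothetical variables supplied by the application)
  port      = base_port + 80
  log_level = debug ? "debug" : "info"
  enabled   = !debug && base_port > 0

  # Function call: available only if the application defines upper()
  name = upper("web")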

With that said, there are a number of generally-useful functions that don't
belong to the domain of any one application: string manipulation, sequence
manipulation, date formatting, JSON serialization and parsing, etc.

Given the general need such functions serve, it's helpful if a similar set of
functions is available with compatible behavior across multiple HCL-based
languages, assuming the language is for an application where function calls
make sense at all.

The Go implementation of HCL is built on an underlying type and function system
:go:pkg:`cty`, whose usage was introduced in :ref:`go-expression-funcs`. That
library also has a package of "standard library" functions which we encourage
applications to offer with consistent names and compatible behavior, either by
using the standard implementations directly or offering compatible
implementations under the same name.

The "standard" functions that new configuration formats should consider
offering are:

* ``abs(number)`` - returns the absolute (non-negative) value of the given number.
* ``coalesce(vals...)`` - returns the value of the first argument that isn't null. Useful only in formats where null values may appear.
* ``compact(vals...)`` - returns a new tuple with the non-null values given as arguments, preserving order.
* ``concat(seqs...)`` - builds a tuple value by concatenating together all of the given sequence (list or tuple) arguments.
* ``format(fmt, args...)`` - performs simple string formatting similar to the C library function ``printf``.
* ``hasindex(coll, idx)`` - returns true if the given collection has the given index. ``coll`` may be of list, tuple, map, or object type.
* ``int(number)`` - returns the integer component of the given number, rounding towards zero.
* ``jsondecode(str)`` - interprets the given string as JSON format and returns the corresponding decoded value.
* ``jsonencode(val)`` - encodes the given value as a JSON string.
* ``length(coll)`` - returns the length of the given collection.
* ``lower(str)`` - converts the letters in the given string to lowercase, using Unicode case folding rules.
* ``max(numbers...)`` - returns the highest of the given number values.
* ``min(numbers...)`` - returns the lowest of the given number values.
* ``sethas(set, val)`` - returns true only if the given set has the given value as an element.
* ``setintersection(sets...)`` - returns the intersection of the given sets.
* ``setsubtract(set1, set2)`` - returns a set with the elements from ``set1`` that are not also in ``set2``.
* ``setsymdiff(sets...)`` - returns the symmetric difference of the given sets.
* ``setunion(sets...)`` - returns the union of the given sets.
* ``strlen(str)`` - returns the length of the given string in Unicode grapheme clusters.
* ``substr(str, offset, length)`` - returns a substring from the given string by splitting it between Unicode grapheme clusters.
* ``timeadd(time, duration)`` - takes a timestamp in RFC3339 format and a possibly-negative duration given as a string like ``"1h"`` (for "one hour") and returns a new RFC3339 timestamp after adding the duration to the given timestamp.
* ``upper(str)`` - converts the letters in the given string to uppercase, using Unicode case folding rules.
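
The sketch below shows a few of these functions in use. It assumes an
application that offers them under the names listed above. The argument names
on the left are hypothetical, and the comments give the results implied by
the descriptions in the list:

.. code-block:: hcl

  greeting   = format("Hello, %s!", "world")         # "Hello, world!"
  prefix     = substr("hello world", 0, 5)           # "hello"
  all_ports  = concat([80, 443], [8080])             # [80, 443, 8080]
  item_count = length(["a", "b", "c"])               # 3
  largest    = max(3, 9, 4)                          # 9
  fallback   = coalesce(null, "default")             # "default"
  later      = timeadd("2024-01-01T00:00:00Z", "1h") # "2024-01-01T01:00:00Z"
  payload    = jsonencode({ name = "web" })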

Not all of these functions will make sense in all applications. For example, an
application that doesn't use set types at all would have no reason to provide
the set-manipulation functions here.

Some languages will not provide functions at all, since they are primarily for
assigning values to arguments and thus neither need nor want any custom
computations of those values.

Block Results as Expression Variables
-------------------------------------

In some applications, top-level blocks serve also as declarations of variables
(or of attributes of object variables) available during expression evaluation,
as discussed in :ref:`go-interdep-blocks`.

In this case, it's most intuitive for the variables map in the evaluation
context to contain a value named after each valid top-level block
type and for these values to be object-typed or map-typed and reflect the
structure implied by block type labels.

For example, an application may have a top-level ``service`` block type
used like this:

.. code-block:: hcl

  service "http" "web_proxy" {
    listen_addr = "127.0.0.1:8080"

    process "main" {
      command = ["/usr/local/bin/awesome-app", "server"]
    }

    process "mgmt" {
      command = ["/usr/local/bin/awesome-app", "mgmt"]
    }
  }

If the result of decoding this block were available for use in expressions
elsewhere in configuration, the above convention would call for it to be
available to expressions as an object at ``service.http.web_proxy``.

If it is the contents of the block itself that are offered to evaluation (or
a superset object *derived* from the block contents), then the block arguments
can map directly to object attributes. However, it is up to the application to
decide which value type is most appropriate for each block type, since this
depends on how multiple blocks of the same type relate to one another, or
whether multiple blocks of that type are even allowed.

In the above example, an application would probably expose the ``listen_addr``
argument value as ``service.http.web_proxy.listen_addr``, and may choose to
expose the ``process`` blocks as a map of objects using the labels as keys,
which would allow an expression like
``service.http.web_proxy.process["main"].command``.
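
Under that convention, an expression in some other block could refer to the
decoded service as shown below. The ``monitor`` block type, its arguments, and
the ``/health`` path are hypothetical. The sketch assumes the application
exposes the ``process`` blocks as a map keyed by their labels:

.. code-block:: hcl

  # Hypothetical block type that consumes values from the service above
  monitor "web_proxy" {
    health_url   = "http://${service.http.web_proxy.listen_addr}/health"
    mgmt_command = service.http.web_proxy.process["mgmt"].command
  }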

If multiple blocks of a given type do not have a significant order relative to
one another, as seems to be the case with these ``process`` blocks,
representation as a map is often the most intuitive. If the ordering of the
blocks *is* significant then a list may be more appropriate, allowing the use
of HCL's "splat operators" for convenient access to child arguments. However,
there is no one-size-fits-all solution here and language designers must
instead consider the likely usage patterns of each value and select the
value representation that best accommodates those patterns.
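
The choice of representation changes which expressions read naturally. In the
hypothetical sketch below, the active line assumes the ``process`` blocks are
exposed as a map keyed by label, so a ``for`` expression iterates over the map's
values. The commented-out alternative assumes they are exposed as a list in
declaration order, where a splat expression collects the same attribute from
every element:

.. code-block:: hcl

  # Hypothetical block type, as in the earlier sketches
  monitor "web_proxy" {
    # Map representation: a for expression iterates over the values
    all_commands = [for p in service.http.web_proxy.process : p.command]

    # List representation: a splat expression does the same more concisely
    # all_commands = service.http.web_proxy.process[*].command
  }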

Some applications may choose to offer variables with slightly different names
than the top-level blocks in order to allow for more concise references, such
as abbreviating ``service`` to ``svc`` in the above examples. This should be
done with care since it may make the relationship between the two less obvious,
but it may be a good tradeoff for frequently accessed names whose full form
would otherwise hurt the readability of the expressions they are embedded in.
Familiarity permits brevity.
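
For instance, suppose a hypothetical application exposed ``service`` blocks
under the shorter variable name ``svc``. The reference
``service.http.web_proxy.listen_addr`` discussed above could then be written
more concisely. The ``target`` argument is invented for illustration:

.. code-block:: hcl

  # Full name, matching the block type
  target = service.http.web_proxy.listen_addr

  # Abbreviated variable name offered by the application
  # (an alternative to the line above, not an addition to it)
  # target = svc.http.web_proxy.listen_addr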

Many applications will not make block results available for use in other
expressions at all, in which case they are free to select whichever variable
names make sense for what is being exposed. For example, a format may make
environment variable values available for use in expressions, and may do so
either as top-level variables (if no other variables are needed) or as an
object named ``env``, which can be used as in ``env.HOME``.
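
A sketch of the two options, using a hypothetical ``data_dir`` argument and
assuming the application populates these variables from the process
environment:

.. code-block:: hcl

  # Environment variables exposed as attributes of an object named env
  data_dir = "${env.HOME}/.awesome-app"

  # Alternatively, if no other variables are needed, exposed directly
  # as top-level variables:
  # data_dir = "${HOME}/.awesome-app"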

Text Editor and IDE Integrations
--------------------------------

Since HCL defines only low-level syntax, a text editor or IDE integration for
HCL itself can only really provide basic syntax highlighting.

For non-trivial HCL-based languages, a more specialized editor integration may
be warranted. For example, users writing configuration for HashiCorp Terraform
must recall the argument names for numerous different provider plugins, and so
auto-completion and documentation hovertips can be a great help, and
configurations are commonly spread over multiple files making "Go to Definition"
functionality useful. None of this functionality can be implemented generically
for all HCL-based languages since it relies on knowledge of the structure of
Terraform's own language.

Writing such text editor integrations is out of the scope of this guide. The
Go implementation of HCL does have some building blocks to help with this, but
it will always be an application-specific effort.

However, in order to *enable* such integrations, it is best to establish a
conventional file extension *other than* ``.hcl`` for each non-trivial HCL-based
language, thus allowing text editors to recognize it and enable the suitable
integration. For example, Terraform requires ``.tf`` and ``.tf.json`` filenames
for its main configuration, and the ``hcldec`` utility in the HCL repository
accepts spec files that should conventionally be named with an ``.hcldec``
extension.

For simple languages that are unlikely to benefit from specific editor
integrations, using the ``.hcl`` extension is fine and may cause an editor to
enable basic syntax highlighting, absent any other deeper features. An editor
extension for a specific HCL-based language should *not* generically match the
``.hcl`` extension, since this can cause confusing results for users
attempting to write configuration files targeting other applications.