# How Terraform Uses Unicode

The Terraform language uses the Unicode standards as the basis of various
different features. The Unicode Consortium publishes new versions of those
standards periodically, and we aim to adopt those new versions in new
minor releases of Terraform in order to support additional characters added
in those new versions.

Unfortunately, because those features are implemented using a number of
external libraries, adopting a new version of Unicode is not as simple as
just updating a version number somewhere. This document aims to describe the
various steps required to adopt a new version of Unicode in Terraform.

We typically aim to be consistent across all of these dependencies as to which
major version of Unicode we currently conform to. The usual initial driver
for a Unicode upgrade is switching to a new version of the Go runtime library
which itself uses a new version of Unicode, because Go itself does not provide
any way to select Unicode versions independently from Go versions. Therefore
we typically upgrade to a new Unicode version only in conjunction with
upgrading to a new Go version.

## Unicode tables in the Go standard library

Several Terraform language features are implemented in terms of functions in
[the Go `strings` package](https://pkg.go.dev/strings),
[the Go `unicode` package](https://pkg.go.dev/unicode), and other supporting
packages in the Go standard library.

The Go team maintains the Go standard library features to support a particular
Unicode version for each Go version. The specific Unicode version for a
particular Go version is available in
[`unicode.Version`](https://pkg.go.dev/unicode#Version).

We adopt a new version of Go by editing the `.go-version` file in the root
of this repository.
Although it's typically possible to build Terraform with
other versions of Go, that file documents the version we intend to use for
official releases and thus the primary version we use for development and
testing. Adopting a new Go version typically also implies other behavior
changes inherited from the Go standard library, so it's important to review the
relevant version changelog(s) to note any behavior changes we'll need to pass
on to our own users via the Terraform changelog.

The other subsystems described below should always be set up to match
`unicode.Version`. In some cases those libraries automatically try to align
themselves with `unicode.Version` and generate an error if they cannot, but
that isn't true of all of them.

## Unicode Text Segmentation

_Text Segmentation_ (TR29) is a Unicode standards annex which describes
algorithms for breaking strings into smaller units such as sentences, words,
and grapheme clusters.

Several Terraform language features make use of the _grapheme cluster_
algorithm in particular, because it provides a practical definition of
individual visible characters, taking into account combining sequences such
as Latin letters with separate diacritics or Emoji characters with gender
presentation and skin tone modifiers.

The text segmentation algorithms rely on supplementary data tables that are
not part of the core set encoded in the Go standard library's `unicode`
packages, and so instead we rely on the third-party module
[`github.com/apparentlymart/go-textseg`](http://pkg.go.dev/github.com/apparentlymart/go-textseg)
to provide those tables and a Go implementation of the grapheme cluster
segmentation algorithm in terms of the tables.

The `go-textseg` library is designed to allow calling programs to potentially
support multiple Unicode versions at once, by offering a separate module major
version for each Unicode major version.
For example, the full module path for
the Unicode 13 implementation is `github.com/apparentlymart/go-textseg/v13`.

If that external library doesn't yet have support for the Unicode version we
intend to adopt then we'll first need to open a pull request to contribute
support for that version. The details of how to do this will unfortunately vary
depending on how significantly the Text Segmentation annex has changed since
the most recently-supported Unicode version, but in many cases it can be
just a matter of editing that library's `make_tables.go`, `make_test_tables.go`,
and `generate.go` files to point to the URLs where the Unicode Consortium
published new tables and then running `go generate` to rebuild the files derived
from those data sources. As long as the new Unicode version has only changed
the data tables and not also changed the algorithm, often no further changes
are needed.

Once a new Unicode version is included, the maintainer of that library will
typically publish a new major version that we can depend on. Two different
codebases included in Terraform both depend directly on the `go-textseg` module
for parts of their functionality:

* [`hashicorp/hcl`](https://github.com/hashicorp/hcl) uses text
  segmentation as part of producing visual column offsets in source ranges
  returned by the tokenizer and parser. Terraform in turn uses that library
  for the underlying syntax of the Terraform language, and so it passes on
  those source ranges to the end-user as part of diagnostic messages.
* The third-party module [`github.com/zclconf/go-cty`](https://github.com/zclconf/go-cty)
  provides several of the Terraform language built-in functions, including
  functions like `substr` and `length` which need to count grapheme clusters
  as part of their implementation.
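
To illustrate why counting "characters" needs more than the standard library's
byte and rune counts, here is a minimal sketch using only the Go standard
library. The `approxClusterCount` helper is hypothetical and deliberately
simplified: it only skips nonspacing combining marks, whereas the real TR29
grapheme cluster algorithm implemented by `go-textseg` handles many more cases
(emoji sequences, Hangul syllables, and so on). It is not how HCL or cty count
clusters.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// approxClusterCount counts the runes in s that are not nonspacing
// combining marks (Unicode category Mn). This is a simplification for
// illustration only, not an implementation of the TR29 rules.
func approxClusterCount(s string) int {
	n := 0
	for _, r := range s {
		if !unicode.Is(unicode.Mn, r) {
			n++
		}
	}
	return n
}

func main() {
	// "é" written as a base letter followed by U+0301 COMBINING ACUTE ACCENT.
	s := "e\u0301"

	fmt.Println(len(s))                    // bytes: 3
	fmt.Println(utf8.RuneCountInString(s)) // code points: 2
	fmt.Println(approxClusterCount(s))     // approximate visible characters: 1
}
```

A user would expect a `length`-style function to report one character for this
string, which is the answer grapheme cluster segmentation is designed to give.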

As part of upgrading Terraform's Unicode support we therefore typically also
open pull requests against these other codebases, and then adopt the new
versions that result. Terraform work often drives the adoption of new Unicode
versions in those codebases, with other dependencies following along when they
next upgrade.

At the time of writing Terraform itself doesn't _directly_ depend on
`go-textseg`, and so there are no specific changes required in this Terraform
codebase aside from the `go.sum` file update that always follows from
changes to transitive dependencies.

The `go-textseg` library also has a separate "auto-version" mechanism which
selects an appropriate module version based on the current Go language version,
but neither HCL nor cty uses it. The auto-version package will not compile for
any Go version that doesn't have a corresponding Unicode version explicitly
recorded in that repository. That would be too harsh a constraint for libraries
like HCL, which have many callers, many of whom don't care strongly about
Unicode support and may wish to upgrade Go before the text segmentation library
has been updated.