github.com/jonsyu1/godel@v0.0.0-20171017211503-64567a0cf169/docs/Name.md

Name
====
gödel's name is an homage to [Kurt Gödel](https://en.wikipedia.org/wiki/Kurt_G%C3%B6del). As with many Go tools, it is a
play on words that involves "go". It is also a play on [Gödel's incompleteness theorems](https://en.wikipedia.org/wiki/G%C3%B6del's_incompleteness_theorems),
with the idea being that the standard Go tools provide a consistent system for Go projects, but there are truths that
cannot be proven to be true using just the standard Go tooling itself. gödel acts as a tool outside of this system.

Usage of name
-------------
The name of this project is gödel. The 'g' is lowercase, and the second letter is the precomposed representation of an
'o' with a diaeresis (`ö`, `U+00F6`). This representation is used in the documentation and in the source code.

However, for any file that can be written to disk, "godel" is used as the name instead. This preserves maximal
compatibility with terminals and with file systems that do not fully support Unicode names. The GitHub project name is
also "godel" because GitHub only supports the character class `[A-Za-z0-9_.-]` for project names.

Sidebar: HFS+ and decomposed Unicode
------------------------------------
At the beginning of the project, an attempt was made to use "gödel" as the canonical name for everything (including
files and settings on disk). Unfortunately, this was complicated by the fact that HFS+ (the default file system
on most macOS systems) normalizes all Unicode names using Normalization Form D (NFD).

Quick background: Unicode has a notion of [precomposed and decomposed characters](https://en.wikipedia.org/wiki/Precomposed_character).
Unicode supports representing the `ö` character in two different ways: a precomposed form (`ö`, `U+00F6`) and a
decomposed form ('o' + '¨', `U+006F` + `U+0308`). Unicode also defines a notion of
[equivalence](https://en.wikipedia.org/wiki/Unicode_equivalence#Combining_and_precomposed_characters), under which
these two forms are equivalent. However, by default, most string comparisons operate strictly on byte
sequences, in which case `ö` and `o¨` are distinct.
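
The byte-level difference between the two forms can be seen with a short Go program. This is only an illustrative sketch using the standard library; the names `precomposed`, `decomposed`, and `describe` are made up for this example and do not come from the gödel source.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Both constants render as "gödel" but use different code point sequences.
const (
	precomposed = "g\u00f6del"  // 'ö' as the single code point U+00F6
	decomposed  = "go\u0308del" // 'o' (U+006F) followed by combining diaeresis (U+0308)
)

// describe reports the number of bytes and the number of code points in s.
// It is a hypothetical helper used only for this illustration.
func describe(s string) (byteLen, runeLen int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// Go's == on strings compares bytes, so the two forms are not equal.
	fmt.Println(precomposed == decomposed) // false
	fmt.Println(describe(precomposed))     // 6 5
	fmt.Println(describe(decomposed))      // 7 6
	// Print the UTF-8 bytes of each form in hex for comparison.
	fmt.Printf("% x\n% x\n", precomposed, decomposed)
}
```

Any code that looks up a name with plain string equality (a map key, a `==` check, a path comparison) treats the two spellings as different names, even though they are canonically equivalent in Unicode terms.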

Most file systems write file names exactly as they are provided. However, this is not the case for HFS+: HFS+ explicitly
normalizes all file names using [Normalization Form D](http://unicode.org/reports/tr15/#Norm_Forms), which performs
"canonical decomposition" on all inputs.

Thus, when a file named `ö` (`U+00F6`) is written to disk, HFS+ translates it into `o¨` (`U+006F` + `U+0308`). The
system-level calls handle the translation, so a request to open `ö` is translated into a request for `o¨`. However, from a
data perspective, this means that reads and writes can be asymmetric: a file created with a name that contains
precomposed Unicode characters is later listed with a name that contains decomposed characters.

This poses problems when trying to interoperate with other systems that do not have this restriction. For example,
in most file systems it is perfectly legal to have one file named `ö` and another named `o¨`. However, this is simply
not possible to represent in HFS+. This is similar to the case-insensitivity restriction: by default, the HFS+ file
system is case-insensitive, meaning that it is not possible to have files in a directory whose names differ only in
case (however, it is possible to format an HFS+ volume to be case-sensitive, which is
recommended for most developers). These are known quirks of HFS+ that cause numerous headaches (some of which are
enumerated in an oft-cited [rant by Linus Torvalds](https://plus.google.com/+JunioCHamano/posts/1Bpaj3e3Rru)). Luckily,
it appears that this issue has been fixed correctly in the [Apple File System (APFS)](https://en.wikipedia.org/wiki/Apple_File_System).

Initially, an attempt was made to normalize the names in gödel code to deal with this. In Go, this can be done by
importing the `golang.org/x/text/unicode/norm` package and using `norm.NFC.String` to convert Unicode strings into
NFC-normalized form. This ensures that any instances of `go¨del` are converted to `gödel`.
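
To show the effect of composing a name before comparing it, the sketch below handles only the single pair that matters for this project's name, so that it runs with the standard library alone. `composeOUmlaut` is a hypothetical helper invented for this example: it is not part of gödel and is not a general NFC implementation. Real code would call `norm.NFC.String` from `golang.org/x/text/unicode/norm` instead.

```go
package main

import (
	"fmt"
	"strings"
)

// composeOUmlaut is a hypothetical, deliberately narrow stand-in for
// norm.NFC.String: it rewrites only 'o' + U+0308 into U+00F6 and leaves
// every other sequence untouched.
func composeOUmlaut(s string) string {
	return strings.ReplaceAll(s, "o\u0308", "\u00f6")
}

func main() {
	fromDisk := "go\u0308del" // decomposed name, as HFS+ would return it
	want := "g\u00f6del"      // precomposed name used in source code

	// Without normalization the byte-wise comparison fails.
	fmt.Println(fromDisk == want) // false
	// After composing the pair, the names compare equal.
	fmt.Println(composeOUmlaut(fromDisk) == want) // true
}
```

The important point is that normalization has to happen at every boundary where a name read from disk meets a name written in code, which is part of why this approach is hard to apply consistently.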

In Bash scripts, the following function was used to return `go¨del` on Darwin systems and `gödel` on all other systems:

```
normalize() {
  if [[ "$OSTYPE" == "darwin"* ]]; then
    # HFS+ stores file names in decomposed form, so convert the argument to match
    iconv -f UTF-8 -t UTF8-MAC <<<"$1"
  else
    # Other systems keep the name exactly as provided
    printf '%s\n' "$1"
  fi
}
```

Although these workarounds functioned correctly, in writing them it became clear that this approach would not be
sustainable. It isn't realistic to require all gödel users to know about Unicode normalization and deal with it in their
own tooling, and glitches seemed to appear in every new external system that had to interact with these characters
(for example, different systems URL-encoded the characters differently when uploading artifacts whose names
contained them).