Name
====
gödel's name is an homage to [Kurt Gödel](https://en.wikipedia.org/wiki/Kurt_G%C3%B6del). As with many Go tools, it is a
play on words that involves "go". It is also a play on [Gödel's incompleteness theorems](https://en.wikipedia.org/wiki/G%C3%B6del's_incompleteness_theorems),
with the idea being that the standard Go tools provide a consistent system for Go projects, but there are truths that
cannot be proven to be true using just the standard Go tooling itself. gödel acts as a tool outside of this system.

Usage of name
-------------
The name of this project is gödel. The 'g' is lowercase, and the second letter is the precomposed representation of an
'o' with a diaeresis (`ö`, `U+00F6`). This representation is used in the documentation and in the source code.

However, for any file that can be written to disk, "godel" is used as the name instead. This is done to preserve maximal
compatibility with all terminals and with file systems that do not fully support Unicode names. The GitHub project name
is also "godel" because GitHub only supports the character class `[A-Za-z0-9_.-]` for project names.

Sidebar: HFS+ and decomposed Unicode
------------------------------------
At the beginning of the project, an attempt was made to use "gödel" as the canonical name for everything (including
files and settings on-disk). Unfortunately, this was complicated by the fact that HFS+ (which is the default file system
used by most macOS systems) normalizes all Unicode names using Normalization Form D (NFD).

Quick background: Unicode has a notion of [precomposed or decomposed characters](https://en.wikipedia.org/wiki/Precomposed_character).
Unicode supports representing the `ö` character in two different ways: a precomposed form (`ö`, `U+00F6`) and a
decomposed form ('o' + '¨', `U+006F` + `U+0308`). Unicode defines a notion of
[equivalence](https://en.wikipedia.org/wiki/Unicode_equivalence#Combining_and_precomposed_characters) under which these
two forms are equivalent. However, by default, most string comparisons operate strictly on byte sequences, in which case
`ö` and `o¨` are distinct.

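To make the difference concrete, the following minimal Go sketch spells the same visible character both ways and compares the results. It uses only the standard library; the constant names and the `describe` helper are illustrative and not part of gödel.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Two spellings of the same visible character.
const (
	precomposed = "\u00f6"  // single code point U+00F6
	decomposed  = "o\u0308" // U+006F followed by the combining diaeresis U+0308
)

// describe reports the UTF-8 byte length and the code point count of s.
func describe(s string) (byteLen, runeCount int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	for _, s := range []string{precomposed, decomposed} {
		b, r := describe(s)
		// "% x" prints the raw UTF-8 bytes in hex, separated by spaces.
		fmt.Printf("% x: %d bytes, %d code point(s)\n", s, b, r)
	}
	// Go's == on strings is a plain byte comparison, so the two forms differ.
	fmt.Println("equal as bytes:", precomposed == decomposed)
}
```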
Most file systems write file names exactly as they are provided. However, this is not the case for HFS+: HFS+ explicitly
normalizes all file names using [Normalization Form D](http://unicode.org/reports/tr15/#Norm_Forms), which performs
"canonical decomposition" on all inputs.

Thus, when a file named `ö` (`U+00F6`) is written to disk, HFS+ translates the name into `o¨` (`U+006F` + `U+0308`).
System-level calls handle this translation, so a request to open `ö` is translated into a request for `o¨`. However, from
a data perspective, this means that reads and writes can be asymmetric: a file created with a name that contains
precomposed Unicode characters is reported back (for example, in a directory listing) under a name that contains
decomposed characters.

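The asymmetry can be observed with a small round-trip check. The sketch below creates a file with a precomposed name in a temporary directory and compares the requested name with the one the directory listing reports. The `roundTrip` helper and the `normcheck` directory prefix are hypothetical names chosen for this illustration. On a file system that stores names as given, the two are byte-identical; on HFS+ the listing would be expected to return the decomposed form.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// roundTrip creates an empty file called name inside a fresh temporary
// directory and returns the name that the directory listing reports for it.
func roundTrip(name string) (string, error) {
	dir, err := os.MkdirTemp("", "normcheck")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	if err := os.WriteFile(filepath.Join(dir, name), nil, 0o600); err != nil {
		return "", err
	}
	entries, err := os.ReadDir(dir)
	if err != nil {
		return "", err
	}
	if len(entries) != 1 {
		return "", fmt.Errorf("expected 1 entry, found %d", len(entries))
	}
	return entries[0].Name(), nil
}

func main() {
	requested := "g\u00f6del" // precomposed spelling
	stored, err := roundTrip(requested)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("requested: % x\n", requested)
	fmt.Printf("stored:    % x\n", stored)
	fmt.Println("identical bytes:", stored == requested)
}
```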
This poses problems when trying to be interoperable with other systems that do not have this restriction. For example,
in most file systems it is perfectly legal to have one file named `ö` and another named `o¨`. However, this cannot be
represented in HFS+. This is similar to the case-insensitivity restriction -- by default, the HFS+ file system is
case-insensitive, meaning that a directory cannot contain files whose names differ only in the case of their characters
(however, it is possible to format an HFS+ volume to be case-sensitive, which is recommended for most developers). These
are known quirks of HFS+ that cause numerous headaches (some of which are enumerated in an oft-cited
[rant by Linus Torvalds](https://plus.google.com/+JunioCHamano/posts/1Bpaj3e3Rru)). Luckily,
it appears that this issue has been fixed correctly in the [Apple File System (APFS)](https://en.wikipedia.org/wiki/Apple_File_System).

Initially, an attempt was made to normalize the names in gödel code to deal with this. In Go, this can be done by
importing the `golang.org/x/text/unicode/norm` package and using `norm.NFC.String` to convert Unicode strings into
NFC-normalized format. This would ensure that any instances of `go¨del` would be converted to `gödel`.

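The effect of that normalization step on the project name can be sketched without the external dependency. The code below is a deliberately narrow, hand-rolled stand-in, not the `norm` package: it recombines only an `o` or `O` followed by `U+0308`, whereas real code would call `norm.NFC.String(name)`, which handles every composable sequence. The names `composeDiaeresisO` and `canonicalName` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// composeDiaeresisO is a hypothetical, narrow substitute for norm.NFC.String.
// It only recombines "o" or "O" followed by the combining diaeresis U+0308.
var composeDiaeresisO = strings.NewReplacer(
	"o\u0308", "\u00f6",
	"O\u0308", "\u00d6",
)

// canonicalName returns name with decomposed "ö"/"Ö" sequences recombined
// into their precomposed code points. Other text is left untouched.
func canonicalName(name string) string {
	return composeDiaeresisO.Replace(name)
}

func main() {
	fromDisk := "go\u0308del" // decomposed spelling, as an HFS+ listing would report it
	fmt.Println(canonicalName(fromDisk) == "g\u00f6del")
}
```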
In Bash scripts, the following function was used to return `go¨del` on Darwin systems and `gödel` on all other systems:

```
# Print the given name in the Unicode form the local file system is expected to use.
normalize() {
    if [[ "$OSTYPE" == "darwin"* ]]; then
        # HFS+ stores file names as decomposed UTF-8
        iconv -t UTF8-MAC <<<"$1"
    else
        printf '%s\n' "$1"
    fi
}
```

Although these work-arounds functioned correctly, it became clear while writing them that this approach would not be
sustainable. It isn't realistic to require all gödel users to know about Unicode normalization and deal with it in their
own tooling, and there seemed to be glitches in every new external system that had to interact with these characters
(for example, URL-encoding of the characters was also handled differently by different systems when uploading artifacts
that contained the name).
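
One plausible way such URL differences arise (an assumption; the text above does not say which systems or encoders were involved) is that the two Unicode forms percent-encode to different strings. The sketch below shows this with Go's standard `net/url` package; `escapeName` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/url"
)

// escapeName percent-encodes name for use as a single URL path segment.
func escapeName(name string) string {
	return url.PathEscape(name)
}

func main() {
	fmt.Println(escapeName("g\u00f6del"))  // precomposed: g%C3%B6del
	fmt.Println(escapeName("go\u0308del")) // decomposed:  go%CC%88del
}
```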