# Spell check tool

* [Overview](#overview)
* [Approach](#approach)
* [Custom words](#custom-words)
* [Spell check a document file](#spell-check-a-document-file)
* [Other options](#other-options)
* [Technical details](#technical-details)
    * [Hunspell dictionary format](#hunspell-dictionary-format)
    * [Source files](#source-files)
        * [Word list file fragments](#word-list-file-fragments)
        * [Rules file](#rules-file)
* [Adding a new word](#adding-a-new-word)
    * [Update the word list fragment](#update-the-word-list-fragment)
    * [Optionally update the rules file](#optionally-update-the-rules-file)
    * [Create the master dictionary files](#create-the-master-dictionary-files)
    * [Test the changes](#test-the-changes)

## Overview

The `kata-spell-check.sh` tool checks a Markdown file for typographical
(spelling) mistakes.

## Approach

The spell check tool is based on
[`hunspell`](https://github.com/hunspell/hunspell). It uses the standard
Hunspell English dictionaries and supplements them with a custom Hunspell
dictionary. Before the spell check begins, the tool removes the following
entities from the document:

- URLs
- Email addresses
- Code blocks
- Most punctuation
- GitHub user IDs
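
The cleaning step can be approximated with a `sed` pipeline. The following is a minimal sketch, not the code `kata-spell-check.sh` actually runs: the `clean_markdown` function name and every regular expression in it are assumptions chosen for illustration. It keeps apostrophes and hyphens so that words such as "computer's" survive.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the pre-check cleaning step; NOT the implementation
# used by kata-spell-check.sh. Reads Markdown on stdin, writes text on stdout.
# Expressions run in this order for every line:
#   1. delete fenced code blocks, including the fence lines
#   2. delete inline code spans
#   3. delete http(s) URLs
#   4. delete email addresses
#   5. delete GitHub user IDs such as @octocat (after emails, on purpose)
#   6. replace remaining punctuation, except apostrophes and hyphens, by spaces
clean_markdown() {
    sed -E \
        -e '/^`{3}/,/^`{3}/d' \
        -e 's/`[^`]*`//g' \
        -e 's#https?://[^[:space:])]+##g' \
        -e 's/[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]+//g' \
        -e 's/@[[:alnum:]-]+//g' \
        -e "s/[^[:alnum:][:space:]'-]/ /g"
}
```

For example, `clean_markdown < README.md | hunspell -l` would list the remaining words that Hunspell's default dictionary does not recognize.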

## Custom words

A custom dictionary is required to accept words that do not appear in
standard dictionaries but are either well understood by the community or
defined in various document files. The custom dictionary allows those words
to be accepted as correct. Common examples of such words include:

- Abbreviations
- Acronyms
- Company names
- Product names
- Project names
- Technical terms

## Spell check a document file

```sh
$ ./kata-spell-check.sh check /path/to/file
```

> **Note:** If you have made local edits to the dictionaries, you may need to
> [re-create the master dictionary files](#create-the-master-dictionary-files),
> as documented in the [Adding a new word](#adding-a-new-word) section, for
> your edits to take effect.

## Other options

To list all available options and commands:

```sh
$ ./kata-spell-check.sh -h
```

## Technical details

### Hunspell dictionary format

A Hunspell dictionary comprises two text files:

- A word list file

  This file defines a list of words (one per line). Entries can include
  optional references to one or more rules defined in the rules file, as well
  as optional comments. Specify fixed words (for example, company names)
  verbatim. Enter "normal" words in their root form.

  The root form of a "normal" word is the simplest and shortest form of that
  word. For example, the following words are all formed from the root word
  "computer":

  - Computers
  - Computer's
  - Computing
  - Computed

  Each word in the previous list is formed from "computer" by applying one or
  both of the following manipulations:

  - Remove one or more characters from the end of the word.
  - Add a new ending.

  Therefore, you list only the root word "computer" in the word list file.

- A rules file

  This file defines named manipulations to apply to root words to form new
  words, for example, a rule that makes a root word plural.
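
To make the "remove characters, then add an ending" idea concrete, the sketch below mimics what a Hunspell suffix (`SFX`) rule does to a root word. It is a deliberate simplification: the `apply_suffix` helper is hypothetical, real Hunspell conditions are a restricted pattern language rather than the full Bash regular expressions used here, and the flag letters `A`, `B` and `C` in the comments are made up and need not match those in `data/rules.aff`.

```bash
#!/usr/bin/env bash
# Rules-file lines that would express the same manipulations (flags made up):
#   SFX A Y 1
#   SFX A 0 s .
#   SFX B Y 1
#   SFX B er ing er
#   SFX C Y 1
#   SFX C er ed er
# together with the word list entry "computer/ABC".

# apply_suffix ROOT STRIP ADD CONDITION  (hypothetical helper)
# If the end of ROOT matches CONDITION, remove STRIP from the end of ROOT
# ("0" means remove nothing) and append ADD. Prints the derived word, or
# returns 1 when the rule does not apply.
apply_suffix() {
    local root=$1 strip=$2 add=$3 cond=$4
    if [[ $strip == 0 ]]; then
        strip=""
    fi
    # The condition is matched against the end of the root word.
    local re="${cond}\$"
    [[ $root =~ $re ]] || return 1
    # The characters to strip must actually be present at the end.
    [[ $root == *"$strip" ]] || return 1
    printf '%s%s\n' "${root%"$strip"}" "$add"
}

# Derive some of the forms of "computer" listed above.
apply_suffix computer 0 s .       # prints: computers
apply_suffix computer er ing er   # prints: computing
apply_suffix computer er ed er    # prints: computed
```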

### Source files

The rules file and the word list file for the custom dictionary are generated
from "source" fragment files in the [`data`](data/) directory.

All fragment files allow comments introduced by the hash (`#`) symbol, and
each file contains a comment header explaining its content.
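
Conceptually, generating the master word list means merging the fragments, dropping comments and blank lines, and writing the result in Hunspell's `.dic` format, whose first line is an approximate word count. The sketch below illustrates this under stated assumptions: the `build_dic` function and its arguments are hypothetical, and `kata-spell-check.sh make-dict` may do this differently.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of merging word list fragments into one Hunspell .dic
# file. This is NOT what "kata-spell-check.sh make-dict" actually runs.
# Usage: build_dic FRAGMENT_DIR OUTPUT_DIC
build_dic() {
    local fragment_dir=$1 out_dic=$2 words count
    # Remove comments (whole-line and trailing), trim whitespace, drop blank
    # lines, then sort and de-duplicate entries across all fragments.
    words=$(sed -e 's/#.*//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
                "$fragment_dir"/*.txt | grep -v '^$' | LC_ALL=C sort -u)
    # Count non-empty entries; grep -c prints 0 (and exits 1) if there are none.
    count=$(printf '%s\n' "$words" | grep -c . || true)
    {
        # A Hunspell .dic file starts with the (approximate) number of words.
        printf '%s\n' "$count"
        if [[ -n $words ]]; then
            printf '%s\n' "$words"
        fi
    } > "$out_dic"
}
```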

#### Word list file fragments

The `*.txt` files are word list file fragments. Splitting the word list into
fragments makes updates easier and clearer, as each fragment groups related
terms. The name of each file hints at its contents, and the comments at the
top of each file provide further detail.

Every line that does not start with a comment symbol contains a single word.
An optional comment for a word may follow the word, separated from it by
whitespace followed by the comment symbol:

```
word		# This is a comment explaining this particular word list entry.
```

You *may* suffix each word with a forward slash followed by one or more
upper-case letters. Each letter refers to a rule name in the rules file:

```
word/AC		# This word references the 'A' and 'C' rules.
```
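
Because the format is line-based, a fragment is easy to sanity-check before regenerating the dictionary. The sketch below is a hypothetical `check_fragment` helper (not part of `kata-spell-check.sh`) that reports lines which break the rules above: exactly one word, optionally followed by `/` and one or more upper-case rule letters, optionally followed by a comment.

```bash
#!/usr/bin/env bash
# Hypothetical fragment linter; not part of kata-spell-check.sh.
# Usage: check_fragment FILE   (returns 1 if any entry is malformed)
check_fragment() {
    local file=$1 line entry extra n=0 bad=0
    # One word (no "/"), optionally followed by "/" and upper-case rule letters.
    local re='^[^/]+(/[[:upper:]]+)?$'
    while IFS= read -r line || [[ -n $line ]]; do
        n=$((n + 1))
        # Drop everything from the comment symbol onwards, then split the
        # remainder on whitespace: a valid entry leaves nothing in "extra".
        read -r entry extra <<< "${line%%#*}"
        if [[ -z $entry ]]; then
            continue  # blank or comment-only line
        fi
        if [[ -n $extra || ! $entry =~ $re ]]; then
            printf '%s:%d: malformed entry: %s\n' "$file" "$n" "$line"
            bad=1
        fi
    done < "$file"
    return "$bad"
}
```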

#### Rules file

The [rules file](data/rules.aff) contains a set of general rules that can be
applied to one or more root words in the word list files. The rules file can
also contain comments.

For an explanation of the format of this file, see
[`man 5 hunspell`](http://www.manpagez.com/man/5/hunspell)
([source](https://github.com/hunspell/hunspell/blob/master/man/hunspell.5)).

   146  ([source](https://github.com/hunspell/hunspell/blob/master/man/hunspell.5)).
   147  
   148  ## Adding a new word
   149  
   150  ### Update the word list fragment
   151  
   152  If you want to allow a new word to the dictionary,
   153  
   154  - Check to ensure you do need to add the word
   155  
   156    Is the word valid and correct? If the word is a project, product,
   157    or company name, is the capitalization correct?
   158  
   159  - Add the new word to the appropriate [word list fragment file](data).
   160  
   161    Specifically, if it is a general word, add the *root* of the word to
   162    the appropriate fragment file.
   163  
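
The editing step above can be scripted. The following hypothetical `add_word` helper (not part of `kata-spell-check.sh`) appends a word, with optional rule letters, to a fragment file unless that word is already listed. Choosing the right fragment and re-creating the master dictionary files afterwards are still up to you.

```bash
#!/usr/bin/env bash
# Hypothetical helper; not part of kata-spell-check.sh.
# Usage: add_word FRAGMENT WORD [RULE_LETTERS]
#   add_word FRAGMENT hypervisor S   appends the entry "hypervisor/S"
add_word() {
    local fragment=$1 word=$2 rules=${3:-} entry
    entry=$word
    if [[ -n $rules ]]; then
        entry="$word/$rules"
    fi
    # Look for an existing entry for this word, ignoring comments and any
    # "/RULES" suffix. awk exits 0 only if the word is already listed.
    if awk -v w="$word" '
        { sub(/#.*/, ""); split($1, parts, "/") }
        parts[1] == w { found = 1 }
        END { exit !found }
    ' "$fragment"; then
        printf 'already present: %s\n' "$word" >&2
        return 0
    fi
    printf '%s\n' "$entry" >> "$fragment"
}
```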

### Optionally update the rules file

It is not usually necessary to update the rules file, since it already
contains rules for most scenarios. However, if you do need to update it,
[read the documentation carefully](#rules-file).

### Create the master dictionary files

Every time you change the dictionary files, you must recreate the master
dictionary files:

```sh
$ ./kata-spell-check.sh make-dict
```

As a convenience, [checking a file](#spell-check-a-document-file)
automatically creates the dictionary database.

### Test the changes

You must test any changes to the [word list file
fragments](#word-list-file-fragments) or the [rules file](#rules-file)
by doing the following:

1. Recreate the [master dictionary files](#create-the-master-dictionary-files).

1. [Run the spell checker](#spell-check-a-document-file) on a file containing the
   words you have added to the dictionary.
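
A small wrapper can automate both steps. The sketch below is hypothetical: the `check_words` function and the `SPELL_CHECK` override variable are illustrative names, not features of `kata-spell-check.sh`. It only calls the `make-dict` and `check` commands documented above and returns whatever status the checker returns.

```bash
#!/usr/bin/env bash
# Hypothetical test helper; not part of kata-spell-check.sh.
# Usage: check_words WORD...
# SPELL_CHECK (an assumption made for this sketch) selects the checker;
# it defaults to ./kata-spell-check.sh.
check_words() {
    local checker=${SPELL_CHECK:-./kata-spell-check.sh} tmp rc=0
    tmp=$(mktemp) || return 1
    # Write one word per line to a throwaway document.
    printf '%s\n' "$@" > "$tmp"
    # Step 1: recreate the master dictionary files. Step 2: check the file.
    { "$checker" make-dict && "$checker" check "$tmp"; } || rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

For example, `check_words hypervisor hypervisors` checks a root word together with a form that one of its rules should generate, so that the rule references are exercised as well as the word itself.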
   193  1. [Run the spell checker](#spell-check-a-document-file) on a file containing the
   194     words you have added to the dictionary.