# Spell check tool

* [Overview](#overview)
* [Approach](#approach)
* [Custom words](#custom-words)
* [Spell check a document file](#spell-check-a-document-file)
* [Other options](#other-options)
* [Technical details](#technical-details)
    * [Hunspell dictionary format](#hunspell-dictionary-format)
    * [Source files](#source-files)
        * [Word list file fragments](#word-list-file-fragments)
        * [Rules file](#rules-file)
* [Adding a new word](#adding-a-new-word)
    * [Update the word list fragment](#update-the-word-list-fragment)
    * [Optionally update the rules file](#optionally-update-the-rules-file)
    * [Create the master dictionary files](#create-the-master-dictionary-files)
    * [Test the changes](#test-the-changes)

## Overview

The `kata-spell-check.sh` tool is used to check a markdown file for
typographical (spelling) mistakes.

## Approach

The spell check tool is based on
[`hunspell`](https://github.com/hunspell/hunspell). It uses standard Hunspell
English dictionaries and supplements these with a custom Hunspell dictionary.
The document is cleaned of several entities before the spell-check begins.
These entities include the following:

- URLs
- Email addresses
- Code blocks
- Most punctuation
- GitHub userids

## Custom words

A custom dictionary is required to accept specific words that are either well
understood by the community or are defined in various document files, but do
not appear in standard dictionaries. The custom dictionaries allow those words
to be accepted as correct.
The following lists common examples of such words:

- Abbreviations
- Acronyms
- Company names
- Product names
- Project names
- Technical terms

## Spell check a document file

```sh
$ ./kata-spell-check.sh check /path/to/file
```

> **Note:** If you have made local edits to the dictionaries, you may need to
> [re-create the master dictionary files](#create-the-master-dictionary-files),
> as documented in the [Adding a new word](#adding-a-new-word) section,
> for your local edits to take effect.

## Other options

To list all available options and commands:

```sh
$ ./kata-spell-check.sh -h
```

## Technical details

### Hunspell dictionary format

A Hunspell dictionary comprises two text files:

- A word list file

  This file defines a list of words (one per line). The list includes optional
  references to one or more rules defined in the rules file as well as optional
  comments. Specify fixed words (e.g. company names) verbatim. Enter "normal"
  words in their root form.

  The root form of a "normal" word is the simplest and shortest form of that
  word. For example, the following words are all formed from the root
  word "computer":

  - Computers
  - Computer's
  - Computing
  - Computed

  Each word in the previous list is constructed from the root word "computer"
  by applying a combination of the following manipulations:

  - Remove one or more characters from the end of the word.
  - Add a new ending.

  Therefore, you list the root word "computer" in the word list file.

- A rules file

  This file defines named manipulations to apply to root words to form new
  words. For example, a rule can make a root word plural.
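
The two manipulations described above (remove characters from the end of the
root, then add a new ending) can be sketched in a few lines of shell. This is
only an illustration of the idea, not how Hunspell or `kata-spell-check.sh`
implements it. The `apply_suffix` function is a hypothetical helper, and its
use of `0` to mean "strip nothing" is an assumption borrowed from the way
Hunspell suffix rules mark an empty strip field.

```sh
#!/bin/sh
# Hypothetical helper (not part of kata-spell-check.sh): build a word from a
# root by stripping characters from its end and appending a new ending.
#   $1 = root word
#   $2 = characters to strip from the end of the root ("0" strips nothing)
#   $3 = ending to add
apply_suffix() {
    root=$1
    strip=$2
    add=$3
    if [ "$strip" != "0" ]; then
        # POSIX parameter expansion: remove the shortest matching suffix.
        root=${root%"$strip"}
    fi
    printf '%s%s\n' "$root" "$add"
}

apply_suffix computer 0 s       # add "s" only
apply_suffix computer 0 "'s"    # add "'s" only
apply_suffix computer er ing    # strip "er", add "ing"
apply_suffix computer er ed     # strip "er", add "ed"
```

Each call starts from the single root word "computer", which is why only the
root needs to appear in the word list file.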

### Source files

The rules file and the word list file for the custom dictionary are generated
from "source" fragment files in the [`data`](data/) directory.

All the fragment files allow comments using the hash (`#`) comment
symbol, and all files contain a comment header explaining their content.

#### Word list file fragments

The `*.txt` files are word list file fragments. Splitting the word list
into fragments makes updates easier and clearer, as each fragment is a
grouping of related terms. The name of the file gives a clue as to the
contents, but the comments at the top of each file provide further
detail.

Every line that does not start with a comment symbol contains a single
word. An optional comment for a word may appear after the word and is
separated from the word by whitespace followed by the comment symbol:

```
word # This is a comment explaining this particular word list entry.
```

You *may* suffix each word with a forward slash followed by one or more
upper-case letters. Each letter refers to a rule name in the rules file:

```
word/AC # This word references the 'A' and 'C' rules.
```

#### Rules file

The [rules file](data/rules.aff) contains a set of general rules that can be
applied to one or more root words in the word list files. You can make
comments in the rules file.

For an explanation of the format of this file see
[`man 5 hunspell`](http://www.manpagez.com/man/5/hunspell)
([source](https://github.com/hunspell/hunspell/blob/master/man/hunspell.5)).

## Adding a new word

### Update the word list fragment

To add a new word to the dictionary:

- Check that you do need to add the word.

  Is the word valid and correct? If the word is a project, product,
  or company name, is the capitalization correct?

- Add the new word to the appropriate [word list fragment file](data).

  Specifically, if it is a general word, add the *root* of the word to
  the appropriate fragment file.

- To reference rules, add a `/` suffix to the word followed by the letter for
  each rule to apply.

### Optionally update the rules file

It should not generally be necessary to update the rules file since it
already contains rules for most scenarios. However, if you need to
update the file, [read the documentation carefully](#rules-file).

### Create the master dictionary files

Every time you change the dictionary files you must recreate the master
dictionary files:

```sh
$ ./kata-spell-check.sh make-dict
```

As a convenience, [checking a file](#spell-check-a-document-file)
automatically creates the master dictionary files.

### Test the changes

You must test any changes to the [word list file
fragments](#word-list-file-fragments) or the [rules file](#rules-file)
by doing the following:

1. Recreate the [master dictionary files](#create-the-master-dictionary-files).

1. [Run the spell checker](#spell-check-a-document-file) on a file containing the
   words you have added to the dictionary.
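
The two test steps above can be combined into a small wrapper. The following
is a minimal sketch, not part of the tool: `test_new_words` is a hypothetical
helper, and it assumes that `kata-spell-check.sh` exits with a non-zero status
when `make-dict` or `check` fails, and that `check` accepts a scratch file
without a markdown extension. Neither behaviour is stated in this document, so
verify both before relying on the return code.

```sh
#!/bin/sh
# Hypothetical helper: recreate the master dictionary files, then spell check
# a scratch file containing the newly added words.
#   $1    = path to the spell check tool (e.g. ./kata-spell-check.sh)
#   $2... = the words that were added to the dictionary
test_new_words() {
    tool=$1
    shift
    tmp=$(mktemp) || return 1
    # Write one word per line into the scratch file.
    printf '%s\n' "$@" > "$tmp"
    # Step 1: recreate the dictionaries. Step 2: check the scratch file.
    "$tool" make-dict && "$tool" check "$tmp"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Example usage (the words are placeholders):
#   test_new_words ./kata-spell-check.sh hypervisor Hunspell
```

Passing the tool path as an argument keeps the helper independent of the
current directory.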