# Unicode Text Segmentation for Go

[![Godoc Reference](https://img.shields.io/badge/godoc-reference-blue.svg)](https://godoc.org/github.com/rivo/uniseg)
[![Go Report](https://img.shields.io/badge/go%20report-A%2B-brightgreen.svg)](https://goreportcard.com/report/github.com/rivo/uniseg)

This Go package implements Unicode Text Segmentation according
to [Unicode Standard Annex #29](http://unicode.org/reports/tr29/) (Unicode version 12.0.0).

At this point, only the determination of grapheme cluster boundaries is implemented.

## Background

In Go, [strings are read-only slices of bytes](https://blog.golang.org/strings). They can be turned into Unicode code
points using a `for ... range` loop or by converting: `[]rune(str)`. However, multiple code points may be combined into
one user-perceived character, or what the Unicode specification calls a "grapheme cluster". Here are some examples:

| String | Bytes (UTF-8)                                         | Code points (runes)                    | Grapheme clusters                     |
|--------|-------------------------------------------------------|----------------------------------------|---------------------------------------|
| Käse   | 6 bytes: `4b 61 cc 88 73 65`                          | 5 code points: `4b 61 308 73 65`       | 4 clusters: `[4b],[61 308],[73],[65]` |
| 🏳️🌈     | 14 bytes: `f0 9f 8f b3 ef b8 8f e2 80 8d f0 9f 8c 88` | 4 code points: `1f3f3 fe0f 200d 1f308` | 1 cluster: `[1f3f3 fe0f 200d 1f308]`  |
| 🇩🇪     | 8 bytes: `f0 9f 87 a9 f0 9f 87 aa`                    | 2 code points: `1f1e9 1f1ea`           | 1 cluster: `[1f1e9 1f1ea]`            |

This package provides a tool to iterate over these grapheme clusters. This may be used to determine the number of
user-perceived characters, to split strings in their intended places, or to extract individual characters which form a
unit.


## Installation

```bash
go get github.com/rivo/uniseg
```

## Basic Example

```go
package main

import (
	"fmt"

	"github.com/rivo/uniseg"
)

func main() {
	// Iterate over the grapheme clusters of the string and print
	// the code points of each cluster in hexadecimal.
	gr := uniseg.NewGraphemes("👍🏼!")
	for gr.Next() {
		fmt.Printf("%x ", gr.Runes())
	}
	// Output: [1f44d 1f3fc] [21]
}
```

## Documentation

Refer to https://godoc.org/github.com/rivo/uniseg for the package's documentation.

## Dependencies

This package does not depend on any packages outside the standard library.

## Your Feedback

Add your issue here on GitHub. Feel free to get in touch if you have any questions.

## Version

Version tags will be introduced once Go modules are official. Consider this version 0.1.