# Unicode Text Segmentation for Go

[![Godoc Reference](https://img.shields.io/badge/godoc-reference-blue.svg)](https://godoc.org/github.com/rivo/uniseg)
[![Go Report](https://img.shields.io/badge/go%20report-A%2B-brightgreen.svg)](https://goreportcard.com/report/github.com/rivo/uniseg)

This Go package implements Unicode Text Segmentation according
to [Unicode Standard Annex #29](http://unicode.org/reports/tr29/) (Unicode version 12.0.0).

At this point, only the determination of grapheme cluster boundaries is implemented.

## Background

In Go, [strings are read-only slices of bytes](https://blog.golang.org/strings). They can be decoded into Unicode code
points using a `for ... range` loop or by conversion: `[]rune(str)`. However, multiple code points may combine into one
user-perceived character, which the Unicode specification calls a "grapheme cluster". Here are some examples:

| String | Bytes (UTF-8)                                         | Code points (runes)                    | Grapheme clusters                     |
|--------|-------------------------------------------------------|----------------------------------------|---------------------------------------|
| Käse  | 6 bytes: `4b 61 cc 88 73 65`                          | 5 code points: `4b 61 308 73 65`       | 4 clusters: `[4b],[61 308],[73],[65]` |
| 🏳️‍🌈 | 14 bytes: `f0 9f 8f b3 ef b8 8f e2 80 8d f0 9f 8c 88` | 4 code points: `1f3f3 fe0f 200d 1f308` | 1 cluster: `[1f3f3 fe0f 200d 1f308]`  |
| 🇩🇪   | 8 bytes: `f0 9f 87 a9 f0 9f 87 aa`                    | 2 code points: `1f1e9 1f1ea`           | 1 cluster: `[1f1e9 1f1ea]`            |
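
The byte and code point columns of this table can be reproduced with the standard library alone; only the grapheme cluster column needs a segmentation algorithm such as the one in this package. The sketch below is illustrative: the helper `describe` is hypothetical and not part of the package. It prints the byte and code point views of each example string:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe (a hypothetical helper) reports the UTF-8 bytes and the code
// points of s in the same notation as the table above. It only looks at
// bytes and runes, so it cannot tell where grapheme clusters begin or end.
func describe(s string) (nBytes int, bytesHex string, nRunes int, runesHex string) {
	nBytes = len(s)                         // len counts bytes, not characters
	bytesHex = fmt.Sprintf("% x", s)        // "% x" separates the hex bytes with spaces
	nRunes = utf8.RuneCountInString(s)      // number of code points (runes)
	runesHex = fmt.Sprintf("%x", []rune(s)) // []rune(s) decodes UTF-8 into code points
	return nBytes, bytesHex, nRunes, runesHex
}

func main() {
	examples := []string{
		"Ka\u0308se",                       // "Käse" with a combining diaeresis (U+0308)
		"\U0001F3F3\uFE0F\u200D\U0001F308", // rainbow flag: white flag, VS16, ZWJ, rainbow
		"\U0001F1E9\U0001F1EA",             // German flag: two regional indicator symbols
	}
	for _, s := range examples {
		nBytes, bytesHex, nRunes, runesHex := describe(s)
		fmt.Printf("%s: %d bytes: %s | %d code points: %s\n", s, nBytes, bytesHex, nRunes, runesHex)
	}
}
```

Note that the table's "Käse" uses a decomposed "ä" (`a` followed by U+0308). The precomposed form U+00E4 would yield 5 bytes and 4 code points, but still 4 grapheme clusters.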

This package provides a tool to iterate over these grapheme clusters. It may be used to determine the number of
user-perceived characters, to split strings at their intended places, or to extract individual characters that form a
unit.
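
Counting runes overcounts such characters, and simple workarounds do not go far enough. As an illustration only (this is not how the package works), the sketch below uses a hypothetical helper `naiveCount` that attaches every nonspacing mark (Unicode general category `Mn`) to the preceding code point. It counts "Käse" correctly, but still reports 3 characters for the rainbow flag and 2 for the German flag, because UAX #29 handles zero-width joiner sequences and regional indicator pairs with separate rules that this heuristic ignores:

```go
package main

import (
	"fmt"
	"unicode"
)

// naiveCount (a hypothetical helper) is a deliberately simplified
// approximation of "user-perceived characters": it starts a new character at
// every code point except nonspacing marks (category Mn), which it attaches
// to the previous one. It does NOT implement UAX #29.
func naiveCount(s string) int {
	n := 0
	for i, r := range s {
		// A mark at the very start of the string still counts as a character.
		if i == 0 || !unicode.Is(unicode.Mn, r) {
			n++
		}
	}
	return n
}

func main() {
	for _, s := range []string{
		"Ka\u0308se",                       // 4 grapheme clusters
		"\U0001F3F3\uFE0F\u200D\U0001F308", // 1 grapheme cluster
		"\U0001F1E9\U0001F1EA",             // 1 grapheme cluster
	} {
		fmt.Printf("%q: %d runes, naive count %d\n", s, len([]rune(s)), naiveCount(s))
	}
}
```

Handling these remaining cases correctly requires the full boundary rules of UAX #29, which is what this package's grapheme cluster iterator provides.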

## Installation

```bash
go get github.com/rivo/uniseg
```

## Basic Example

```go
package main

import (
	"fmt"

	"github.com/rivo/uniseg"
)

func main() {
	// "👍🏼" is a thumbs-up sign followed by a skin tone modifier; together they
	// form a single grapheme cluster.
	gr := uniseg.NewGraphemes("👍🏼!")
	for gr.Next() {
		// Runes returns the code points of the current grapheme cluster.
		fmt.Printf("%x ", gr.Runes())
	}
	// Output: [1f44d 1f3fc] [21]
}
```

## Documentation

Refer to https://godoc.org/github.com/rivo/uniseg for the package's documentation.

## Dependencies

This package does not depend on any packages outside the standard library.

## Your Feedback

Please open an issue on GitHub. Feel free to get in touch if you have any questions.

## Version

Version tags will be introduced once Go modules are official. Consider this version 0.1.