# (go) mojimoji

[![Go Reference](https://pkg.go.dev/badge/github.com/rusq/gomojimoji.svg)](https://pkg.go.dev/github.com/rusq/gomojimoji)

This is a Go port of the excellent [mojimoji][1] library, which is written
in Python.

It provides two functions:
- **HanToZen** - half-width to full-width character conversion.
- **ZenToHan** - full-width to half-width character conversion.

Each function accepts the following options:
- **ASCII** - enable or disable ASCII translation.
- **Digits** - enable or disable digit translation.
- **Kana** - enable or disable kana translation.

All options are enabled by default; see the examples below for their usage.

The logic follows the upstream implementation as of commit [aca2661][2].

## Examples

### HanToZen

```go
// Half-width input; with ASCII on and Digits/Kana off, only the Latin text changes.
fmt.Println(HanToZen("ﾆｭｰｼﾞｰﾗﾝﾄﾞ"))
fmt.Println(HanToZen("ﾆｭｰｼﾞｰﾗﾝﾄﾞ Auckland 6012", ASCII(true), Digits(false), Kana(false)))

// Output:
// ニュージーランド
// ﾆｭｰｼﾞｰﾗﾝﾄﾞ　Ａｕｃｋｌａｎｄ　6012
```

### ZenToHan

```go
// Full-width input; with Kana off, the katakana is left untouched.
fmt.Println(ZenToHan("ニュージーランド"))
fmt.Println(ZenToHan("ニュージーランド　Ａｕｃｋｌａｎｄ　０１２３", Kana(false), Digits(true)))

// Output:
// ﾆｭｰｼﾞｰﾗﾝﾄﾞ
// ニュージーランド Auckland 0123
```

## Benchmark

### Original library etc.
Timings for the original mojimoji, zenhan and unicodedata on my system, for comparison:
```python
In [4]: s = u'ＡＢＣＤＥＦＧ０１２３４５' * 10

In [5]: %time for n in range(1000000): mojimoji.zen_to_han(s)
CPU times: user 3.24 s, sys: 1.28 ms, total: 3.24 s
Wall time: 3.24 s

In [6]: %time for n in range(1000000): zenhan.z2h(s)
CPU times: user 26.2 s, sys: 16.3 ms, total: 26.2 s
Wall time: 26.2 s

In [7]: %time for n in range(1000000): unicodedata.normalize('NFKC', s)
CPU times: user 3.12 s, sys: 15.4 ms, total: 3.13 s
Wall time: 3.14 s
```

### This library
ZenToHan and HanToZen use different approaches:

- ZenToHan uses strings.Builder, which is simpler to implement.
- HanToZen uses direct slice operations to allow for seeking when needed.

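The difference between the two approaches can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `zenToHanASCII`, `hanToZenKana` and the three-entry kana tables are hypothetical and cover only a handful of characters. The sketch assumes that seeking is needed because a half-width kana may be followed by a separate voiced sound mark (U+FF9E), and the two together map to a single full-width character. Half-width and full-width inputs are written as `\u` escapes so the code points stay unambiguous.

```go
package main

import (
	"fmt"
	"strings"
)

// offset is the distance between a full-width ASCII code point
// (U+FF01..U+FF5E) and its half-width counterpart (U+0021..U+007E).
const offset = 0xFEE0

// zenToHanASCII converts rune by rune and appends to a strings.Builder.
// No lookahead is needed, so a single forward pass is enough.
func zenToHanASCII(s string) string {
	var sb strings.Builder
	sb.Grow(len(s))
	for _, r := range s {
		switch {
		case r >= 0xFF01 && r <= 0xFF5E:
			r -= offset
		case r == 0x3000: // ideographic (full-width) space
			r = ' '
		}
		sb.WriteRune(r)
	}
	return sb.String()
}

// Tiny illustrative tables keyed by half-width kana:
// U+FF7C (shi), U+FF84 (to), U+FF76 (ka).
var (
	plain  = map[rune]rune{'\uFF7C': 'シ', '\uFF84': 'ト', '\uFF76': 'カ'}
	voiced = map[rune]rune{'\uFF7C': 'ジ', '\uFF84': 'ド', '\uFF76': 'ガ'}
)

// hanToZenKana indexes into a rune slice so it can peek at the next rune
// and merge a kana with a following voiced sound mark (U+FF9E).
func hanToZenKana(s string) string {
	in := []rune(s)
	out := make([]rune, 0, len(in))
	for i := 0; i < len(in); i++ {
		r := in[i]
		if v, ok := voiced[r]; ok && i+1 < len(in) && in[i+1] == '\uFF9E' {
			out = append(out, v)
			i++ // skip the voiced sound mark that was just consumed
			continue
		}
		if p, ok := plain[r]; ok {
			r = p
		}
		out = append(out, r)
	}
	return string(out)
}

func main() {
	// Full-width "ABC", ideographic space, full-width "123".
	fmt.Println(zenToHanASCII("\uFF21\uFF22\uFF23\u3000\uFF11\uFF12\uFF13"))
	// Half-width shi + voiced mark, ka, to + voiced mark.
	fmt.Println(hanToZenKana("\uFF7C\uFF9E\uFF76\uFF84\uFF9E"))
}
```

The first function never needs to look past the current rune, which is why a builder suffices; the second has to index into the input, which is easier on a slice.
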
ZenToHan:
```
mojimoji (master)> go test -bench=BenchmarkZenToHanConv
goos: darwin
goarch: amd64
pkg: github.com/rusq/gomojimoji
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkZenToHanConv-16               1        2880823810 ns/op
--- BENCH: BenchmarkZenToHanConv-16
    mojimoji_test.go:98: 2.88079814s
PASS
ok      github.com/rusq/gomojimoji      2.977s
```

HanToZen:
```
mojimoji (master)> go test -bench=BenchmarkHanToZen
goos: darwin
goarch: amd64
pkg: github.com/rusq/gomojimoji
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkHanToZenConv-16               1        2712209539 ns/op
--- BENCH: BenchmarkHanToZenConv-16
    mojimoji_test.go:107: 2.712166151s
PASS
ok      github.com/rusq/gomojimoji      2.804s
```

[1]: https://github.com/studio-ousia/mojimoji
[2]: https://github.com/studio-ousia/mojimoji/tree/aca26614f4a7a90a845f3a3c384c27d0a925efce