modernc.org/cc@v1.0.1/v2/testdata/_sqlite/ext/icu/README.txt (about)

     1  
     2  This directory contains source code for the SQLite "ICU" extension, an
     3  integration of the "International Components for Unicode" library with
     4  SQLite. Documentation follows.
     5  
     6      1. Features
     7      
     8          1.1  SQL Scalars upper() and lower()
     9          1.2  Unicode Aware LIKE Operator
    10          1.3  ICU Collation Sequences
    11          1.4  SQL REGEXP Operator
    12      
    13      2. Compilation and Usage
    14      
    15      3. Bugs, Problems and Security Issues
    16      
    17          3.1  The "case_sensitive_like" Pragma
    18          3.2  The SQLITE_MAX_LIKE_PATTERN_LENGTH Macro
    19          3.3  Collation Sequence Security Issue
    20  
    21  
    22  1. FEATURES
    23  
    24    1.1  SQL Scalars upper() and lower()
    25  
    26      SQLite's built-in implementations of these two functions only 
    27      provide case mapping for the 26 letters used in the English
    28      language. The ICU based functions provided by this extension
    29      provide case mapping, where defined, for the full range of 
    30      unicode characters.
    31  
    32      ICU provides two types of case mapping, "general" case mapping and
    33      "language specific". Refer to ICU documentation for the differences
    34      between the two. Specifically:
    35  
    36         http://www.icu-project.org/userguide/caseMappings.html
    37         http://www.icu-project.org/userguide/posix.html#case_mappings
    38  
    39      To utilise "general" case mapping, the upper() or lower() scalar 
    40      functions are invoked with one argument:
    41  
    42          upper('ABC') -> 'abc'
    43          lower('abc') -> 'ABC'
    44  
    45      To access ICU "language specific" case mapping, upper() or lower()
    46      should be invoked with two arguments. The second argument is the name
    47      of the locale to use. Passing an empty string ("") or SQL NULL value
    48      as the second argument is the same as invoking the 1 argument version
    49      of upper() or lower():
    50  
    51          lower('I', 'en_us') -> 'i'
    52          lower('I', 'tr_tr') -> 'ı' (small dotless i)
    53  
    54    1.2  Unicode Aware LIKE Operator
    55  
    56      Similarly to the upper() and lower() functions, the built-in SQLite LIKE
    57      operator understands case equivalence for the 26 letters of the English
    58      language alphabet. The implementation of LIKE included in this
    59      extension uses the ICU function u_foldCase() to provide case
    60      independent comparisons for the full range of unicode characters.  
    61  
    62      The U_FOLD_CASE_DEFAULT flag is passed to u_foldCase(), meaning the
    63      dotless 'I' character used in the Turkish language is considered
    64      to be in the same equivalence class as the dotted 'I' character
    65      used by many languages (including English).
    66  
    67    1.3  ICU Collation Sequences
    68  
    69      A special SQL scalar function, icu_load_collation() is provided that 
    70      may be used to register ICU collation sequences with SQLite. It
    71      is always called with exactly two arguments, the ICU locale 
    72      identifying the collation sequence to ICU, and the name of the
    73      SQLite collation sequence to create. For example, to create an
    74      SQLite collation sequence named "turkish" using Turkish language
    75      sorting rules, the SQL statement:
    76  
    77          SELECT icu_load_collation('tr_TR', 'turkish');
    78  
    79      Or, for Australian English:
    80  
    81          SELECT icu_load_collation('en_AU', 'australian');
    82  
    83      The identifiers "turkish" and "australian" may then be used
    84      as collation sequence identifiers in SQL statements:
    85  
    86          CREATE TABLE aust_turkish_penpals(
    87            australian_penpal_name TEXT COLLATE australian,
    88            turkish_penpal_name    TEXT COLLATE turkish
    89          );
    90    
    91    1.4 SQL REGEXP Operator
    92  
    93      This extension provides an implementation of the SQL binary
    94      comparision operator "REGEXP", based on the regular expression functions
    95      provided by the ICU library. The syntax of the operator is as described
    96      in SQLite documentation:
    97  
    98          <string> REGEXP <re-pattern>
    99  
   100      This extension uses the ICU defaults for regular expression matching
   101      behavior. Specifically, this means that:
   102  
   103          * Matching is case-sensitive,
   104          * Regular expression comments are not allowed within patterns, and
   105          * The '^' and '$' characters match the beginning and end of the
   106            <string> argument, not the beginning and end of lines within
   107            the <string> argument.
   108  
   109      Even more specifically, the value passed to the "flags" parameter
   110      of ICU C function uregex_open() is 0.
   111  
   112  
   113  2  COMPILATION AND USAGE
   114  
   115    The easiest way to compile and use the ICU extension is to build
   116    and use it as a dynamically loadable SQLite extension. To do this
   117    using gcc on *nix:
   118  
   119      gcc -shared icu.c `icu-config --ldflags` -o libSqliteIcu.so
   120  
   121    You may need to add "-I" flags so that gcc can find sqlite3ext.h
   122    and sqlite3.h. The resulting shared lib, libSqliteIcu.so, may be
   123    loaded into sqlite in the same way as any other dynamically loadable
   124    extension.
   125  
   126  
   127  3 BUGS, PROBLEMS AND SECURITY ISSUES
   128  
   129    3.1 The "case_sensitive_like" Pragma
   130  
   131      This extension does not work well with the "case_sensitive_like"
   132      pragma. If this pragma is used before the ICU extension is loaded,
   133      then the pragma has no effect. If the pragma is used after the ICU
   134      extension is loaded, then SQLite ignores the ICU implementation and
   135      always uses the built-in LIKE operator.
   136  
   137      The ICU extension LIKE operator is always case insensitive.
   138  
   139    3.2 The SQLITE_MAX_LIKE_PATTERN_LENGTH Macro
   140  
   141      Passing very long patterns to the built-in SQLite LIKE operator can
   142      cause excessive CPU usage. To curb this problem, SQLite defines the
   143      SQLITE_MAX_LIKE_PATTERN_LENGTH macro as the maximum length of a
   144      pattern in bytes (irrespective of encoding). The default value is
   145      defined in internal header file "limits.h".
   146      
   147      The ICU extension LIKE implementation suffers from the same 
   148      problem and uses the same solution. However, since the ICU extension
   149      code does not include the SQLite file "limits.h", modifying
   150      the default value therein does not affect the ICU extension.
   151      The default value of SQLITE_MAX_LIKE_PATTERN_LENGTH used by
   152      the ICU extension LIKE operator is 50000, defined in source 
   153      file "icu.c".
   154  
   155    3.3 Collation Sequence Security Issue
   156  
   157      Internally, SQLite assumes that indices stored in database files
   158      are sorted according to the collation sequence indicated by the
   159      SQL schema. Changing the definition of a collation sequence after
   160      an index has been built is therefore equivalent to database
   161      corruption. The SQLite library is not very well tested under
   162      these conditions, and may contain potential buffer overruns
   163      or other programming errors that could be exploited by a malicious
   164      programmer.
   165  
   166      If the ICU extension is used in an environment where potentially
   167      malicious users may execute arbitrary SQL (i.e. gears), they
   168      should be prevented from invoking the icu_load_collation() function,
   169      possibly using the authorisation callback.