github.com/google/grumpy@v0.0.0-20171122020858-3ec87959189c/third_party/stdlib/re.py

#
# Secret Labs' Regular Expression Engine
#
# re-compatible interface for the sre matching engine
#
# Copyright (c) 1998-2001 by Secret Labs AB.  All rights reserved.
#
# This version of the SRE library can be redistributed under CNRI's
# Python 1.6 license.  For any other use, please contact Secret Labs
# AB (info@pythonware.com).
#
# Portions of this engine have been developed in cooperation with
# CNRI.  Hewlett-Packard provided funding for 1.6 integration and
# other compatibility work.
#

r"""Support for regular expressions (RE).

This module provides regular expression matching operations similar to
those found in Perl.  It supports both 8-bit and Unicode strings; both
the pattern and the strings being processed can contain null bytes and
characters outside the US ASCII range.

Regular expressions can contain both special and ordinary characters.
Most ordinary characters, like "A", "a", or "0", are the simplest
regular expressions; they simply match themselves.  You can
concatenate ordinary characters, so last matches the string 'last'.

The special characters are:
    "."      Matches any character except a newline.
    "^"      Matches the start of the string.
    "$"      Matches the end of the string or just before the newline at
             the end of the string.
    "*"      Matches 0 or more (greedy) repetitions of the preceding RE.
             Greedy means that it will match as many repetitions as possible.
    "+"      Matches 1 or more (greedy) repetitions of the preceding RE.
    "?"      Matches 0 or 1 (greedy) of the preceding RE.
    *?,+?,?? Non-greedy versions of the previous three special characters.
    {m,n}    Matches from m to n repetitions of the preceding RE.
    {m,n}?   Non-greedy version of the above.
    "\\"     Either escapes special characters or signals a special sequence.
    []       Indicates a set of characters.
             A "^" as the first character indicates a complementing set.
    "|"      A|B, creates an RE that will match either A or B.
    (...)    Matches the RE inside the parentheses.
             The contents can be retrieved or matched later in the string.
    (?iLmsux) Set the I, L, M, S, U, or X flag for the RE (see below).
    (?:...)  Non-grouping version of regular parentheses.
    (?P<name>...) The substring matched by the group is accessible by name.
    (?P=name)     Matches the text matched earlier by the group named name.
    (?#...)  A comment; ignored.
    (?=...)  Matches if ... matches next, but doesn't consume the string.
    (?!...)  Matches if ... doesn't match next.
    (?<=...) Matches if preceded by ... (must be fixed length).
    (?<!...) Matches if not preceded by ... (must be fixed length).
    (?(id/name)yes|no) Matches yes pattern if the group with id/name matched,
                       the (optional) no pattern otherwise.

The special sequences consist of "\\" and a character from the list
below.  If the ordinary character is not on the list, then the
resulting RE will match the second character.
    \number  Matches the contents of the group of the same number.
    \A       Matches only at the start of the string.
    \Z       Matches only at the end of the string.
    \b       Matches the empty string, but only at the start or end of a word.
    \B       Matches the empty string, but not at the start or end of a word.
    \d       Matches any decimal digit; equivalent to the set [0-9].
    \D       Matches any non-digit character; equivalent to the set [^0-9].
    \s       Matches any whitespace character; equivalent to [ \t\n\r\f\v].
    \S       Matches any non-whitespace character; equiv. to [^ \t\n\r\f\v].
    \w       Matches any alphanumeric character; equivalent to [a-zA-Z0-9_].
             With LOCALE, it will match the set [0-9_] plus characters defined
             as letters for the current locale.
    \W       Matches the complement of \w.
    \\       Matches a literal backslash.

This module exports the following functions:
    match    Match a regular expression pattern to the beginning of a string.
    search   Search a string for the presence of a pattern.
    sub      Substitute occurrences of a pattern found in a string.
    subn     Same as sub, but also return the number of substitutions made.
    split    Split a string by the occurrences of a pattern.
    findall  Find all occurrences of a pattern in a string.
    finditer Return an iterator yielding a match object for each match.
    compile  Compile a pattern into a RegexObject.
    purge    Clear the regular expression cache.
    escape   Backslash all non-alphanumerics in a string.

Some of the functions in this module take flags as optional parameters:
    I  IGNORECASE  Perform case-insensitive matching.
    L  LOCALE      Make \w, \W, \b, \B, dependent on the current locale.
    M  MULTILINE   "^" matches the beginning of lines (after a newline)
                   as well as the string.
                   "$" matches the end of lines (before a newline) as well
                   as the end of the string.
    S  DOTALL      "." matches any character at all, including the newline.
    X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
    U  UNICODE     Make \w, \W, \b, \B, dependent on the Unicode locale.

This module also defines an exception 'error'.

"""

import sre_compile
import sre_parse

# try:
#     import _locale
# except ImportError:
#     _locale = None
_locale = None
BRANCH = "branch"
SUBPATTERN = "subpattern"

# public symbols
__all__ = [ "match", "search", "sub", "subn", "split", "findall",
    "compile", "purge", "template", "escape", "I", "L", "M", "S", "X",
    "U", "IGNORECASE", "LOCALE", "MULTILINE", "DOTALL", "VERBOSE",
    "UNICODE", "error" ]

__version__ = "2.2.1"

# flags
I = IGNORECASE = sre_compile.SRE_FLAG_IGNORECASE # ignore case
L = LOCALE = sre_compile.SRE_FLAG_LOCALE # assume current 8-bit locale
U = UNICODE = sre_compile.SRE_FLAG_UNICODE # assume unicode locale
M = MULTILINE = sre_compile.SRE_FLAG_MULTILINE # make anchors look for newline
S = DOTALL = sre_compile.SRE_FLAG_DOTALL # make dot match newline
X = VERBOSE = sre_compile.SRE_FLAG_VERBOSE # ignore whitespace and comments

# sre extensions (experimental, don't rely on these)
T = TEMPLATE = sre_compile.SRE_FLAG_TEMPLATE # disable backtracking
DEBUG = sre_compile.SRE_FLAG_DEBUG # dump pattern after compilation

# sre exception
error = sre_compile.error

# --------------------------------------------------------------------
# public interface

def match(pattern, string, flags=0):
    """Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).match(string)

def search(pattern, string, flags=0):
    """Scan through string looking for a match to the pattern, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).search(string)

def sub(pattern, repl, string, count=0, flags=0):
    """Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a string, backslash escapes in it are processed.  If it is
    a callable, it's passed the match object and must return
    a replacement string to be used."""
    return _compile(pattern, flags).sub(repl, string, count)

def subn(pattern, repl, string, count=0, flags=0):
    """Return a 2-tuple containing (new_string, number).
    new_string is the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in the source
    string by the replacement repl.  number is the number of
    substitutions that were made. repl can be either a string or a
    callable; if a string, backslash escapes in it are processed.
    If it is a callable, it's passed the match object and must
    return a replacement string to be used."""
    return _compile(pattern, flags).subn(repl, string, count)

def split(pattern, string, maxsplit=0, flags=0):
    """Split the source string by the occurrences of the pattern,
    returning a list containing the resulting substrings."""
    return _compile(pattern, flags).split(string, maxsplit)

def findall(pattern, string, flags=0):
    """Return a list of all non-overlapping matches in the string.

    If one or more groups are present in the pattern, return a
    list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result."""
    return _compile(pattern, flags).findall(string)

# if sys.hexversion >= 0x02020000:
#     __all__.append("finditer")
# finditer is defined at module level; left indented under findall it
# would be unreachable code after that function's return statement.
def finditer(pattern, string, flags=0):
    """Return an iterator over all non-overlapping matches in the
    string.  For each match, the iterator returns a match object.

    Empty matches are included in the result."""
    return _compile(pattern, flags).finditer(string)

def compile(pattern, flags=0):
    "Compile a regular expression pattern, returning a pattern object."
    return _compile(pattern, flags)

def purge():
    "Clear the regular expression cache"
    _cache.clear()
    _cache_repl.clear()

def template(pattern, flags=0):
    "Compile a template pattern, returning a pattern object"
    return _compile(pattern, flags|T)

_alphanum = frozenset(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")

def escape(pattern):
    "Escape all non-alphanumeric characters in pattern."
    s = list(pattern)
    alphanum = _alphanum
    for i, c in enumerate(pattern):
        if c not in alphanum:
            if c == "\000":
                s[i] = "\\000"
            else:
                s[i] = "\\" + c
    return pattern[:0].join(s)
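
# Usage sketch (not part of the original module).  A hedged illustration
# of the public helpers above: escape() backslashes every character
# outside _alphanum, and sub() accepts a callable that receives the match
# object.  The function name and the sample strings are assumptions made
# up for this sketch; "re" is imported by name so the snippet also runs
# on its own, assuming it resolves to a module with this interface.
def _public_api_demo():
    import re as _re
    # escape() makes "1+1" safe to embed as a literal inside a pattern
    literal = _re.escape("1+1")
    found = _re.search(literal, "so 1+1 is two") is not None
    # a callable repl is passed the match object and returns the new text
    doubled = _re.sub(r"\d+", lambda m: str(int(m.group()) * 2),
                      "3 apples, 10 pears")
    return literal, found, doubled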

# --------------------------------------------------------------------
# internals

_cache = {}
_cache_repl = {}

_pattern_type = type(sre_compile.compile("", 0))

_MAXCACHE = 100

def _compile(*key):
    # internal: compile pattern
    pattern, flags = key
    bypass_cache = flags & DEBUG
    if not bypass_cache:
        cachekey = (type(key[0]),) + key
        # NOTE: the cache lookup below is commented out, so entries stored
        # in _cache at the end of this function are never consulted here.
        # try:
        #     p, loc = _cache[cachekey]
        #     if loc is None or loc == _locale.setlocale(_locale.LC_CTYPE):
        #         return p
        # except KeyError:
        #     pass
    if isinstance(pattern, _pattern_type):
        if flags:
            raise ValueError('Cannot process flags argument with a compiled pattern')
        return pattern
    if not sre_compile.isstring(pattern):
        raise TypeError, "first argument must be string or compiled pattern"
    try:
        p = sre_compile.compile(pattern, flags)
    except error, v:
        raise error, v # invalid expression
    if not bypass_cache:
        if len(_cache) >= _MAXCACHE:
            _cache.clear()
        if p.flags & LOCALE:
            if not _locale:
                return p
            # loc = _locale.setlocale(_locale.LC_CTYPE)
        else:
            loc = None
        _cache[cachekey] = p, loc
    return p

def _compile_repl(*key):
    # internal: compile replacement pattern
    p = _cache_repl.get(key)
    if p is not None:
        return p
    repl, pattern = key
    try:
        p = sre_parse.parse_template(repl, pattern)
    except error, v:
        raise error, v # invalid expression
    if len(_cache_repl) >= _MAXCACHE:
        _cache_repl.clear()
    _cache_repl[key] = p
    return p

def _expand(pattern, match, template):
    # internal: match.expand implementation hook
    template = sre_parse.parse_template(template, pattern)
    return sre_parse.expand_template(template, match)

def _subx(pattern, template):
    # internal: pattern.sub/subn implementation helper
    template = _compile_repl(template, pattern)
    if not template[0] and len(template[1]) == 1:
        # literal replacement
        return template[1][0]
    def filter(match, template=template):
        return sre_parse.expand_template(template, match)
    return filter

# register myself for pickling

import copy_reg

def _pickle(p):
    return _compile, (p.pattern, p.flags)

copy_reg.pickle(_pattern_type, _pickle, _compile)

# --------------------------------------------------------------------
# experimental stuff (see python-dev discussions for details)

class Scanner(object):
    def __init__(self, lexicon, flags=0):
        self.lexicon = lexicon
        # combine phrases into a compound pattern
        p = []
        s = sre_parse.Pattern()
        s.flags = flags
        for phrase, action in lexicon:
            p.append(sre_parse.SubPattern(s, [
                (SUBPATTERN, (len(p)+1, sre_parse.parse(phrase, flags))),
                ]))
        s.groups = len(p)+1
        p = sre_parse.SubPattern(s, [(BRANCH, (None, p))])
        self.scanner = sre_compile.compile(p)
    def scan(self, string):
        result = []
        append = result.append
        match = self.scanner.scanner(string).match
        i = 0
        while 1:
            m = match()
            if not m:
                break
            j = m.end()
            if i == j:
                break
            action = self.lexicon[m.lastindex-1][1]
            if hasattr(action, '__call__'):
                self.match = m
                action = action(self, m.group())
            if action is not None:
                append(action)
            i = j
        return result, string[i:]
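
# --------------------------------------------------------------------
# usage sketch (not part of the original module)
#
# A hedged illustration of how Scanner appears intended to be used: each
# lexicon entry pairs a pattern with an action, and scan() returns the
# non-None action results together with the unmatched remainder of the
# input.  The function name, lexicon, and sample text are assumptions
# made up for this sketch; "re" is imported by name so the snippet also
# runs on its own, assuming it resolves to a module with this interface.

def _scanner_demo(text="abc 42 !x"):
    import re as _re
    scanner = _re.Scanner([
        # callable actions receive the scanner and the matched text
        (r"[a-z]+", lambda s, token: ("NAME", token)),
        (r"\d+", lambda s, token: ("INT", int(token))),
        # a None action matches the text but contributes no result
        (r"\s+", None),
    ])
    # scanning stops at the first position where no phrase matches
    return scanner.scan(text)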