github.com/jhump/protocompile@v0.0.0-20221021153901-4f6f732835e8/grammar/README.md (about)

     1  # NOTICE
     2  
     3  ### This repo is archived. For the latest content related to a comprehensive grammar and spec for the protobuf language, please see https://protobuf.com (the contents of which are also in [a git repo](https://github.com/bufbuild/protobuf.com)).
     4  
     5  ----
     6  
     7  # Introduction
This is a specification for the syntax of the Protocol Buffer IDL
(Interface Definition Language). Protocol Buffer is also known by the
shorthand "protobuf".
    11  
    12  The language is a platform-agnostic and implementation-language-agnostic
    13  way of describing data structures and RPC interfaces.
    14  
    15  This document is not an official artifact from Google or the Protobuf
    16  team. It has been developed over the course of implementing a [pure-Go
    17  compiler for Protobuf](https://pkg.go.dev/github.com/jhump/protoreflect@v1.10.1/desc/protoparse).
    18  There are official grammars that are available on the Protobuf developer
    19  website ([proto2](https://developers.google.com/protocol-buffers/docs/reference/proto2-spec)
    20  and [proto3](https://developers.google.com/protocol-buffers/docs/reference/proto3-spec)).
However, they are neither thorough nor entirely accurate, and they are
insufficient for those interested in developing their own tools that can
parse the language (such as alternate compilers, formatters, linters, or
other static analyzers). This specification attempts to fill that role.
    25  
    26  This spec presents a unified grammar, capable of parsing both proto2
    27  and proto3 syntax files. The differences between the two do not
    28  impact the grammar and can be enforced as a post-process over the
    29  resulting parsed syntax tree.
    30  
    31  # Notation
    32  The syntax is specified using Extended Backus-Naur Form (EBNF):
    33  ```
    34  Production  = production_name "=" Expression "." .
    35  Expression  = Alternative { "|" Alternative } .
    36  Alternative = Term { Term } .
    37  Term        = production_name | literal [ "…" literal ] | Exclusion | Group | Option | Repetition .
Exclusion   = "!" literal | "!" "(" literal { "|" literal } ")" .
    39  Group       = "(" Expression ")" .
    40  Option      = "[" Expression "]" .
    41  Repetition  = "{" Expression "}" .
    42  ```
    43  
    44  Productions are expressions constructed from terms and the following operators, in increasing precedence:
    45  
    46  * **|**:  Alternation
    47  * **!**:  Exclusion
    48  * **()**: Grouping
    49  * **[]**: Option (0 or 1 times)
    50  * **{}**: Repetition (0 to n times)
    51  
    52  Lower-case production names are used to identify lexical tokens. Non-terminals are in CamelCase.
    53  Literal source characters are enclosed in double quotes `""` or back quotes ``` `` ```.
    54  In double-quotes, the contents can encode otherwise non-printable characters. The
    55  backslash character (`\`) is used to mark these encoded sequences:
    56  
    57  * `"\n"`: The newline character (code point 10).
    58  * `"\r"`: The carriage return character (code point 13).
    59  * `"\t"`: The horizontal tab character (code point 9).
    60  * `"\v"`: The vertical tab character (code point 11).
    61  * `"\f"`: The form feed character (code point 12).
* `"\xHH"`: Where each H is a hexadecimal digit (0 to 9, A to F). The hexadecimal-encoded
            8-bit value indicates a code point between 0 and 255.
    64  * `"\\"`: A literal backslash character.
    65  
    66  These escaped characters represent _bytes_, not Unicode code points (thus the
    67  8-bit limit). To represent literal Unicode code points above 127, a sequence of
    68  bytes representing the UTF-8 encoding of the code point will be used.
    69  
    70  A string of multiple characters indicates all characters in a sequence. In other
    71  words, the following two productions are equivalent:
    72  ```
foo = "bar" .
foo = "b" "a" "r" .
    75  ```
    76  
    77  The exclusion operator is only for use against literal characters and means that
    78  all characters _except for_ the given ones are accepted. For example `!"a"` means
    79  that any character except lower-case `a` is accepted; `!("a"|"b"|"c")` means that
    80  any character except lower-case `a`, `b`, or `c` is accepted.
    81  
    82  The form `a … b` represents the set of characters from a through b as alternatives.
    83  
    84  # Source Code Representation
    85  Source code is Unicode text encoded in UTF-8. In general, only comments and string literals
    86  can contain code points outside of the range of 7-bit ASCII.
    87  
    88  For compatibility with other tools, a source file may contain a UTF-8-encoded byte order mark
    89  (U+FEFF, encoded as `"\xEF\xBB\xBF"`), but only if it is the first Unicode code point in the
    90  source text.
    91  
    92  # Lexical Analysis
    93  
A protobuf source file first undergoes lexical analysis. This is the process of
converting the source file, which is a sequence of UTF-8 characters, into a sequence of
_tokens_. (This process is also known as "tokenization".)
    97  
    98  Having a tokenization phase allows us to more simply describe the way inputs are transformed
    99  into grammatical elements and how things like whitespace and comments are handled without
   100  cluttering the main grammar.
   101  
   102  Tokenization is "greedy", meaning a token matches the longest possible sequence in the input.
   103  That way input like `"0.0.0"`, `"1to3"`, and `"packageio"` can never be interpreted as token
   104  sequences [`"0.0"`, `".0"`]; [`"1"`, `"to"`, `"3"`]; or [`"package"`, `"io"`]
   105  respectively; they will always be interpreted as single tokens.
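For example, greedy matching is what distinguishes the following two inputs (the names here are purely illustrative):
```
package io;    // two tokens: the keyword "package" and the identifier "io"
packageio;     // a single identifier token: "packageio"
```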
   106  
   107  If a sequence of input is encountered that does not match any of the rules for acceptable
   108  tokens, then the source is invalid and has a syntax error.
   109  
   110  ## Discarded Input
   111  
Whitespace is often necessary to separate adjacent tokens in the language. Aside from
that purpose during tokenization, it is ignored, and extra whitespace is allowed anywhere
between tokens. Block comments can likewise serve to separate tokens; they too are allowed
anywhere between tokens and are ignored by the grammar.
   116  
   117  Protobuf source allows for two styles of comments:
   118   1. Line comments: These begin with `//` and continue to the end of the line.
   119   2. Block comments: These begin with `/*` and continue until the first `*/`
   120      sequence is encountered. A single block comment can span multiple lines.
   121  
The productions below are used to identify whitespace and comments; matched text is
discarded rather than emitted as tokens.
   124  
   125  If a parser implementation intends to produce descriptor protos that include source code info
   126  (which has details about the location of lexical elements in the file as well as comments)
   127  then the tokenizer should accumulate comments as it scans for tokens so they can be made
   128  available to that later step.
   129  ```
   130  whitespace = " " | "\n" | "\r" | "\t" | "\f" | "\v" .
   131  comment = line_comment | block_comment .
   132  
   133  line_comment = "/" "/" { !("\n" | "\x00") } .
   134  block_comment = "/" "*" comment_tail .
   135  comment_tail = "*" comment_tail_star | !("*" | "\x00") comment_tail .
   136  comment_tail_star = "/" | "*" comment_tail_star | !("*" | "/" | "\x00") comment_tail .
   137  ```
   138  
   139  If the `/*` sequence is found to start a block comment, but the above rule is not
   140  matched, it indicates a malformed block comment: EOF was reached before the
   141  concluding `*/` sequence was found. Such a malformed comment is a syntax
   142  error.
   143  
   144  If a comment text contains a null character (code point zero) then it is malformed
   145  and a syntax error should be reported.
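As an illustration, every comment in the following fragment is discarded during tokenization (the message name is hypothetical):
```
// A line comment: runs to the end of the line.
/* A block comment
   can span multiple lines. */
message /* and can even appear between tokens */ Foo {
}
```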
   146  
   147  ## Character Classes
   148  
   149  The following categories for input characters are used through the lexical analysis
   150  productions in the following sections:
   151  ```
   152  letter        = "A" … "Z" | "a" … "z" | "_" .
   153  decimal_digit = "0" … "9" .
   154  octal_digit   = "0" … "7" .
   155  hex_digit     = "0" … "9" | "A" … "F" | "a" … "f" .
   156  
byte_order_mark = "\xEF\xBB\xBF" .
   158  ```
   159  
   160  The `byte_order_mark` byte sequence is the UTF-8 encoding of the byte-order mark
   161  character (U+FEFF).
   162  
   163  ## Tokens
   164  
   165  The result of lexical analysis is a stream of tokens of the following kinds:
   166   * `identifier`
   167   * 39 token types corresponding to keywords
   168   * `int_literal`
   169   * `float_literal`
   170   * `string_literal`
 * 16 token types corresponding to symbols, punctuation, and operators
   172  
   173  ### Identifiers
   174  
An identifier is used for named elements in the protobuf language, like names
of messages, fields, and services. There are 39 keywords in the protobuf grammar
that may also be used as identifiers.
   178  ```
   179  identifier = letter { letter | decimal_digit } .
   180  ```
   181  
   182  When an `identifier` is found, if it matches a keyword, its token type is changed
   183  to match the keyword, per the rules below. All of the keyword token types below
   184  are *also* considered identifiers by the grammar. For example, a production in the
   185  grammar that references `identifier` will also accept `syntax` or `map`.
   186  ```
   187  syntax   = "syntax" .      float    = "float" .       oneof      = "oneof" .
   188  import   = "import" .      double   = "double" .      map        = "map" .
   189  weak     = "weak" .        int32    = "int32" .       extensions = "extensions" .
   190  public   = "public" .      int64    = "int64" .       to         = "to" .
   191  package  = "package" .     uint32   = "uint32" .      max        = "max" .
   192  option   = "option" .      uint64   = "uint64" .      reserved   = "reserved" .
   193  inf      = "inf" .         sint32   = "sint32" .      enum       = "enum" .
   194  repeated = "repeated" .    sint64   = "sint64" .      message    = "message" .
   195  optional = "optional" .    fixed32  = "fixed32" .     extend     = "extend" .
   196  required = "required" .    fixed64  = "fixed64" .     service    = "service" .
   197  bool     = "bool" .        sfixed32 = "sfixed32" .    rpc        = "rpc" .
   198  string   = "string" .      sfixed64 = "sfixed64" .    stream     = "stream" .
bytes    = "bytes" .       group    = "group" .       returns    = "returns" .
   200  ```
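For example, the following is expected to be accepted under this grammar, since keyword tokens are also valid identifiers (the element names are contrived for illustration):
```
message option {                // a message named "option"
  optional string message = 1;  // a field named "message"
}
```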
   201  
   202  ### Numeric Literals
   203  
   204  Handling of numeric literals is a bit special in order to avoid a situation where
   205  `"0.0.0"` or `"100to3"` is tokenized as [`"0.0"`, `".0"`] or [`"100"`, `"to"`, `"3"`]
   206  respectively. Instead of these input sequences representing a possible sequence of 2
   207  or more tokens, they are considered invalid numeric literals.
   208  
   209  So input is first scanned for the `numeric_literal` token type:
   210  ```
numeric_literal = ( "." | decimal_digit ) { digit_or_point }
                  { ( "e" | "E" ) [ "+" | "-" ] digit_or_point { digit_or_point } } .

digit_or_point = "." | decimal_digit | letter .
   215  ```
   216   
   217  When a `numeric_literal` token is found, it is then checked to see if it matches the `int_literal`
   218  or `float_literal` rules (see below). If it does then the scanned token is included in the
   219  result token stream with `int_literal` or `float_literal` as its token type. But if it does *not*
   220  match, it is a malformed numeric literal which is considered a syntax error.
   221  
   222  Below is the rule for `int_literal`:
   223  ```
   224  int_literal = decimal_literal | octal_literal | hex_literal .
   225  
   226  decimal_literal = "0" | ( "1" … "9" ) [ decimal_digits ] .
   227  octal_literal   = "0" octal_digits .
   228  hex_literal     = "0" ( "x" | "X" ) hex_digits .
   229  decimal_digits  = decimal_digit { decimal_digit } .
   230  octal_digits    = octal_digit { octal_digit } .
   231  hex_digits      = hex_digit { hex_digit } .
   232  ```
   233  
   234  Below is the rule for `float_literal`:
   235  ```
   236  float_literal = decimal_digits "." [ decimal_digits ] [ decimal_exponent ] |
   237                  decimal_digits decimal_exponent |
   238                  "." decimal_digits [ decimal_exponent ] .
   239  
   240  decimal_exponent  = ( "e" | "E" ) [ "+" | "-" ] decimal_digits .
   241  ```
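To illustrate the two-phase check, here is how some example inputs fare (the annotations are informal):
```
100      // matches int_literal (decimal)
0755     // matches int_literal (octal)
0xFF     // matches int_literal (hexadecimal)
1.5e-3   // matches float_literal
0.0.0    // scanned as one numeric_literal; matches neither rule: syntax error
1to3     // scanned as one numeric_literal; matches neither rule: syntax error
```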
   242  
   243  ### String Literals
   244  
   245  String values include C-style support for escape sequences. String literals are used
   246  for constant values of `bytes` fields, so they must be able to represent arbitrary
   247  binary data, in addition to normal/valid UTF-8 strings.
   248  
   249  Note that protobuf explicitly disallows a null character (code point 0) to appear in
   250  the string, but an _encoded null_ (e.g. `"\x00"`) can appear.
   251  ```
   252  string_literal = single_quoted_string_literal | double_quoted_string_literal .
   253  
   254  single_quoted_string_literal = "'" { !("\n" | "\x00" | "'" | `\`) | rune_escape_seq } "'" .
   255  double_quoted_string_literal = `"` { !("\n" | "\x00" | `"` | `\`) | rune_escape_seq } `"` .
   256  
   257  rune_escape_seq    = simple_escape_seq | hex_escape_seq | octal_escape_seq | unicode_escape_seq .
   258  simple_escape_seq  = `\` ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | `\` | "'" | `"` | "?" ) .
   259  hex_escape_seq     = `\` "x" hex_digit hex_digit .
   260  octal_escape_seq   = `\` octal_digit [ octal_digit [ octal_digit ] ] .
   261  unicode_escape_seq = `\` "u" hex_digit hex_digit hex_digit hex_digit |
   262                       `\` "U" hex_digit hex_digit hex_digit hex_digit
   263                               hex_digit hex_digit hex_digit hex_digit .
   264  ```
   265  
   266  If a string literal contains a newline or null character then it is malformed and a
   267  syntax error should be reported.
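For example, each of the following string literals is valid under the rules above; the annotated byte values follow the escape semantics described earlier:
```
"hello\n"    // five letters followed by an encoded newline (byte 10)
'\x41\102'   // a hex escape and an octal escape: the bytes for "AB"
"\u2764"     // a Unicode escape: the UTF-8 encoding of U+2764
"it\'s"      // quotes may be escaped even when not strictly necessary
```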
   268  
   269  
   270  ### Punctuation and Operators
   271  
   272  The symbols below represent all other valid input characters used by the protobuf grammar.
   273  ```
   274  semicolon = ";" .     colon     = ":" .     l_paren   = "(" .     l_bracket = "[" .
   275  comma     = "," .     equals    = "=" .     r_paren   = ")" .     r_bracket = "]" .
   276  dot       = "." .     minus     = "-" .     l_brace   = "{" .     l_angle   = "<" .
   277  slash     = "/" .     plus      = "+" .     r_brace   = "}" .     r_angle   = ">" .
   278  ```
   279  
   280  # Grammar
   281  
   282  The productions below define the grammar rules for the protobuf IDL.
   283  
   284  ## Files
   285  
   286  The `File` production represents the contents of a valid protobuf source file.
   287  ```
   288  File = [ byte_order_mark ] [ SyntaxDecl ] { FileElement } .
   289  
   290  FileElement = ImportDecl |
   291                PackageDecl |
   292                OptionDecl |
   293                MessageDecl |
   294                EnumDecl |
   295                ExtensionDecl |
   296                ServiceDecl |
   297                EmptyDecl .
   298  EmptyDecl   = semicolon .
   299  ```
   300  
   301  ### Syntax
   302  
   303  Files should define a syntax. The string literal must have a value of
   304  "proto2" or "proto3". Other syntax values are not allowed. If a file
   305  contains no syntax statement then proto2 is the assumed syntax.
   306  
   307  String literals support C-style concatenation. So the sequence
   308  `"prot" "o2"` is equivalent to `"proto2"`.
   309  ```
   310  SyntaxDecl = syntax equals StringLiteral semicolon .
   311  
   312  StringLiteral = string_literal { string_literal } .
   313  ```
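For example, either of the following forms declares the proto2 syntax (they would not both appear in one file):
```
syntax = "proto2";
syntax = "prot" "o2";   // equivalent, via string concatenation
```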
   314  
   315  ### Package Declaration
   316  
A file can only include a single package declaration, though it
can appear anywhere in the file (except before the syntax declaration).
   319  
   320  Packages use dot-separated namespace components. A compound name
   321  like `foo.bar.baz` represents a nesting of namespaces, with `foo`
   322  being the outermost namespace, then `bar`, and finally `baz`.
   323  So all of the elements in two files, with packages `foo.bar.baz`
and `foo.bar.buzz` for example, are in the `foo` and `foo.bar`
   325  namespaces.
   326  ```
   327  PackageDecl = package QualifiedIdentifier semicolon .
   328  
   329  QualifiedIdentifier = identifier { dot identifier } .
   330  ```
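For example, the following declaration places the file's elements in the `foo.bar.baz` namespace (the name is illustrative):
```
package foo.bar.baz;
```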
   331  
   332  ### Imports
   333  
   334  In order to refer to messages and enum types defined in another
   335  file, that file must be imported.
   336  
   337  A "public" import means that everything in that file is treated
   338  as if it were defined in the importing file, for purpose of
   339  transitive importers. For example, if file "a.proto" imports
   340  "b.proto" and "b.proto" has "c.proto" as a _public_ import, then
   341  the elements in "a.proto" may refer to elements defined in
   342  "c.proto", even though "a.proto" does not directly import "c.proto".
   343  ```
   344  ImportDecl = import [ weak | public ] StringLiteral semicolon .
   345  ```
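For example (the file names are hypothetical, except the well-known `descriptor.proto`):
```
import "google/protobuf/descriptor.proto";
import public "b.proto";   // b.proto's elements are re-exported to importers of this file
import weak "c.proto";
```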
   346  
   347  ## Options
   348  
   349  Many elements defined in a protobuf source file allow options
   350  which provide a way to customize behavior and also provide the
   351  ability to use custom annotations on elements (which can then be
   352  used by protoc plugins or runtime libraries).
   353  ```
   354  OptionDecl = option OptionName equals OptionValue semicolon .
   355  
   356  OptionName = ( identifier | l_paren TypeName r_paren ) [ dot OptionName ] .
   357  TypeName   = [ dot ] QualifiedIdentifier .
   358  ```
   359  
   360  Option values are literals. In addition to primitive literal values
   361  (like integers, floating point numbers, strings, and booleans), option
   362  values can also be aggregate values (message literals). This aggregate
   363  must be enclosed in braces (`{` and `}`).
   364  
   365  The syntax for the value _inside_ the braces, however, is the protobuf
   366  text format. This means nested message values therein may be enclosed in
   367  braces or may instead be enclosed in angle brackets (`<` and `>`). In
   368  message values, a single field is defined by a field name and value,
   369  separated by a colon. However, the colon is optional if the value is a
   370  composite (e.g. will be surrounded by braces or brackets).
   371  
   372  List literals may not be used directly as option values (even for
   373  repeated fields) but are allowed inside a message value, for a
   374  repeated field.
   375  ```
   376  OptionValue = ScalarValue | MessageLiteralWithBraces .
   377  
   378  ScalarValue        = StringLiteral | UnsignedNumLiteral | SignedNumLiteral | identifier .
   379  UnsignedNumLiteral = float_literal | int_literal .
   380  SignedNumLiteral   = ( minus | plus ) ( float_literal | int_literal | inf ) .
   381  
   382  MessageLiteralWithBraces = l_brace { MessageLiteralField } r_brace .
   383  MessageLiteralField      = MessageLiteralFieldName colon Value |
   384                             MessageLiteralFieldName CompositeValue .
   385  MessageLiteralFieldName  = identifier |
   386                             l_bracket [ QualifiedIdentifier slash ] QualifiedIdentifier r_bracket .
   387  Value                    = ScalarValue | CompositeValue .
   388  CompositeValue           = MessageLiteral | ListLiteral .
   389  MessageLiteral           = MessageLiteralWithBraces |
   390                             l_angle { MessageLiteralField } r_angle .
   391  
   392  ListLiteral = l_bracket [ ListElement { comma ListElement } ] r_bracket .
   393  ListElement = ScalarValue | MessageLiteral .
   394  ```
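The following sketch shows both scalar and aggregate option values; `java_package` is a standard file option, while `(my.custom.opt)` is a hypothetical custom option used only for illustration:
```
option java_package = "com.example.foo";

option (my.custom.opt) = {
  name: "abc"        // scalar field value: colon required
  nested { id: 1 }   // message value: colon optional; braces or angle brackets
  ids: [1, 2, 3]     // list literal, allowed inside a message value
};
```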
   395  
   396  ## Messages
   397  
The core of the protobuf IDL is defining messages, which are heterogeneous
composite data types.
   400  
   401  Files whose syntax declaration indicates "proto3" are not allowed to
   402  include `GroupDecl` or `ExtensionRangeDecl` elements.
   403  ```
   404  MessageDecl = message identifier l_brace { MessageElement } r_brace .
   405  
   406  MessageElement = FieldDecl |
   407                   MapFieldDecl |
   408                   GroupDecl |
   409                   OneofDecl |
   410                   OptionDecl |
   411                   ExtensionRangeDecl |
   412                   MessageReservedDecl |
   413                   EnumDecl |
   414                   MessageDecl |
   415                   ExtensionDecl |
   416                   EmptyDecl .
   417  ```
   418  
   419  ### Fields
   420  
   421  Field declarations are found inside messages. They can also be found inside
   422  `extends` blocks, for defining extension fields. Each field indicates its
   423  cardinality (`required`, `optional`, or `repeated`; also called the field's label),
   424  its type, its name, its tag number, and (optionally) options.
   425  
   426  Field declarations in the proto2 syntax *require* a label token.
   427  
   428  Declarations in proto3 are not allowed to use `required` labels and may omit
   429  the `optional` label. When the label is omitted, the subsequent type name
   430  may *not* start with an identifier whose text could be ambiguous with other
   431  kinds of elements in this scope. So such field declarations in a message declaration
   432  may not have a type name that starts with any of the following identifiers:
   433     * "message"
   434     * "enum"
   435     * "oneof"
   436     * "extensions"
   437     * "reserved"
   438     * "extend"
   439     * "option"
   440     * "optional"
   441     * "required"
   442     * "repeated"
   443  
   444  Similarly, a field declaration in an `extends` block may not have a type name
   445  that starts with any of the following identifiers:
   446     * "optional"
   447     * "required"
   448     * "repeated"
   449  
   450  Note that it is acceptable if the above words are _prefixes_ of the first token in
   451  the type name. For example, inside a message a type name "enumeration" is allowed, even
   452  though it starts with "enum". But a name of "enum.Statuses" would not be allowed, because
the first constituent token is "enum". A _fully qualified_ type name (one that starts with
   454  a dot) is always accepted, regardless of the first identifier token, since the dot prevents
   455  ambiguity.
   456  ```
   457  FieldDecl = [ required | optional | repeated ] TypeName identifier equals int_literal
   458              [ CompactOptions ] semicolon .
   459  
   460  CompactOptions = l_bracket CompactOption { comma CompactOption } r_bracket .
   461  CompactOption  = OptionName equals OptionValue .
   462  ```
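For example, in a proto2 file (the names and numbers are illustrative):
```
message Example {
  required int32 id = 1;
  optional string name = 2 [default = "unknown"];  // compact options
  repeated .foo.bar.Item items = 3;                // fully qualified type name
}
```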
   463  
   464  Map fields never have a label as their cardinality is implicitly repeated (since
   465  a map can have more than one entry).
   466  ```
   467  MapFieldDecl = MapType identifier equals int_literal [ CompactOptions ] semicolon .
   468  
   469  MapType    = map l_angle MapKeyType comma TypeName r_angle .
   470  MapKeyType = int32   | int64   | uint32   | uint64   | sint32 | sint64 |
   471               fixed32 | fixed64 | sfixed32 | sfixed64 | bool   | string .
   472  ```
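For example, a map field declaration has no leading label (the field name is illustrative):
```
map<string, int32> word_counts = 1;
```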
   473  
   474  Groups are a mechanism in proto2 to create a field that is a nested message.
   475  The message definition is inlined into the group field declaration.
   476  
   477  The group's name must start with a capital letter. In some contexts, the group field
   478  goes by the lower-cased form of this name.
   479  ```
   480  GroupDecl = [ required | optional | repeated ] group identifier equals int_literal
   481              [ CompactOptions ] l_brace { MessageElement } r_brace .
   482  ```
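For example, the following proto2 group declares both a nested message type `Result` and a field that, in some contexts, goes by the lower-cased name `result`:
```
optional group Result = 1 {
  optional string url = 2;
  optional string title = 3;
}
```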
   483  
   484  ### Oneofs
   485  
   486  A "oneof" is a set of fields that act like a discriminated union.
   487  
   488  Files whose syntax declaration indicates "proto3" are not allowed to
   489  include `OneofGroupDecl` elements.
   490  ```
   491  OneofDecl = oneof identifier l_brace { OneofElement } r_brace .
   492  
   493  OneofElement = OptionDecl |
   494                 OneofFieldDecl |
   495                 OneofGroupDecl |
   496                 EmptyDecl .
   497  ```
   498  
   499  Fields in a oneof always omit the label (`required`, `optional`, or `repeated`) and
   500  are always optional. They follow the same restrictions as other field declarations
   501  that have no leading label: the first token of the `TypeName` may not be an
   502  `identifier` whose text could be ambiguous with other elements. They also may not
   503  match any of the label keywords. To that end, fields in a oneof may not have a type
   504  name that starts with any of the following:
   505    * "option"
   506    * "optional"
   507    * "required"
   508    * "repeated"
   509  ```
   510  OneofFieldDecl = TypeName identifier equals int_literal
   511                   [ CompactOptions ] semicolon .
   512  ```
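For example (the field names are illustrative):
```
oneof contact {
  string email = 1;
  string phone = 2;
}
```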
   513  
   514  A group's name must start with a capital letter. In some contexts, the group field
   515  goes by the lower-cased form of this name.
   516  ```
   517  OneofGroupDecl = group identifier equals int_literal
   518                   [ CompactOptions ] l_brace { MessageElement } r_brace .
   519  ```
   520  
   521  ### Extension Ranges
   522  
   523  Extendable messages (proto2 syntax only) may define ranges of tags. Extension fields
   524  must use a tag in one of these ranges.
   525  ```
   526  ExtensionRangeDecl = extensions TagRange { comma TagRange } [ CompactOptions ] semicolon .
   527  
   528  TagRange = int_literal [ to ( int_literal | max ) ] .
   529  ```
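For example, a message may declare multiple ranges in one statement:
```
extensions 100 to 199, 1000 to max;
```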
   530  
   531  ### Reserved Names and Numbers
   532  
   533  Messages can reserve field names and numbers to prevent them from being used.
   534  This is typically to prevent old tag numbers and names from being recycled.
   535  ```
   536  MessageReservedDecl = reserved ( TagRange { comma TagRange } | Names ) semicolon .
   537  
Names = StringLiteral { comma StringLiteral } .
   539  ```
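For example (the reserved names and numbers are illustrative):
```
reserved 2, 15, 9 to 11;
reserved "foo", "bar";
```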
   540  
   541  ## Enums
   542  
   543  Enums represent an enumerated type, where values must be one of the defined
   544  enum values.
   545  ```
   546  EnumDecl = enum identifier l_brace { EnumElement } r_brace .
   547  
   548  EnumElement = OptionDecl |
   549                EnumValueDecl |
   550                EnumReservedDecl |
   551                EmptyDecl .
   552  ```
   553  
   554  Value names (the first `identifier` token) may not match any of these keywords:
   555    * "reserved"
   556    * "option"
   557  ```
   558  EnumValueDecl = identifier equals SignedIntLiteral [ CompactOptions ] semicolon .
   559  
   560  SignedIntLiteral = [ minus ] int_literal .
   561  ```
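For example (the enum and its values are illustrative; requirements such as the first proto3 value being zero are enforced outside this grammar):
```
enum Status {
  STATUS_UNKNOWN = 0;
  STATUS_OK = 1 [deprecated = true];   // compact options
  STATUS_NEGATIVE = -1;
}
```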
   562  
   563  Like messages, enums can also reserve names and numbers, typically to prevent
   564  recycling names and numbers from old enum values.
   565  ```
   566  EnumReservedDecl = reserved ( EnumValueRange { comma EnumValueRange } | Names ) semicolon .
   567  
   568  EnumValueRange = SignedIntLiteral [ to ( SignedIntLiteral | max ) ] .
   569  ```
   570  
   571  ## Extensions
   572  
   573  Extensions are allowed in both proto2 and proto3, even though an _extendable
   574  message_ can only be defined in a file with proto2 syntax.
   575  
   576  However, a file with proto3 syntax is not allowed to use the `GroupDecl` rule
   577  as groups are not supported in proto3.
   578  ```
   579  ExtensionDecl = extend TypeName l_brace { ExtensionElement } r_brace .
   580  
   581  ExtensionElement = FieldDecl |
   582                     GroupDecl |
   583                     EmptyDecl .
   584  ```
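For example, extending the well-known `google.protobuf.FieldOptions` message (the extension field name and number are hypothetical; the number must fall within an extension range declared by the extended message):
```
extend google.protobuf.FieldOptions {
  optional string my_annotation = 50000;
}
```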
   585  
   586  ## Services
   587  
   588  Services are used to define RPC interfaces. Each service is a collection
   589  of RPC methods.
   590  ```
   591  ServiceDecl = service identifier l_brace { ServiceElement } r_brace .
   592  
   593  ServiceElement = OptionDecl |
   594                   RpcDecl |
   595                   EmptyDecl .
   596  ```
   597  
   598  Each RPC defines a single method/operation and its request and response types.
   599  ```
   600  RpcDecl = rpc identifier RpcType returns RpcType semicolon |
   601            rpc identifier RpcType returns RpcType l_brace { RpcElement } r_brace .
   602  
   603  RpcType    = l_paren [ stream ] TypeName r_paren .
   604  RpcElement = OptionDecl |
   605               EmptyDecl .
   606  ```
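For example (the service, method, and type names are illustrative; `idempotency_level` is a standard method option):
```
service Search {
  rpc Query (QueryRequest) returns (QueryResponse);
  rpc Watch (WatchRequest) returns (stream Event) {
    option idempotency_level = NO_SIDE_EFFECTS;
  }
}
```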