---
title: Reviewdog Diagnostic Format
date: 2020-06-15
author: haya14busa
status: Proposed / Experimental
---

# Status

This document proposes the Reviewdog Diagnostic Format, which is still at an
experimental stage.

Reviews, suggestions, feedback, criticism, and comments from anyone are very
much welcome. Please leave comments in the Pull Request ([#629](https://github.com/mistwind/reviewdog/pull/629)),
in issue [#628](https://github.com/mistwind/reviewdog/issues/628), or
file [an issue](https://github.com/mistwind/reviewdog/issues).

The document and the actual definition are currently under the
https://github.com/mistwind/reviewdog repository, but we may create a separate
repository once it's reviewed and stabilized.

# Reviewdog Diagnostic Format (RDFormat)

Reviewdog Diagnostic Format defines standard machine-readable message
structures which represent the results of a diagnostic tool such as a compiler
or a linter.

The idea behind the Reviewdog Diagnostic Format is to standardize
the protocol for how diagnostic tools (e.g. compilers, linters, etc.) and
development tools (e.g. editors, reviewdog, code review APIs, etc.) communicate.

See [reviewdog.proto](reviewdog.proto) for the actual definition.
[JSON Schema](./jsonschema) is available as well.

## Wire formats of Reviewdog Diagnostic Format

RDFormat uses [Protocol Buffers](https://developers.google.com/protocol-buffers) to
define the message structure, but the recommended wire format is JSON because
it's widely used and easy to support from both diagnostic tools and development
tools.

### **rdjsonl**
[JSON Lines](http://jsonlines.org/) of the [`Diagnostic`](reviewdog.proto) message ([JSON Schema](./jsonschema/Diagnostic.jsonschema)).

Example:
```json
{"message": "<msg>", "location": {"path": "<file path>", "range": {"start": {"line": 14, "column": 15}}}, "severity": "ERROR"}
{"message": "<msg>", "location": {"path": "<file path>", "range": {"start": {"line": 14, "column": 15}, "end": {"line": 14, "column": 18}}}, "suggestions": [{"range": {"start": {"line": 14, "column": 15}, "end": {"line": 14, "column": 18}}, "text": "<replacement text>"}], "severity": "WARNING"}
...
```

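Because every line is an independent JSON object, a consumer can handle diagnostics one at a time as they arrive. The following is a minimal Python sketch of that idea, not reviewdog's implementation: `iter_rdjsonl` is a hypothetical helper name, skipping blank lines is an assumption, and the sample paths and messages are made up for illustration.

```python
import io
import json
from typing import Iterator, TextIO


def iter_rdjsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one Diagnostic (as a dict) per non-empty line of an rdjsonl stream."""
    for line in stream:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        yield json.loads(line)


# Illustrative input; the file names and messages are invented.
sample = io.StringIO(
    '{"message": "unused variable", "location": {"path": "main.go", '
    '"range": {"start": {"line": 14, "column": 15}}}, "severity": "ERROR"}\n'
    '{"message": "typo", "location": {"path": "doc.md", '
    '"range": {"start": {"line": 3, "column": 1}}}, "severity": "WARNING"}\n'
)
diagnostics = list(iter_rdjsonl(sample))
for d in diagnostics:
    print(f'{d["severity"]}: {d["message"]}')
```
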
### **rdjson**
JSON format of the [`DiagnosticResult`](reviewdog.proto) message ([JSON Schema](./jsonschema/DiagnosticResult.jsonschema)).

Example:
```json
{
  "source": {
    "name": "super lint",
    "url": "https://example.com/url/to/super-lint"
  },
  "severity": "WARNING",
  "diagnostics": [
    {
      "message": "<msg>",
      "location": {
        "path": "<file path>",
        "range": {
          "start": {
            "line": 14,
            "column": 15
          }
        }
      },
      "severity": "ERROR",
      "code": {
        "value": "RULE1",
        "url": "https://example.com/url/to/super-lint/RULE1"
      }
    },
    {
      "message": "<msg>",
      "location": {
        "path": "<file path>",
        "range": {
          "start": {
            "line": 14,
            "column": 15
          },
          "end": {
            "line": 14,
            "column": 18
          }
        }
      },
      "suggestions": [
        {
          "range": {
            "start": {
              "line": 14,
              "column": 15
            },
            "end": {
              "line": 14,
              "column": 18
            }
          },
          "text": "<replacement text>"
        }
      ],
      "severity": "WARNING"
    }
  ]
}
```

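The two wire formats carry the same `Diagnostic` messages, so one can be derived from the other. The sketch below flattens an `rdjson` document into `rdjsonl` lines. It rests on two assumptions that should be checked against reviewdog.proto: that the result-level `severity` and `source` act as defaults for diagnostics that do not set their own, and that `Diagnostic` accepts a `source` field. `rdjson_to_rdjsonl` is a hypothetical name.

```python
import json


def rdjson_to_rdjsonl(rdjson_text: str) -> str:
    """Flatten a DiagnosticResult into one JSON line per Diagnostic."""
    result = json.loads(rdjson_text)
    lines = []
    for diagnostic in result.get("diagnostics", []):
        flat = dict(diagnostic)
        # Assumption: result-level "source" and "severity" are defaults for
        # diagnostics that do not carry their own values.
        for key in ("source", "severity"):
            if key in result and key not in flat:
                flat[key] = result[key]
        lines.append(json.dumps(flat, sort_keys=True))
    return "\n".join(lines)


# Illustrative input; messages and paths are invented.
rdjson_text = json.dumps({
    "source": {"name": "super lint"},
    "severity": "WARNING",
    "diagnostics": [
        {"message": "first", "location": {"path": "a.txt"}, "severity": "ERROR"},
        {"message": "second", "location": {"path": "b.txt"}},
    ],
})
rdjsonl_text = rdjson_to_rdjsonl(rdjson_text)
print(rdjsonl_text)
```
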
## Background: Still No Good Standard Diagnostic Format Out There in 2020

Update: Found *The Static Analysis Results Interchange Format (SARIF)* as a
potentially good standard format.

As of writing (2020), most diagnostic tools such as linters or compilers output
results in their own format. Some tools support a machine-readable structured
format such as their own JSON format, and other tools support only an
unstructured format (e.g. `/path/to/file:<line>:<column>: <message>`).

The lack of a standard format for diagnostic tools' output makes
it hard to integrate diagnostic tools with development tools such as editors or
automated code review tools/services.

[reviewdog](https://github.com/mistwind/reviewdog) addresses the above problem
by introducing [errorformat](https://github.com/reviewdog/errorformat) to
support unstructured output and the checkstyle XML format as structured output.
It works great so far and reviewdog can support arbitrary diagnostic tools
regardless of programming languages. However, these solutions don't solve
everything.

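To make the unstructured case concrete, here is a rough Python sketch that turns `/path/to/file:<line>:<column>: <message>` lines into `rdjsonl`-style diagnostics with a regular expression. It is not how errorformat works internally; `LINE_PATTERN` and `to_diagnostic` are hypothetical names, and the pattern is deliberately naive (it would mis-parse paths that contain a colon, for instance).

```python
import json
import re
from typing import Optional

# Matches the "/path/to/file:<line>:<column>: <message>" shape.
LINE_PATTERN = re.compile(
    r"^(?P<path>[^:\n]+):(?P<line>\d+):(?P<column>\d+): (?P<message>.*)$"
)


def to_diagnostic(output_line: str) -> Optional[dict]:
    """Convert one line of tool output to a Diagnostic dict, or None if it does not match."""
    match = LINE_PATTERN.match(output_line)
    if match is None:
        return None
    return {
        "message": match["message"],
        "location": {
            "path": match["path"],
            # Only a start position can be recovered from this kind of output.
            "range": {"start": {"line": int(match["line"]), "column": int(match["column"])}},
        },
    }


# Illustrative tool output; the second line is noise that should be skipped.
tool_output = [
    "/src/main.go:14:15: unused variable x",
    "some banner text the tool prints",
]
converted = [d for d in map(to_diagnostic, tool_output) if d is not None]
for d in converted:
    print(json.dumps(d))
```

Note that the result has no `end` position and no suggestions, which is exactly the limitation listed for the formats below.
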
### *errorformat*
[errorformat](https://github.com/reviewdog/errorformat)

Problems:
- No support for diagnostics for a code range. It only supports a start position.
- No support for code suggestions (also known as auto-correct or fix).
- It's hard to write errorformat for complicated output.

### *checkstyle XML format*
[checkstyle](https://checkstyle.sourceforge.io/)

Problems:
- No support for diagnostics for a code range. It only supports a start position.
- No support for code suggestions (also known as auto-correct or fix).
- It's ..... XML. It's true that some diagnostic tools support the checkstyle
  format, but not everyone wants to support it.
- Checkstyle itself is actually a diagnostic tool for Java, and its
  output format is not well documented and not meant to be
  used as a generic format. Some linters just happen to use the same format(?).

## Background: Alternatives

There are alternative solutions out there (which are not used by reviewdog) as
well.

### The Static Analysis Results Interchange Format (SARIF)
[The Static Analysis Results Interchange Format (SARIF)](https://sarifweb.azurewebsites.net/)
has been approved as an OASIS standard.

Although SARIF is not widely used as of writing (July 21, 2020),
it can be a good standard format.
A promising usage example is [GitHub Code Scanning](https://docs.github.com/en/github/finding-security-vulnerabilities-and-errors-in-your-code/about-code-scanning#about-third-party-code-scanning-tools)
(beta), which uses SARIF to support third-party code scanning tools.
Other examples: [spotbugs](https://github.com/spotbugs/discuss/issues/95).

Problems:
- No stream output support: static analysis tools cannot output each diagnostic result one by one.
- `columnKind` doesn't support byte count. https://github.com/oasis-tcs/sarif-spec/issues/466
- The spec is too big and complex ([SARIF v2.1.0 PDF](https://docs.oasis-open.org/sarif/sarif/v2.1.0/sarif-v2.1.0.pdf) is 227 pages!)
  for developer tools that consume SARIF (e.g. reviewdog). Most tools will
  probably end up supporting SARIF only partially. For example, the GitHub Code
  Scanning feature doesn't support the whole spec
  ([doc](https://docs.github.com/en/github/finding-security-vulnerabilities-and-errors-in-your-code/sarif-support-for-code-scanning)).
- The spec is too big and complex for static analysis tools that produce
  SARIF. They can support just a minimal subset of SARIF as their result
  output format, but it's still not simple and the output still needs to pass
  a SARIF validator.
- Not all languages have good tools to generate code from JSON Schema.
  To create the Go SARIF package [haya14busa/go-sarif](https://github.com/haya14busa/go-sarif),
  I needed to try 3+ [Go](https://github.com/atombender/go-jsonschema) [JSON Schema](https://github.com/idubinskiy/schematyper)
  [Code Generator](https://github.com/aaharu/schemarshal)
  tools, but none of them worked for the complex SARIF JSON Schema.
  I ended up using [quicktype](https://github.com/quicktype/quicktype) and it
  worked, but I still needed to send [a Pull Request](https://github.com/quicktype/quicktype/pull/1513)...
- The SARIF SDK and related tools are written in C# (and TypeScript), which means we need the dotnet runtime.
  SARIF is a general, standard format, yet the related tools require the dotnet runtime.

There are some problems as above, but SARIF should still be good to support
considering it has already been approved as an OASIS standard and GitHub Code
Scanning uses it.
Reviewdog Diagnostic Format can be used as a simpler format and we can create
converters between RDFormat and SARIF.

### *Problem Matcher*
[VSCode](https://vscode-docs.readthedocs.io/en/stable/editor/tasks/#defining-a-problem-matcher)
and [GitHub Actions](https://github.com/actions/toolkit/blob/master/docs/problem-matchers.md)
use [Problem Matcher](https://github.com/actions/toolkit/blob/master/docs/problem-matchers.md)
to support arbitrary diagnostic tools. It's similar to errorformat, but it uses regex.

Problems:
- No support for code suggestions (also known as auto-correct or fix).
- The output format of matched results is undocumented and seems to be used internally in VSCode and GitHub Actions.
- It's hard to write problem matchers for complicated output.

### *Language Server Protocol (LSP)*
[Language Server Protocol Specification](https://microsoft.github.io/language-server-protocol/specifications/specification-current/)

LSP supports [Diagnostic](https://microsoft.github.io/language-server-protocol/specifications/specification-current/#diagnostic)
to represent a diagnostic, such as a compiler error or warning.
It's great for editor integration and is widely used these days as well.
The RDFormat message is actually inspired by the LSP Diagnostic message too.

Problems:
- LSP and its Diagnostic message basically work per file. That's not always
  suitable for diagnostic tools' output because they often need to
  report diagnostic results for multiple files, and outputting JSON per file does
  not make very much sense.
- LSP's Diagnostic message doesn't have code suggestion (code action) data.
  On the contrary, a code action has data about its associated diagnostics, and the
  code action message itself doesn't contain text edit data either, so LSP's
  messages are not suited to represent a diagnostic result with a suggested fix.
- Unnatural position representation: Positions in LSP are zero-based and the
  character offset is based on [UTF-16 code units](https://github.com/microsoft/language-server-protocol/issues/376).
  These are not widely used by diagnostic tools, development tools, or code
  review APIs such as GitHub, GitLab, and Gerrit.
  In addition, UTF-8 is the de facto standard text file encoding these days.

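The position mismatch in the last item can be illustrated with a small conversion. The Python sketch below maps a zero-based LSP position counted in UTF-16 code units to a one-based position whose column counts UTF-8 bytes. That RDFormat columns are one-based byte counts is an assumption here; reviewdog.proto is the authoritative definition. `lsp_to_rdformat_position` is a hypothetical helper name.

```python
def lsp_to_rdformat_position(line_text: str, lsp_line: int, lsp_character: int) -> dict:
    """Convert a zero-based LSP position (UTF-16 code units) into a one-based
    position whose column counts UTF-8 bytes (assumed RDFormat convention)."""
    utf16_units = 0
    utf8_bytes = 0
    for ch in line_text:
        if utf16_units >= lsp_character:
            break
        # Code points above U+FFFF occupy two UTF-16 code units (a surrogate pair).
        utf16_units += 2 if ord(ch) > 0xFFFF else 1
        utf8_bytes += len(ch.encode("utf-8"))
    return {"line": lsp_line + 1, "column": utf8_bytes + 1}


# "a", an emoji (U+1F600), then "b = 1". LSP character 3 points at "b":
# "a" is 1 UTF-16 unit and the emoji is 2 units.
text = "a\U0001F600b = 1"
position = lsp_to_rdformat_position(text, lsp_line=13, lsp_character=3)
print(position)
```

The same character sits at offset 3 in LSP terms but at byte column 6 here, which is why a consumer needs the line's text, not just the numbers, to translate between the two.
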
## Reviewdog Diagnostic Format Concept
Again, the idea behind the Reviewdog Diagnostic Format (RDFormat) is to
standardize the protocol for how diagnostic tools (e.g. compilers, linters,
etc.) and development tools (e.g. editors, reviewdog, code review APIs, etc.)
communicate.

RDFormat should support major use cases, from representing diagnostic results to
applying suggested fixes, in a general way, and should be easily supported by diagnostic
tools and development tools regardless of their programming languages.

[![Reviewdog Diagnostic Format Concept](https://user-images.githubusercontent.com/3797062/87955046-2b8b6300-cae8-11ea-983f-6554e2aeb8f2.png)](https://docs.google.com/drawings/d/15GZu5Iq6wukFtrpy91srQO_ry1iFQUisVAJd_yEprLc/edit?usp=sharing)

### Diagnostic tools' RDFormat Support
Ideally, diagnostic tools themselves should support outputting their results in
an RDFormat-compliant format, but not all tools will support RDFormat, especially
in the early stage. We can still introduce RDFormat by supporting RDFormat with
errorformat for most diagnostic tools. Also, we can write converters and add
RDFormat support to diagnostic tools incrementally.

### Consumer: reviewdog
*Not implemented yet*

reviewdog can support RDFormat and consume `rdjsonl`/`rdjson` as structured input
from diagnostic tools.
This also makes it possible to support (1) diagnostics for a code range and (2)
code suggestions (auto-correction) if a reporter supports them (e.g.
github-pr-review, gitlab-mr-discussion and the local reporter).

As for suggestion support with the local reporter, reviewdog should, for example,
be able to apply only the suggestions that fall within the diff.

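As a rough illustration of what applying a suggestion involves, the Python sketch below replaces the bytes covered by a suggestion's `range` with its `text`. It is not reviewdog's implementation and makes several assumptions: lines and columns are one-based, columns count UTF-8 bytes, the `end` position is exclusive, and a missing `end` means an empty range at `start`. `apply_suggestion` is a hypothetical name, and filtering suggestions by diff is left out.

```python
def apply_suggestion(source: str, suggestion: dict) -> str:
    """Return source with the suggestion's range replaced by its text."""
    data = source.encode("utf-8")
    lines = data.split(b"\n")
    start = suggestion["range"]["start"]
    # Assumption: an omitted end means a zero-length range (pure insertion).
    end = suggestion["range"].get("end", start)

    def byte_offset(position: dict) -> int:
        # Bytes of all preceding lines, each plus its newline, then the column.
        preceding = sum(len(line) + 1 for line in lines[: position["line"] - 1])
        return preceding + position["column"] - 1

    begin, stop = byte_offset(start), byte_offset(end)
    return (data[:begin] + suggestion["text"].encode("utf-8") + data[stop:]).decode("utf-8")


# Illustrative input: replace "bar" on line 2 (columns 5 to 8, end exclusive).
source = "x := 1\nfoo(bar)\n"
suggestion = {
    "range": {"start": {"line": 2, "column": 5}, "end": {"line": 2, "column": 8}},
    "text": "baz",
}
fixed = apply_suggestion(source, suggestion)
print(fixed)
```

When several suggestions target one file, applying them from the last position to the first keeps earlier offsets valid.
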
### Consumer: Editor & Language Server Protocol
*Not implemented yet*

RDFormat is going to make it easier for editors to support arbitrary diagnostic tools.
A language server can also use RDFormat, and it's easy to convert an RDFormat
message to an LSP Diagnostic and/or Code Action message.

One more concrete idea is to extend
[efm-langserver](https://github.com/mattn/efm-langserver) to support RDFormat
messages as input.
efm-langserver currently uses
[errorformat](https://github.com/reviewdog/errorformat) to support diagnostic
tools generally, but not all tools' output can be easily parsed with
errorformat, and errorformat lacks some features like diagnostics for a code range.
With RDFormat, it should be able to support code actions that apply suggested fixes as well.

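A conversion from an RDFormat diagnostic to an LSP Diagnostic could look roughly like the Python sketch below. It is only an illustration under stated assumptions: RDFormat positions are one-based with UTF-8 byte columns that fall on character boundaries, `INFO` exists as a severity name alongside `ERROR` and `WARNING`, and a missing `end` means an empty range. `to_lsp_diagnostic` and `utf16_character` are hypothetical names. The numeric severities are the LSP `DiagnosticSeverity` values (1 Error, 2 Warning, 3 Information).

```python
LSP_SEVERITY = {"ERROR": 1, "WARNING": 2, "INFO": 3}  # LSP DiagnosticSeverity values


def utf16_character(line_text: str, byte_column: int) -> int:
    """Convert a one-based UTF-8 byte column into a zero-based UTF-16 offset.

    Assumes the column falls on a character boundary."""
    prefix = line_text.encode("utf-8")[: byte_column - 1].decode("utf-8")
    return len(prefix.encode("utf-16-le")) // 2


def to_lsp_diagnostic(diagnostic: dict, file_lines: list) -> dict:
    """Build an LSP Diagnostic dict from an RDFormat Diagnostic dict."""
    rng = diagnostic["location"]["range"]
    start = rng["start"]
    end = rng.get("end", start)  # assumption: missing end means an empty range

    def convert(position: dict) -> dict:
        line_text = file_lines[position["line"] - 1]
        return {
            "line": position["line"] - 1,
            "character": utf16_character(line_text, position["column"]),
        }

    lsp = {
        "range": {"start": convert(start), "end": convert(end)},
        "message": diagnostic["message"],
    }
    if diagnostic.get("severity") in LSP_SEVERITY:
        lsp["severity"] = LSP_SEVERITY[diagnostic["severity"]]
    if "code" in diagnostic:
        lsp["code"] = diagnostic["code"]["value"]
    return lsp


# Illustrative input: the line starts with a two-byte character (U+00E9).
file_lines = ["\u00e9 = foo"]
rd_diagnostic = {
    "message": "undefined: foo",
    "location": {
        "path": "main.py",
        "range": {"start": {"line": 1, "column": 6}, "end": {"line": 1, "column": 9}},
    },
    "severity": "WARNING",
    "code": {"value": "RULE1"},
}
lsp_diagnostic = to_lsp_diagnostic(rd_diagnostic, file_lines)
print(lsp_diagnostic)
```
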
### Consumer: Reviewdog Diagnostic Formatter (RDFormatter)
*Not implemented yet*

There are many diagnostic output formats (report formats) and each diagnostic
tool implements them on its own. For example, [eslint](https://eslint.org/docs/user-guide/formatters)
supports more than 10 formats such as stylish, compact, codeframe, html, etc.
Users may want to use a certain format for every diagnostic tool they use, but
not all tools support their desired format. It takes time to implement many
formats for each tool and it's actually not worth doing in most
cases, IMO.

Reviewdog Diagnostic Formatter should support formatting diagnostic
results based on RDFormat. Then diagnostic tools can focus on improving their
diagnostic features and let the formatter format the results.

RDFormatter should be provided both as a CLI and as libraries.
The CLI can take RDFormat messages as input and output formatted results. The CLI
should be especially useful for building special formats, such as custom HTML
report pages, independently of diagnostic tools and their implementation
languages. However, many diagnostic tools and users may not always want to
depend on the CLI, so providing libraries for their implementation languages
should be useful for formatting results natively in each diagnostic tool.

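Since RDFormatter does not exist yet, the following Python sketch only suggests what the library side might look like: it renders `rdjsonl` input as one `path:line:column: [SEVERITY] message` line per diagnostic. The function names and the output layout are hypothetical, chosen for illustration, and do not correspond to any existing formatter.

```python
import json


def format_compact(diagnostic: dict) -> str:
    """Render one Diagnostic as "path:line:column: [SEVERITY] message"."""
    location = diagnostic["location"]
    parts = [location["path"]]
    start = location.get("range", {}).get("start", {})
    for key in ("line", "column"):
        if key not in start:
            break  # a column without a line would be meaningless
        parts.append(str(start[key]))
    position = ":".join(parts)
    severity = diagnostic.get("severity")
    label = f"[{severity}] " if severity else ""
    return f"{position}: {label}{diagnostic['message']}"


def format_rdjsonl(rdjsonl_text: str) -> str:
    """Format every non-empty rdjsonl line with format_compact."""
    return "\n".join(
        format_compact(json.loads(line))
        for line in rdjsonl_text.splitlines()
        if line.strip()
    )


# Illustrative input; paths and messages are invented.
rdjsonl_input = "\n".join([
    json.dumps({
        "message": "unused variable",
        "location": {"path": "main.go", "range": {"start": {"line": 14, "column": 15}}},
        "severity": "ERROR",
    }),
    json.dumps({"message": "typo", "location": {"path": "README.md"}}),
])
report = format_rdjsonl(rdjsonl_input)
print(report)
```

A CLI could wrap `format_rdjsonl` around standard input, while a diagnostic tool written in the same language could call `format_compact` directly.
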
## Open Questions
- Protocol Version Representation and Backward/Future Compatibility
  - Should we add version or some capability data in RDFormat?
  - RDFormat should be stable, but there is still a possibility of extending it in a
    backward-incompatible way. e.g. We **may** want to add a byte offset field to the
    Position message as an alternative to line and column.