---
title: Reviewdog Diagnostic Format
date: 2020-06-15
author: haya14busa
status: Proposed / Experimental
---

# Status

This document proposes the Reviewdog Diagnostic Format, which is still at an
experimental stage.

Reviews, suggestions, feedback, criticism, and comments from anyone are very
much welcome. Please leave comments in the Pull Request ([#629](https://github.com/mistwind/reviewdog/pull/629)),
in issue [#628](https://github.com/mistwind/reviewdog/issues/628), or
file [an issue](https://github.com/mistwind/reviewdog/issues).

The document and the actual definition currently live in the
https://github.com/mistwind/reviewdog repository, but we may create a separate
repository once the format is reviewed and stabilized.

# Reviewdog Diagnostic Format (RDFormat)

Reviewdog Diagnostic Format defines standard machine-readable message
structures that represent the results of a diagnostic tool such as a compiler or a
linter.

The idea behind the Reviewdog Diagnostic Format is to standardize
the protocol for how diagnostic tools (e.g. compilers, linters, etc.) and
development tools (e.g. editors, reviewdog, code review APIs, etc.) communicate.

See [reviewdog.proto](reviewdog.proto) for the actual definition.
[JSON Schema](./jsonschema) is available as well.

## Wire formats of Reviewdog Diagnostic Format

RDFormat uses [Protocol Buffers](https://developers.google.com/protocol-buffers) to
define the message structure, but the recommended wire format is JSON because
it is widely used and easy to support in both diagnostic tools and development
tools.

### **rdjsonl**
JSON Lines (http://jsonlines.org/) of the [`Diagnostic`](reviewdog.proto) message ([JSON Schema](./jsonschema/Diagnostic.jsonschema)).

Example:
```json
{"message": "<msg>", "location": {"path": "<file path>", "range": {"start": {"line": 14, "column": 15}}}, "severity": "ERROR"}
{"message": "<msg>", "location": {"path": "<file path>", "range": {"start": {"line": 14, "column": 15}, "end": {"line": 14, "column": 18}}}, "suggestions": [{"range": {"start": {"line": 14, "column": 15}, "end": {"line": 14, "column": 18}}, "text": "<replacement text>"}], "severity": "WARNING"}
...
```

### **rdjson**
JSON format of the [`DiagnosticResult`](reviewdog.proto) message ([JSON Schema](./jsonschema/DiagnosticResult.jsonschema)).

Example:
```json
{
  "source": {
    "name": "super lint",
    "url": "https://example.com/url/to/super-lint"
  },
  "severity": "WARNING",
  "diagnostics": [
    {
      "message": "<msg>",
      "location": {
        "path": "<file path>",
        "range": {
          "start": {
            "line": 14,
            "column": 15
          }
        }
      },
      "severity": "ERROR",
      "code": {
        "value": "RULE1",
        "url": "https://example.com/url/to/super-lint/RULE1"
      }
    },
    {
      "message": "<msg>",
      "location": {
        "path": "<file path>",
        "range": {
          "start": {
            "line": 14,
            "column": 15
          },
          "end": {
            "line": 14,
            "column": 18
          }
        }
      },
      "suggestions": [
        {
          "range": {
            "start": {
              "line": 14,
              "column": 15
            },
            "end": {
              "line": 14,
              "column": 18
            }
          },
          "text": "<replacement text>"
        }
      ],
      "severity": "WARNING"
    }
  ]
}
```

## Background: Still No Good Standard Diagnostic Format Out There in 2020

Update: Found *The Static Analysis Results Interchange Format (SARIF)* as a
potential good standard format.

As of writing (2020), most diagnostic tools such as linters or compilers output
results in their own formats. Some tools support a machine-readable structured
format, such as their own JSON format, while other tools only support an unstructured
format (e.g.
`/path/to/file:<line>:<column>: <message>`).

The lack of a standard format for diagnostic tools' output makes
it hard to integrate diagnostic tools with development tools such as editors or
automated code review tools/services.

[reviewdog](https://github.com/mistwind/reviewdog) addresses the above problem
by using [errorformat](https://github.com/reviewdog/errorformat) to
support unstructured output and the checkstyle XML format as structured output.
It works well so far, and reviewdog can support arbitrary diagnostic tools
regardless of programming language. However, these solutions don't solve
everything.

### *errorformat*
[errorformat](https://github.com/reviewdog/errorformat)

Problems:
- No support for diagnostics on a code range. It only supports a start position.
- No support for code suggestions (also known as auto-correct or fix).
- It's hard to write errorformat for complicated output.

### *checkstyle XML format*
[checkstyle](https://checkstyle.sourceforge.io/)

Problems:
- No support for diagnostics on a code range. It only supports a start position.
- No support for code suggestions (also known as auto-correct or fix).
- It's ..... XML. It's true that some diagnostic tools support the checkstyle
  format, but not everyone wants to support it.
- checkstyle itself is actually a diagnostic tool for Java, and its
  output format is not well documented and not meant to be
  used as a generic format. Some linters just happen to use the same format(?).

## Background: Alternatives

There are also alternative solutions out there (which are not used by
reviewdog).

### The Static Analysis Results Interchange Format (SARIF)
[The Static Analysis Results Interchange Format (SARIF)](https://sarifweb.azurewebsites.net/)
has been approved as an OASIS standard.

Although there are not many users of SARIF as of writing (July 21, 2020),
it can be a good standard format.
A promising usage example is [GitHub Code Scanning](https://docs.github.com/en/github/finding-security-vulnerabilities-and-errors-in-your-code/about-code-scanning#about-third-party-code-scanning-tools)
(beta), which uses SARIF to support third-party code scanning tools.
Other examples: [spotbugs](https://github.com/spotbugs/discuss/issues/95).

Problems:
- No streaming output support: static analysis tools cannot output diagnostic results one by one.
- `columnKind` doesn't support byte count. https://github.com/oasis-tcs/sarif-spec/issues/466
- The spec is too big and complex ([SARIF v2.1.0 PDF](https://docs.oasis-open.org/sarif/sarif/v2.1.0/sarif-v2.1.0.pdf) is 227 pages!)
  for developer tools that consume SARIF (e.g. reviewdog). Most
  tools will probably end up supporting SARIF only partially. For example, the GitHub Code Scanning feature
  doesn't support the whole spec
  ([doc](https://docs.github.com/en/github/finding-security-vulnerabilities-and-errors-in-your-code/sarif-support-for-code-scanning)).
- The spec is too big and complex for static analysis tools that produce
  SARIF. They can support only a minimal subset of SARIF as their result
  output format, but it's still not simple and the output still needs to pass
  a SARIF validator.
- Not all languages have good tools to generate code from JSON Schema.
  To create the Go SARIF package [haya14busa/go-sarif](https://github.com/haya14busa/go-sarif),
  I needed to try 3+ [Go](https://github.com/atombender/go-jsonschema) [JSON Schema](https://github.com/idubinskiy/schematyper)
  [Code Generator](https://github.com/aaharu/schemarshal)
  tools, but none of them worked for the complex SARIF JSON Schema.
  I ended up using [quicktype](https://github.com/quicktype/quicktype) and it
  worked, but I still needed to send [a Pull Request](https://github.com/quicktype/quicktype/pull/1513)...
- The SARIF SDK and related tools are written in C# (and TypeScript), which means we need the dotnet runtime.
  SARIF is a general, standard format, yet the related tools require the dotnet runtime.

Despite the problems above, SARIF should still be worth supporting
considering that it has already been approved as an OASIS standard and GitHub Code
Scanning uses it.
Reviewdog Diagnostic Format can be used as a simpler format, and we can create
converters between RDFormat and SARIF.

### *Problem Matcher*
[VSCode](https://vscode-docs.readthedocs.io/en/stable/editor/tasks/#defining-a-problem-matcher)
and [GitHub Actions](https://github.com/actions/toolkit/blob/master/docs/problem-matchers.md)
use [Problem Matcher](https://github.com/actions/toolkit/blob/master/docs/problem-matchers.md)
to support arbitrary diagnostic tools. It's similar to errorformat, but it uses regex.

Problems:
- No support for code suggestions (also known as auto-correct or fix).
- The output format of matched results is undocumented and seems to be used only internally in VSCode and GitHub Actions.
- It's hard to write problem matchers for complicated output.

### *Language Server Protocol (LSP)*
[Language Server Protocol Specification](https://microsoft.github.io/language-server-protocol/specifications/specification-current/)

LSP supports [Diagnostic](https://microsoft.github.io/language-server-protocol/specifications/specification-current/#diagnostic)
to represent a diagnostic, such as a compiler error or warning.
It's great for editor integration and is widely used these days as well.
The RDFormat message is actually inspired by the LSP Diagnostic message too.

Problems:
- LSP and the Diagnostic message are basically per file. They are not always
  suited for use as diagnostic tool output because tools often need to
  report diagnostic results for multiple files, and outputting JSON per file does
  not make much sense.
- LSP's Diagnostic message doesn't have code suggestion (code action) data.
  Conversely, a code action holds data about its associated diagnostics, and the
  code action message itself doesn't contain text edit data either, so LSP's
  messages are not suited to represent a diagnostic result with a suggested fix.
- Unnatural position representation: positions in LSP are zero-based and
  the character offset is based on [UTF-16 code units](https://github.com/microsoft/language-server-protocol/issues/376).
  These are not widely used by diagnostic tools, development tools, or code
  review APIs such as GitHub, GitLab and Gerrit.
  In addition, UTF-8 is the de facto standard text file encoding these days.

## Reviewdog Diagnostic Format Concept
Again, the idea behind the Reviewdog Diagnostic Format (RDFormat) is to
standardize the protocol for how diagnostic tools (e.g. compilers, linters,
etc.) and development tools (e.g. editors, reviewdog, code review APIs, etc.)
communicate.

RDFormat should support major use cases, from representing diagnostic results to
applying suggested fixes, in a general way, and should be easy to support in diagnostic
tools and development tools regardless of their programming languages.
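
As a rough illustration of the producer side of this concept, the sketch below shows how a diagnostic tool might emit its findings as `rdjsonl`. The field names (`message`, `location`, `path`, `range`, `start`, `line`, `column`, `severity`) follow the examples earlier in this document; the `to_rdjsonl` helper, the shape of the `findings` input, and the sample file names are hypothetical and only serve the example.

```python
import json


def to_rdjsonl(findings):
    """Serialize findings as rdjsonl: one JSON-encoded Diagnostic per line.

    `findings` is a hypothetical in-memory shape used only for this sketch;
    a real tool would map its own result type onto the Diagnostic message.
    """
    lines = []
    for finding in findings:
        diagnostic = {
            "message": finding["message"],
            "location": {
                "path": finding["path"],
                "range": {
                    "start": {
                        "line": finding["line"],
                        "column": finding["column"],
                    }
                },
            },
            "severity": finding["severity"],
        }
        # json.dumps never emits raw newlines, so each diagnostic stays on one line.
        lines.append(json.dumps(diagnostic) + "\n")
    return "".join(lines)


# Made-up findings from an imaginary linter.
findings = [
    {"path": "main.go", "line": 14, "column": 15,
     "message": "unused variable x", "severity": "ERROR"},
    {"path": "util.go", "line": 3, "column": 1,
     "message": "missing doc comment", "severity": "WARNING"},
]

output = to_rdjsonl(findings)
print(output, end="")
```

Because every line is an independent JSON document, a tool can write each diagnostic as soon as it is found, and a consumer can process the stream line by line.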

[![Reviewdog Diagnostic Format Concept](https://user-images.githubusercontent.com/3797062/87955046-2b8b6300-cae8-11ea-983f-6554e2aeb8f2.png)](https://docs.google.com/drawings/d/15GZu5Iq6wukFtrpy91srQO_ry1iFQUisVAJd_yEprLc/edit?usp=sharing)

### Diagnostic tools' RDFormat Support
Ideally, diagnostic tools themselves should support outputting their results in an
RDFormat-compliant format, but not all tools will support RDFormat, especially
at an early stage. We can still introduce RDFormat by supporting it through
errorformat for most diagnostic tools. Also, we can write converters and add
RDFormat support to diagnostic tools incrementally.

### Consumer: reviewdog
*Not implemented yet*

reviewdog can support RDFormat and consume `rdjsonl`/`rdjson` as structured input
from diagnostic tools.
It also makes it possible to support (1) diagnostics on a code range and (2)
code suggestions (auto-correction) if a reporter supports them (e.g.
github-pr-review, gitlab-mr-discussion and local reporter).

As for suggestion support with the local reporter, reviewdog should be able to,
for example, apply only the suggestions that fall within a diff.

### Consumer: Editor & Language Server Protocol
*Not implemented yet*

RDFormat will make it easier for editors to support arbitrary diagnostic tools.
A language server can also use RDFormat, and it's easy to convert an RDFormat
message to an LSP Diagnostic and/or Code Action message.

One more concrete idea is to extend
[efm-langserver](https://github.com/mattn/efm-langserver) to support RDFormat
messages as input.
efm-langserver currently uses
[errorformat](https://github.com/reviewdog/errorformat) to support diagnostic
tools generically, but not all tools' output can be easily parsed with
errorformat, and errorformat lacks some features like diagnostics on a code range.
It should be able to support code actions to apply suggested fixes as well.

### Consumer: Reviewdog Diagnostic Formatter (RDFormatter)
*Not implemented yet*

There are many diagnostic output formats (report formats) and each diagnostic
tool implements them on its own. For example, [eslint](https://eslint.org/docs/user-guide/formatters)
supports more than 10 formats such as stylish, compact, codeframe, html, etc.
Users may want to use a certain format for every diagnostic tool they use, but
not all tools support their desired format. It takes time to implement many
formats for each tool, and it's not worth doing in most
cases, IMO.

Reviewdog Diagnostic Formatter should support formatting diagnostic
results based on RDFormat. Then, diagnostic tools can focus on improving
their diagnostic features and let the formatter format the results.

RDFormatter should be provided both as a CLI and as libraries.
The CLI can take RDFormat messages as input and output formatted results. The CLI
should be especially useful for building special formats, such as custom HTML to
generate report pages, independently of diagnostic tools and their implementation
languages. However, many diagnostic tools and users will not always want to
depend on the CLI, so providing libraries for their implementation languages
should be useful for formatting results natively in each diagnostic tool.

## Open Questions
- Protocol Version Representation and Backward/Future Compatibility
  - Should we add version or some capability data in RDFormat?
  - RDFormat should be stable, but there is still a possibility of extending it in a
    backward-incompatible way. e.g. We **may** want to add a byte offset field to the
    Position message as an alternative to line and column.
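
To make the RDFormatter idea described above a little more concrete, here is a minimal sketch of a formatter library function that reads `rdjsonl` and renders each diagnostic in the familiar `path:line:column: message` style. RDFormatter is not implemented yet, so the function name `format_compact` and the exact output layout are assumptions made for this example, not an existing API; only the JSON field names come from the examples in this document.

```python
import json


def format_compact(rdjsonl_text):
    """Render rdjsonl diagnostics as 'path:line:column: SEVERITY: message' lines.

    Hypothetical formatter for illustration; missing optional fields
    (line, column, severity) are simply left out of the rendered line.
    """
    formatted = []
    for raw in rdjsonl_text.splitlines():
        if not raw.strip():
            continue  # tolerate blank lines between diagnostics
        diagnostic = json.loads(raw)
        location = diagnostic.get("location", {})
        start = location.get("range", {}).get("start", {})

        parts = [location.get("path", "")]
        if "line" in start:
            parts.append(str(start["line"]))
            if "column" in start:
                parts.append(str(start["column"]))

        text = diagnostic.get("message", "")
        severity = diagnostic.get("severity")
        if severity:
            text = f"{severity}: {text}"
        formatted.append(":".join(parts) + ": " + text)
    return formatted


# Made-up input in the rdjsonl shape shown earlier in this document.
sample = (
    '{"message": "unused variable", "location": {"path": "main.go", '
    '"range": {"start": {"line": 14, "column": 15}}}, "severity": "ERROR"}\n'
    '{"message": "file is too long", "location": {"path": "big.go"}}\n'
)

for line in format_compact(sample):
    print(line)
```

A CLI could wrap the same function around standard input, while diagnostic tools written in Python could call it directly as a library, which is the split between CLI and libraries that the section proposes.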