# Developing

## Getting started

In order to test and develop in this repo you will need the following dependencies installed:
- Go
- Docker
- make

After cloning, the following steps can help you get set up:
1. Run `make bootstrap` to download Go module dependencies, create the `/.tmp` dir, and download helper utilities.
2. Run `make` to view the selection of developer commands in the Makefile.
3. Run `make build` to build the release snapshot binaries and packages.
4. For an even quicker start, run `go run cmd/gosbom/main.go` to print the gosbom help.
   - `go run cmd/gosbom/main.go alpine:latest` will compile and run gosbom against `alpine:latest`.
5. View the README or the gosbom help output for more output options.

The main make tasks for common static analysis and testing are `lint`, `format`, `lint-fix`, `unit`, `integration`, and `cli`.

See `make help` for all the current make tasks.

## Architecture

Gosbom is used to generate a Software Bill of Materials (SBOM) from different kinds of input.

### Code organization for the cmd package

Gosbom's entrypoint can be found in the `cmd` package at `cmd/gosbom/main.go`. `main.go` builds a new gosbom `cli` via `cli.New()`
and then executes the `cli` via `cli.Execute()`. The `cli` package is responsible for parsing command line arguments,
setting up the application context and configuration, and executing the application. Each of gosbom's commands
(e.g. `packages`, `attest`, `version`) is implemented as a `cobra.Command` in its respective `<command>.go` file.
The commands are registered in `gosbom/cli/commands.go`.

```
.
└── gosbom/
    ├── cli/
    │   ├── attest/
    │   ├── attest.go
    │   ├── commands.go
    │   ├── completion.go
    │   ├── convert/
    │   ├── convert.go
    │   ├── eventloop/
    │   ├── options/
    │   ├── packages/
    │   ├── packages.go
    │   ├── poweruser/
    │   ├── poweruser.go
    │   └── version.go
    └── main.go
```

#### Execution flow

```mermaid
sequenceDiagram
    participant main as cmd/gosbom/main
    participant cli as cli.New()
    participant root as root.Execute()
    participant cmd as <command>.Execute()

    main->>+cli: 

    Note right of cli: wire ALL CLI commands
    Note right of cli: add flags for ALL commands

    cli-->>-main: root command

    main->>+root: 
    root->>+cmd: 
    cmd-->>-root: (error)

    root-->>-main: (error)

    Note right of cmd: Execute SINGLE command from USER
```
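
The same shape can be sketched with the standard library alone. Every command is registered once up front, and at execution time a single user-chosen command is dispatched. This is a minimal sketch under stated assumptions, not gosbom's code: it does not use cobra, and `command`, `rootCommand`, `newCLI`, and `execute` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// command is a hypothetical, stripped-down stand-in for *cobra.Command:
// a name plus the function that runs it.
type command struct {
	name string
	run  func(args []string) error
}

// rootCommand holds every registered subcommand, keyed by name.
type rootCommand struct {
	subcommands map[string]command
}

// newCLI plays the role of cli.New(): it wires up ALL commands once, up front.
func newCLI() *rootCommand {
	root := &rootCommand{subcommands: map[string]command{}}
	for _, c := range []command{
		{name: "packages", run: func(args []string) error {
			fmt.Println("would catalog:", args)
			return nil
		}},
		{name: "version", run: func(args []string) error {
			fmt.Println("gosbom (development build)")
			return nil
		}},
	} {
		root.subcommands[c.name] = c
	}
	return root
}

// execute plays the role of root.Execute(): it runs the SINGLE command the
// user asked for and hands any error back to the caller.
func (r *rootCommand) execute(args []string) error {
	if len(args) == 0 {
		return errors.New("no command given")
	}
	c, ok := r.subcommands[args[0]]
	if !ok {
		return fmt.Errorf("unknown command %q", args[0])
	}
	return c.run(args[1:])
}

func main() {
	// A real binary would pass os.Args[1:]; fixed arguments keep the demo deterministic.
	if err := newCLI().execute([]string{"packages", "alpine:latest"}); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Registering everything eagerly keeps flag and help wiring in one place. Execution stays limited to the one command the user invoked.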

### Code organization for the gosbom library

Gosbom's core library (i.e., exported) functionality is implemented in the `gosbom` package. The `gosbom` package is responsible for organizing the core
SBOM data model, its translated output formats, and the core SBOM generation logic.

- Analysis creates a static SBOM, which can be encoded and decoded.
- Format objects should strive not to add or enrich data during encoding that could otherwise be added during analysis.
- Package catalogers and their organization can be viewed in (and new ones added to) the `gosbom/pkg/cataloger` package.
- File catalogers and their organization can be viewed in (and new ones added to) the `gosbom/file` package.
- The `source` package provides an abstraction that allows a user to loosely define a data source that can be cataloged.

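
The first two bullets have a practical consequence. The SBOM is complete once analysis finishes, so encoding should be a pure translation, and encoding followed by decoding should return the same SBOM. The sketch below assumes a heavily simplified, hypothetical `SBOM` type and uses JSON as a stand-in for a real output format. gosbom's actual data model and encoders are much richer.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// Package and SBOM are hypothetical, heavily simplified stand-ins for gosbom's data model.
type Package struct {
	Name    string `json:"name"`
	Version string `json:"version"`
}

type SBOM struct {
	Source   string    `json:"source"`
	Packages []Package `json:"packages"`
}

// encode only translates an already-complete SBOM into bytes; it adds or infers nothing.
func encode(s SBOM) ([]byte, error) { return json.Marshal(s) }

// decode is the inverse of encode.
func decode(b []byte) (SBOM, error) {
	var s SBOM
	err := json.Unmarshal(b, &s)
	return s, err
}

// roundTrips reports whether encoding then decoding s yields an identical SBOM.
func roundTrips(s SBOM) (bool, error) {
	b, err := encode(s)
	if err != nil {
		return false, err
	}
	decoded, err := decode(b)
	if err != nil {
		return false, err
	}
	return reflect.DeepEqual(s, decoded), nil
}

func main() {
	// placeholder data for illustration only
	s := SBOM{Source: "alpine:latest", Packages: []Package{{Name: "musl", Version: "1.0.0"}}}
	ok, err := roundTrips(s)
	if err != nil {
		panic(err)
	}
	fmt.Println("round trip preserved the SBOM:", ok)
}
```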

#### Code example of gosbom as a library

Here is a gist of using gosbom as a library to generate an SBOM for a Docker image: [link](https://gist.github.com/wagoodman/57ed59a6d57600c23913071b8470175b).
The execution flow for the example is detailed below.

#### Execution flow examples for the gosbom library

```mermaid
sequenceDiagram
    participant source as source.New(ubuntu:latest)
    participant sbom as sbom.SBOM
    participant catalog as gosbom.CatalogPackages(src)
    participant encoder as gosbom.Encode(sbom, format)

    Note right of source: use "ubuntu:latest" as SBOM input

    source-->>+sbom: add source to SBOM struct
    source-->>+catalog: pass src to generate catalog
    catalog-->-sbom: add cataloging results onto SBOM
    sbom-->>encoder: pass SBOM and desired format to gosbom encoder
    encoder-->>source: return bytes that are the SBOM of the original input

    Note right of catalog: cataloger configuration is done based on src
```
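
The flow in the diagram can be mirrored in a small, self-contained sketch. Everything here is a hypothetical stand-in: `newSource`, `catalogPackages`, `encode`, and the simplified types. It shows the order of the steps, not the real signatures of `source.New`, `gosbom.CatalogPackages`, or `gosbom.Encode`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// All types and functions below are hypothetical stand-ins for the real API.
type Source struct{ Input string }

type Package struct{ Name, Version string }

type SBOM struct {
	Source   Source
	Packages []Package
}

// newSource mirrors source.New(...): it only records what should be scanned.
func newSource(input string) Source { return Source{Input: input} }

// catalogPackages mirrors gosbom.CatalogPackages(src). A real implementation
// would choose and configure catalogers based on src; this one returns a
// fixed placeholder result.
func catalogPackages(src Source) []Package {
	return []Package{{Name: "example-package", Version: "0.0.0"}}
}

// encode mirrors gosbom.Encode(sbom, format) and supports one format for brevity.
func encode(s SBOM, format string) ([]byte, error) {
	if format != "json" {
		return nil, fmt.Errorf("unsupported format %q", format)
	}
	return json.Marshal(s)
}

func main() {
	src := newSource("ubuntu:latest")
	s := SBOM{Source: src}            // add the source to the SBOM struct
	s.Packages = catalogPackages(src) // add cataloging results onto the SBOM
	out, err := encode(s, "json")     // encode into the desired format
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```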

### Gosbom Catalogers

#### Summary

Catalogers are the way in which gosbom is able to identify and construct packages given some amount of source metadata.
For example, gosbom can locate and process `package-lock.json` files when performing filesystem scans.
See [how to specify file globs](https://github.com/nextlinux/gosbom/tree/v0.70.0/gosbom/pkg/cataloger/javascript/cataloger.go#L16-L21)
and an implementation of the [package-lock.json parser](https://github.com/nextlinux/gosbom/tree/v0.70.0/gosbom/pkg/cataloger/javascript/cataloger.go#L16-L21) for a quick review.

From a high level, catalogers have the following properties:

- They are independent from one another. For example, the java cataloger has no idea of the processes, assumptions, or results of the python cataloger.

- They do not know what source is being analyzed. Are we analyzing a local directory? An image? If so, the squashed representation or all layers? The catalogers do not know the answers to these questions. They only know that there is an interface to query for file paths and contents from an underlying "source" being scanned.

- Packages created by the cataloger should not be mutated after they are created. There is one exception, made for adding CPEs to a package after the cataloging phase, but that will most likely be moved back into the cataloger in the future.

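
A toy resolver can illustrate the second property. The cataloger below only calls methods on an interface, so it behaves the same whether that interface is backed by a directory or an image. `Resolver`, `memResolver`, and `catalogVersionFiles` are hypothetical names used for illustration. Globbing uses the standard library's `path.Match`, which does not support `**`.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// Resolver is a hypothetical, minimal stand-in for the file resolver: the only
// view of the scanned source that a cataloger gets.
type Resolver interface {
	FilesByGlob(pattern string) []string
	ContentsOf(p string) string
}

// memResolver is a toy resolver backed by a path -> contents map. A directory
// scan and an image scan would each supply their own Resolver implementation.
type memResolver map[string]string

func (m memResolver) FilesByGlob(pattern string) []string {
	var matches []string
	for p := range m {
		if ok, _ := path.Match(pattern, p); ok {
			matches = append(matches, p)
		}
	}
	sort.Strings(matches) // map order is random; sort for stable output
	return matches
}

func (m memResolver) ContentsOf(p string) string { return m[p] }

// catalogVersionFiles is a hypothetical cataloger. It never asks what kind of
// source it is looking at; it only queries the Resolver.
func catalogVersionFiles(r Resolver) []string {
	var found []string
	for _, p := range r.FilesByGlob("/opt/*/VERSION") {
		found = append(found, path.Base(path.Dir(p))+"@"+r.ContentsOf(p))
	}
	return found
}

func main() {
	directory := memResolver{"/opt/tool/VERSION": "1.0"}
	image := memResolver{"/opt/tool/VERSION": "1.0", "/etc/hostname": "box"}
	fmt.Println(catalogVersionFiles(directory)) // [tool@1.0]
	fmt.Println(catalogVersionFiles(image))     // [tool@1.0]
}
```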

#### Building a new Cataloger

Catalogers must fulfill the interface [found here](https://github.com/nextlinux/gosbom/tree/v0.70.0/gosbom/pkg/cataloger.go).
This means that when building a new cataloger, the new struct must implement both the `Catalog` and `Name` method signatures.

A top level view of the functions that construct all the catalogers can be found [here](https://github.com/nextlinux/gosbom/tree/v0.70.0/gosbom/pkg/cataloger/cataloger.go).
When an author has finished writing a new cataloger, this is the spot to plug in the new cataloger's constructor.

For a top level view of how the catalogers are used, see [this function](https://github.com/nextlinux/gosbom/tree/v0.70.0/gosbom/pkg/cataloger/catalog.go#L41-L100) as a reference. It ranges over all catalogers passed as an argument and invokes the `Catalog` method on each one.
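
Here is a rough sketch of the contract and the loop described above, using simplified, hypothetical signatures. The real interface's `Catalog` method may take and return different types. Each cataloger reports a `Name` and implements `Catalog`, and a driver ranges over all of them. The sketch continues past a failing cataloger instead of aborting, which is a design choice made here, not taken from gosbom.

```go
package main

import (
	"errors"
	"fmt"
)

// Resolver, Package, and Cataloger are hypothetical simplifications; the real
// interface's method signatures may differ.
type Resolver interface {
	FilesByGlob(pattern string) []string
}

type Package struct{ Name, Version string }

type Cataloger interface {
	Name() string
	Catalog(r Resolver) ([]Package, error)
}

// staticCataloger is a toy Cataloger that always returns the same result.
type staticCataloger struct {
	name string
	pkgs []Package
	err  error
}

func (c staticCataloger) Name() string { return c.name }

func (c staticCataloger) Catalog(Resolver) ([]Package, error) { return c.pkgs, c.err }

var errParse = errors.New("parse failure")

// catalogAll ranges over every cataloger and invokes Catalog on each one,
// collecting packages and reporting failures without stopping the others.
func catalogAll(r Resolver, catalogers []Cataloger) ([]Package, error) {
	var all []Package
	var failures []error
	for _, c := range catalogers {
		pkgs, err := c.Catalog(r)
		if err != nil {
			failures = append(failures, fmt.Errorf("%s: %w", c.Name(), err))
			continue
		}
		all = append(all, pkgs...)
	}
	if len(failures) > 0 {
		return all, fmt.Errorf("%d cataloger(s) failed: %v", len(failures), failures)
	}
	return all, nil
}

func main() {
	catalogers := []Cataloger{
		staticCataloger{name: "apk-cataloger", pkgs: []Package{{Name: "musl", Version: "1.0.0"}}},
		staticCataloger{name: "broken-cataloger", err: errParse},
	}
	pkgs, err := catalogAll(nil, catalogers)
	fmt.Println(len(pkgs), "package(s);", err)
}
```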

Each cataloger has its own `Catalog` method, but this does not mean that they are all vastly different.
Take a look at the `apkdb` cataloger for Alpine to see how it [uses `generic.NewCataloger`](https://github.com/nextlinux/gosbom/tree/v0.70.0/gosbom/pkg/cataloger/apkdb/cataloger.go).

`generic.NewCataloger` is an abstraction gosbom uses to make writing common components easier. First, it takes the `catalogerName` to identify the cataloger.
Beyond the name, it relies on two key pieces that inform the cataloger how to identify and return packages: the `globPatterns` and the `parseFunction`.
- The first piece is a `parseByGlob` matching pattern used to identify the files that contain the package metadata.
See [here for the APK example](https://github.com/nextlinux/gosbom/tree/v0.70.0/gosbom/pkg/apk_metadata.go#L16-L41).
- The other is a `parseFunction`, which informs the cataloger what to do when it has found one of the matched files.
See this [link for an example](https://github.com/nextlinux/gosbom/tree/v0.70.0/gosbom/pkg/cataloger/apkdb/parse_apk_db.go#L22-L102).

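
The glob-to-parser idea behind `generic.NewCataloger` can be sketched as follows. The names `genericCataloger`, `withParserByGlobs`, `parseFunc`, and `parseVersionFile` are hypothetical. Files are passed as a path-to-contents map instead of through a resolver. `path.Match` stands in for real glob support and does not understand `**`.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

type Package struct{ Name, Version string }

// parseFunc is a hypothetical parser signature: given one matched file, return
// the packages it describes.
type parseFunc func(filePath, contents string) ([]Package, error)

type globParser struct {
	glob  string
	parse parseFunc
}

// genericCataloger pairs a cataloger name with glob patterns that route
// matching files to a parse function.
type genericCataloger struct {
	name    string
	parsers []globParser
}

func newGenericCataloger(name string) *genericCataloger {
	return &genericCataloger{name: name}
}

// withParserByGlobs registers parse for every glob and returns the cataloger
// so that calls can be chained.
func (c *genericCataloger) withParserByGlobs(parse parseFunc, globs ...string) *genericCataloger {
	for _, g := range globs {
		c.parsers = append(c.parsers, globParser{glob: g, parse: parse})
	}
	return c
}

// catalog runs each registered parser on the files its glob matches.
func (c *genericCataloger) catalog(files map[string]string) ([]Package, error) {
	paths := make([]string, 0, len(files))
	for p := range files {
		paths = append(paths, p)
	}
	sort.Strings(paths) // map order is random; sort for deterministic output

	var pkgs []Package
	for _, gp := range c.parsers {
		for _, p := range paths {
			if ok, _ := path.Match(gp.glob, p); !ok {
				continue
			}
			found, err := gp.parse(p, files[p])
			if err != nil {
				return nil, fmt.Errorf("%s: parsing %s: %w", c.name, p, err)
			}
			pkgs = append(pkgs, found...)
		}
	}
	return pkgs, nil
}

// parseVersionFile is a hypothetical parser for files like /opt/<name>/VERSION.
func parseVersionFile(filePath, contents string) ([]Package, error) {
	name := path.Base(path.Dir(filePath))
	return []Package{{Name: name, Version: strings.TrimSpace(contents)}}, nil
}

func main() {
	cataloger := newGenericCataloger("version-file-cataloger").
		withParserByGlobs(parseVersionFile, "/opt/*/VERSION")
	pkgs, err := cataloger.catalog(map[string]string{
		"/opt/tool/VERSION": "1.0\n",
		"/etc/os-release":   "ID=example\n",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(pkgs) // [{tool 1.0}]
}
```

Keeping the glob list separate from the parser lets one parse function serve several file locations.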

If you're unsure whether the generic cataloger fits your use case, or think it requires something more custom,
just file an issue or ask in our Slack, and we'd be more than happy to help with the design.

All identified packages share a common struct, so make sure that a new cataloger constructs its packages using the [`Package` struct](https://github.com/nextlinux/gosbom/tree/v0.70.0/gosbom/pkg/package.go#L16-L31).

Metadata note: identified packages are also assigned specific metadata that can be unique to their environment.
See [this folder](https://github.com/nextlinux/gosbom/tree/v0.70.0/gosbom/pkg) for examples of the different metadata types.
These are plugged into the `MetadataType` and `Metadata` fields in the above struct. `MetadataType` indicates which metadata type is being used, and `Metadata` is an interface value holding a value of that type.
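
Here is a minimal sketch of the `MetadataType`/`Metadata` pairing. The metadata structs and type names are hypothetical, not gosbom's real definitions. The concrete value lives in an interface-typed field, and consumers recover it with a type switch.

```go
package main

import "fmt"

// MetadataType, the metadata structs, and Package are hypothetical
// simplifications of the pattern, not gosbom's actual definitions.
type MetadataType string

const (
	ApkMetadataType    MetadataType = "ApkMetadata"
	PythonMetadataType MetadataType = "PythonMetadata"
)

type ApkMetadata struct{ Origin string }

type PythonMetadata struct{ Author string }

type Package struct {
	Name         string
	Version      string
	MetadataType MetadataType // says which concrete type Metadata holds
	Metadata     interface{}  // holds a value of that type
}

// describeMetadata recovers the concrete metadata value with a type switch.
func describeMetadata(p Package) string {
	switch m := p.Metadata.(type) {
	case ApkMetadata:
		return "apk origin: " + m.Origin
	case PythonMetadata:
		return "python author: " + m.Author
	default:
		return fmt.Sprintf("unknown metadata type %q", p.MetadataType)
	}
}

func main() {
	p := Package{
		Name:         "musl",
		Version:      "1.0.0",
		MetadataType: ApkMetadataType,
		Metadata:     ApkMetadata{Origin: "musl"},
	}
	fmt.Println(describeMetadata(p)) // apk origin: musl
}
```

The type switch works on in-memory values. A separate `MetadataType` field matters when the concrete type cannot be inferred from the value itself, for example when decoding a serialized SBOM.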

Finally, here is an example of where package construction is done in the APK cataloger. The first link is where `newPackage` is called in the `parseFunction`, and the second link shows the package construction:
- [Call for new package](https://github.com/nextlinux/gosbom/blob/v0.70.0/gosbom/pkg/cataloger/apkdb/parse_apk_db.go#L106)
- [APK Package Constructor](https://github.com/nextlinux/gosbom/tree/v0.70.0/gosbom/pkg/cataloger/apkdb/package.go#L12-L27)

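
The following simplified sketch shows where `newPackage` fits inside a parse function. It assumes the APK installed-database layout of blank-line-separated records with `P:` (name) and `V:` (version) lines, and it ignores every other field. `ApkMetadata`, `Package`, `newPackage`, and `parseApkDB` are hypothetical simplifications, not the linked code.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// ApkMetadata and Package are hypothetical, simplified types.
type ApkMetadata struct {
	Package string
	Version string
}

type Package struct {
	Name         string
	Version      string
	MetadataType string
	Metadata     interface{}
}

// newPackage builds the shared Package struct from parsed metadata, so every
// record is turned into a package the same way.
func newPackage(m ApkMetadata) Package {
	return Package{Name: m.Package, Version: m.Version, MetadataType: "ApkMetadata", Metadata: m}
}

// parseApkDB splits the database into blank-line-separated records, reads only
// the P: (name) and V: (version) lines, and calls newPackage once per record.
func parseApkDB(contents string) ([]Package, error) {
	var pkgs []Package
	var current ApkMetadata
	flush := func() {
		if current.Package != "" {
			pkgs = append(pkgs, newPackage(current))
		}
		current = ApkMetadata{}
	}

	scanner := bufio.NewScanner(strings.NewReader(contents))
	for scanner.Scan() {
		line := scanner.Text()
		if strings.TrimSpace(line) == "" {
			flush() // a blank line ends the current record
			continue
		}
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		switch key {
		case "P":
			current.Package = value
		case "V":
			current.Version = value
		}
	}
	if err := scanner.Err(); err != nil {
		return nil, err
	}
	flush() // the last record may not be followed by a blank line
	return pkgs, nil
}

func main() {
	// placeholder database contents for illustration
	db := "P:musl\nV:1.0.0-r0\n\nP:busybox\nV:1.0.0-r1\n"
	pkgs, err := parseApkDB(db)
	if err != nil {
		panic(err)
	}
	for _, p := range pkgs {
		fmt.Println(p.Name, p.Version)
	}
}
```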

If you have more questions about implementing a cataloger, or questions about one you are currently working on,
always feel free to file an issue or reach out to us [on slack](https://anchore.com/slack).

#### Searching for files

All catalogers are provided an instance of the [`file.Resolver`](https://github.com/nextlinux/gosbom/blob/v0.70.0/gosbom/source/file_resolver.go#L8) to interface with the image and search for files. The implementations for these
abstractions leverage [`stereoscope`](https://github.com/anchore/stereoscope) in order to perform searching. Here is a
rough outline of how that works:

1. A stereoscope `file.Index` is searched based on the input given (a path, glob, or MIME type). The index is relatively fast to search, but results must be filtered down to the files that exist in the specific layer(s) of interest. This is done automatically by the `filetree.Searcher` abstraction, which falls back to searching directly against the raw `filetree.FileTree` if the index does not contain the file(s) of interest. Note: the `filetree.Searcher` is used by the `file.Resolver` abstraction.
2. Once the set of files is returned from the `filetree.Searcher`, the results are filtered down further to unique files. For example, you may have requested files by a glob that returns multiple results. These results are deduplicated by real file. If a result contains two references to the same file, say one accessed via a symlink and one accessed via the real path, then the real path reference is returned and the symlink reference is filtered out. If both were accessed by symlink, then the first (by lexical order) is returned. This is done automatically by the `file.Resolver` abstraction.
3. By the time results reach the `pkg.Cataloger`, you are guaranteed to have a set of unique files that exist in the layer(s) of interest (relative to what the resolver supports).

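
The deduplication rule in step 2 can be sketched roughly as follows. This is a simplified model, not stereoscope's or gosbom's implementation. `Location`, `AccessPath`, `RealPath`, and `dedupeByRealPath` are hypothetical names, and the real resolver works on file references rather than plain strings.

```go
package main

import (
	"fmt"
	"sort"
)

// Location is a hypothetical search result: the path the file was reached by
// and the real (symlink-resolved) path of the file it refers to.
type Location struct {
	AccessPath string
	RealPath   string
}

// dedupeByRealPath keeps one result per real file. A reference through the
// real path beats any symlink reference; among symlink-only references, the
// lexically first access path wins. Output is sorted by real path.
func dedupeByRealPath(results []Location) []Location {
	best := map[string]Location{}
	for _, loc := range results {
		current, seen := best[loc.RealPath]
		switch {
		case !seen:
			best[loc.RealPath] = loc
		case current.AccessPath == current.RealPath:
			// already holding the real-path reference; keep it
		case loc.AccessPath == loc.RealPath || loc.AccessPath < current.AccessPath:
			best[loc.RealPath] = loc
		}
	}
	out := make([]Location, 0, len(best))
	for _, loc := range best {
		out = append(out, loc)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].RealPath < out[j].RealPath })
	return out
}

func main() {
	results := []Location{
		{AccessPath: "/usr/lib/libfoo-link.so", RealPath: "/usr/lib/libfoo.so.1"},
		{AccessPath: "/usr/lib/libfoo.so.1", RealPath: "/usr/lib/libfoo.so.1"},
		{AccessPath: "/lib/b-link", RealPath: "/opt/bar"},
		{AccessPath: "/lib/a-link", RealPath: "/opt/bar"},
	}
	for _, loc := range dedupeByRealPath(results) {
		fmt.Printf("%s (via %s)\n", loc.RealPath, loc.AccessPath)
	}
}
```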

## Testing

### Levels of testing

- `unit`: Unit tests are the default level of testing and are distributed throughout the repo. Any `_test.go` file that
  does not reside somewhere within the `/test` directory is a unit test. Other forms of testing should be organized in
  the `/test` directory. These tests should focus on the in-depth correctness of functionality. Test coverage (%) metrics
  only consider unit tests and no other forms of testing.

- `integration`: located within `test/integration`, these tests focus on the behavior surfaced by the common library
  entrypoints from the `gosbom` package and make light assertions about the results surfaced. Additionally, these tests
  tend to make diversity assertions for enum-like objects, ensuring that as enum values are added to a definition,
  integration tests will automatically fail if no test attempts to use that enum value. For more details see
  the "Data diversity and freshness assertions" section below.

- `cli`: located within `test/cli`, these tests check the correctness of application behavior from a
  snapshot build. They should be used in cases where a unit or integration test will not do, or if you are looking
  for in-depth testing of code in the `cmd/` package (such as testing the proper behavior of application configuration,
  CLI switches, and glue code before gosbom library calls).

- `acceptance`: located within `test/compare` and `test/install`, these are smoke-like tests that ensure that application
  packaging and installation work as expected. For example, during release we provide RPM packages as a download
  artifact. We also have an accompanying RPM acceptance test that installs the RPM from a snapshot build and ensures the
  output of a gosbom invocation matches canned expected output. New acceptance tests should be added for each release artifact
  and architecture supported (when possible).

### Data diversity and freshness assertions

It is important that tests against the codebase are flexible enough to begin failing when they do not cover "enough"
of the objects under test. "Cover" in this case does not mean that some percentage of the code has been executed
during testing. It means that testing reflects enough diversity of data input relative to the
definitions available.

For instance, consider an enum-like value like so:
```go
type Language string

const (
	Java       Language = "java"
	JavaScript Language = "javascript"
	Python     Language = "python"
	Ruby       Language = "ruby"
	Go         Language = "go"
)
```

Say we have a test that exercises all the languages defined today:

```go
func TestCatalogPackages(t *testing.T) {
	cases := []struct {
		name             string
		inputFixturePath string
		// ... expected packages
	}{
		// ... the set of test cases that test all languages
	}
	for _, test := range cases {
		t.Run(test.name, func(t *testing.T) {
			// use test.inputFixturePath and assert that gosbom.CatalogPackages() returns the set of expected Package objects
			// ...
		})
	}
}
```

Here each test case has an `inputFixturePath` that would result in packages from each language. This test is
brittle, since it does not directly assert that all languages were exercised, and future modifications (such as
adding a new language) won't be covered by any test cases.

To address this, the enum-like object should have a definition of all objects that can be used in testing:

```go
type Language string

// const( Java Language = ..., ... )

var AllLanguages = []Language{
	Java,
	JavaScript,
	Python,
	Ruby,
	Go,
	Rust,
}
```

This allows tests to fail automatically when a new language is added:

```go
func TestCatalogPackages(t *testing.T) {
	cases := []struct {
		name             string
		inputFixturePath string
		// ... expected packages
	}{
		// ... the set of test cases that (hopefully) covers all languages
	}

	// new stuff: record every language observed across all test cases
	observedLanguages := make(map[pkg.Language]bool)

	for _, test := range cases {
		t.Run(test.name, func(t *testing.T) {
			// use test.inputFixturePath and assert that gosbom.CatalogPackages() returns the set of expected Package objects
			// ... (producing `actual`, the cataloged packages)

			// new stuff...
			for _, actualPkg := range actual {
				observedLanguages[actualPkg.Language] = true
			}
		})
	}

	// new stuff: fail if any defined language was never exercised
	for _, expectedLanguage := range pkg.AllLanguages {
		if !observedLanguages[expectedLanguage] {
			t.Errorf("failed to test language=%q", expectedLanguage)
		}
	}
}
```

This is a better test, since it will fail when someone adds a new language but does not write a test case that
exercises that new language. This method is ideal for integration-level testing, where testing correctness in depth
is not needed (that is what unit tests are for). Instead, breadth testing ensures that units are well integrated.

A similar case can be made for data freshness: if stale input data would diminish the quality of the results,
then a test should be written (when possible) to assert that the input data is not stale.

An example of this is the static list of licenses that is stored in `internal/spdxlicense` for use by the SPDX
presenters. This list is updated and published periodically by an external group, and gosbom can grab and update this
list by running `go generate ./...` from the root of the repo.

An integration test has been written to grab the latest license list version externally and compare that version
with the version generated in the codebase. If they differ, the test fails, signaling that
action is needed to update the list.
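
A freshness check of this kind can be sketched without network access by injecting the function that fetches the latest version. The constant, the version strings, and `checkFresh` are hypothetical placeholders, not gosbom's actual integration test or real license list release numbers.

```go
package main

import (
	"errors"
	"fmt"
)

// generatedVersion stands in for a version string that `go generate` would write
// into the codebase; "0.0" is a placeholder, not a real license list release.
const generatedVersion = "0.0"

// checkFresh compares the generated version with the latest published one.
// fetchLatest is injected so the comparison can run without network access;
// a real integration test would query the upstream source instead.
func checkFresh(generated string, fetchLatest func() (string, error)) error {
	latest, err := fetchLatest()
	if err != nil {
		return fmt.Errorf("fetching latest version: %w", err)
	}
	if latest != generated {
		return fmt.Errorf("stale data: generated %q but latest is %q; run `go generate ./...`", generated, latest)
	}
	return nil
}

func main() {
	upToDate := func() (string, error) { return "0.0", nil }
	newerUpstream := func() (string, error) { return "0.1", nil }
	offline := func() (string, error) { return "", errors.New("network unavailable") }

	for _, fetch := range []func() (string, error){upToDate, newerUpstream, offline} {
		fmt.Println(checkFresh(generatedVersion, fetch))
	}
}
```

This sketch treats a failed fetch as an error rather than skipping the check. Whether an offline run should fail or skip is a judgment call.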

**_The key takeaway is to try to write tests that fail when data assumptions change and not just when code changes._**

### Snapshot tests

The format objects make heavy use of "snapshot" testing. You save the expected output bytes from a call into the
git repository, and during testing you compare the actual bytes from the subject under test with the golden
copy saved in the repo. The "golden" files are stored in the `test-fixtures/snapshot` directory relative to the Go
package under test. They should always be updated by invoking `go test` on the specific test file with a specific CLI
update flag provided.

Many of the `Format` tests use this approach: the raw SBOM report is saved in the repo, and the test
compares that SBOM with what is generated from the latest presenter code. For instance, at the time of this writing
the CycloneDX presenter snapshots can be updated by running:

```bash
go test ./internal/formats -update-cyclonedx
```

These flags are defined at the top of the test files that use the snapshot files.

Snapshot testing is only as good as the manual verification of the golden snapshot file saved to the repo! Be careful
and diligent when updating these files.
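
A minimal golden-file helper in the spirit described above might look like the following. The `-update-snapshot` flag name, the `assertGolden` and `snapshotDemo` helpers, and the file names are hypothetical. In a real `_test.go` file the flag would be declared at the top of the file, and its value would be passed in place of the literal `false`.

```go
package main

import (
	"bytes"
	"errors"
	"flag"
	"fmt"
	"os"
	"path/filepath"
)

// updateSnapshot is a hypothetical update flag, e.g. `go test ./some/pkg -update-snapshot`.
var updateSnapshot = flag.Bool("update-snapshot", false, "overwrite golden files with the current output")

// assertGolden compares actual with the golden copy at goldenPath. In update mode it
// rewrites the golden file instead, which is why the resulting diff needs human review.
func assertGolden(goldenPath string, actual []byte, update bool) error {
	if update {
		if err := os.MkdirAll(filepath.Dir(goldenPath), 0o755); err != nil {
			return err
		}
		return os.WriteFile(goldenPath, actual, 0o644)
	}
	expected, err := os.ReadFile(goldenPath)
	if err != nil {
		return fmt.Errorf("reading golden file: %w", err)
	}
	if !bytes.Equal(expected, actual) {
		return fmt.Errorf("output does not match golden file %s", goldenPath)
	}
	return nil
}

// snapshotDemo exercises assertGolden inside a temporary directory.
func snapshotDemo() error {
	dir, err := os.MkdirTemp("", "snapshot-demo")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)
	golden := filepath.Join(dir, "test-fixtures", "snapshot", "TestEncoder.golden")

	// Create the golden copy as if the update flag had been passed.
	if err := assertGolden(golden, []byte(`{"bomFormat":"CycloneDX"}`), true); err != nil {
		return err
	}
	// Later runs compare against it; a real test would pass *updateSnapshot here.
	if err := assertGolden(golden, []byte(`{"bomFormat":"CycloneDX"}`), false); err != nil {
		return err
	}
	if assertGolden(golden, []byte(`{"bomFormat":"changed"}`), false) == nil {
		return errors.New("expected a mismatch to be reported")
	}
	return nil
}

func main() {
	flag.Parse()
	if err := snapshotDemo(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("snapshot checks passed; update flag set:", *updateSnapshot)
}
```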