github.com/bodgit/sevenzip@v1.5.1/README.md

[![OpenSSF Scorecard](https://api.securityscorecards.dev/projects/github.com/bodgit/sevenzip/badge)](https://securityscorecards.dev/viewer/?uri=github.com/bodgit/sevenzip)
[![OpenSSF Best Practices](https://www.bestpractices.dev/projects/6882/badge)](https://www.bestpractices.dev/projects/6882)
[![GitHub release](https://img.shields.io/github/v/release/bodgit/sevenzip)](https://github.com/bodgit/sevenzip/releases)
[![Build Status](https://img.shields.io/github/actions/workflow/status/bodgit/sevenzip/build.yml?branch=main)](https://github.com/bodgit/sevenzip/actions?query=workflow%3ABuild)
[![Coverage Status](https://coveralls.io/repos/github/bodgit/sevenzip/badge.svg?branch=master)](https://coveralls.io/github/bodgit/sevenzip?branch=master)
[![Go Report Card](https://goreportcard.com/badge/github.com/bodgit/sevenzip)](https://goreportcard.com/report/github.com/bodgit/sevenzip)
[![GoDoc](https://godoc.org/github.com/bodgit/sevenzip?status.svg)](https://godoc.org/github.com/bodgit/sevenzip)
![Go version](https://img.shields.io/badge/Go-1.22-brightgreen.svg)
![Go version](https://img.shields.io/badge/Go-1.21-brightgreen.svg)

# sevenzip

A reader for 7-zip archives inspired by `archive/zip`.

Current status:

* Pure Go, no external libraries or binaries needed.
* Handles uncompressed headers (`7za a -mhc=off test.7z ...`).
* Handles compressed headers (`7za a -mhc=on test.7z ...`).
* Handles password-protected versions of both of the above (`7za a -mhc=on|off -mhe=on -ppassword test.7z ...`).
* Handles archives split into multiple volumes (`7za a -v100m test.7z ...`).
* Handles self-extracting archives (`7za a -sfx archive.exe ...`).
* Validates CRC values as it parses the file.
* Supports ARM, BCJ, BCJ2, Brotli, Bzip2, Copy, Deflate, Delta, LZ4, LZMA, LZMA2, PPC, SPARC and Zstandard methods.
* Implements the `fs.FS` interface so you can treat an opened 7-zip archive like a filesystem.

More examples of 7-zip archives are needed to test all of the different combinations/algorithms possible.

## Frequently Asked Questions

### Why is my code running so slow?

Someone might write the following simple code:
```golang
func extractArchive(archive string) error {
	r, err := sevenzip.OpenReader(archive)
	if err != nil {
		return err
	}
	defer r.Close()

	for _, f := range r.File {
		rc, err := f.Open()
		if err != nil {
			return err
		}
		// Deferred until extractArchive returns, so nothing is closed between files
		defer rc.Close()

		// Extract the file
	}

	return nil
}
```
Unlike a zip archive where every file is individually compressed, 7-zip archives can have all of the files compressed together in one long compressed stream, supposedly to achieve a better compression ratio.
In a naive random access implementation, to read the first file you start at the beginning of the compressed stream and read out that file's worth of bytes.
To read the second file you have to start at the beginning of the compressed stream again, read and discard the first file's worth of bytes to get to the correct offset in the stream, then read out the second file's worth of bytes.
For an archive that contains hundreds of files, extraction gets progressively slower because you have to read and discard more and more data just to reach the right offset in the stream.

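The growth in discarded work can be sketched with plain arithmetic. The program below is an illustration only and does not use this package: `naiveCost` and `sequentialCost` are hypothetical helpers, and the archive of 600 files of 4 KiB each in a single stream is an assumed example. It compares the bytes decompressed when every read restarts at the beginning of the stream with the bytes decompressed when the stream position is kept between files.

```go
package main

import "fmt"

// naiveCost returns how many bytes are decompressed in total when every
// file is read by restarting at the beginning of the shared stream.
func naiveCost(sizes []int64) int64 {
	var total, offset int64
	for _, size := range sizes {
		offset += size  // end of this file within the stream
		total += offset // everything up to that point is decoded again
	}
	return total
}

// sequentialCost returns the total when the stream position is kept
// between files, so each byte is decompressed exactly once.
func sequentialCost(sizes []int64) int64 {
	var total int64
	for _, size := range sizes {
		total += size
	}
	return total
}

func main() {
	// Assumed example: 600 files of 4 KiB each in one stream.
	sizes := make([]int64, 600)
	for i := range sizes {
		sizes[i] = 4096
	}

	fmt.Println("naive bytes decoded:     ", naiveCost(sizes))
	fmt.Println("sequential bytes decoded:", sequentialCost(sizes))
}
```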
This package contains an optimisation that caches and reuses the underlying compressed stream reader so you don't have to keep starting from the beginning for each file, but it does require you to call `rc.Close()` before extracting the next file.
So write your code similar to this:
```golang
func extractFile(f *sevenzip.File) error {
	rc, err := f.Open()
	if err != nil {
		return err
	}
	// Runs when extractFile returns, i.e. before the next file is opened
	defer rc.Close()

	// Extract the file

	return nil
}

func extractArchive(archive string) error {
	r, err := sevenzip.OpenReader(archive)
	if err != nil {
		return err
	}
	defer r.Close()

	for _, f := range r.File {
		if err = extractFile(f); err != nil {
			return err
		}
	}

	return nil
}
```
You can see the main difference is to not defer all of the `Close()` calls until the end of `extractArchive()`.

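The effect of moving the `defer` into a helper can be seen without an archive at all. In the sketch below `tracker` is a hypothetical stand-in for the archive, not a type from this package; it only counts how many entries are open at once. A `defer` inside a loop runs when the enclosing function returns, whereas a `defer` inside a helper runs at the end of each iteration.

```go
package main

import "fmt"

// tracker is a hypothetical stand-in that records how many entries
// are open at the same time.
type tracker struct {
	open, maxOpen int
}

// Open marks one entry as open and returns the function that closes it.
func (t *tracker) Open() func() {
	t.open++
	if t.open > t.maxOpen {
		t.maxOpen = t.open
	}
	return func() { t.open-- }
}

// deferInLoop mirrors the slow pattern: every deferred close waits
// until the whole function returns.
func deferInLoop(t *tracker, n int) {
	for i := 0; i < n; i++ {
		closeEntry := t.Open()
		defer closeEntry()
	}
}

// deferInHelper mirrors the fast pattern: each close runs before the
// next entry is opened.
func deferInHelper(t *tracker, n int) {
	for i := 0; i < n; i++ {
		func() {
			closeEntry := t.Open()
			defer closeEntry()
		}()
	}
}

func main() {
	a, b := &tracker{}, &tracker{}
	deferInLoop(a, 5)
	deferInHelper(b, 5)
	fmt.Println(a.maxOpen, b.maxOpen) // prints "5 1"
}
```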
There is a set of benchmarks in this package that demonstrates the performance boost that the optimisation provides, amongst other techniques:
```
$ go test -v -run='^$' -bench='Reader$' -benchtime=60s
goos: darwin
goarch: amd64
pkg: github.com/bodgit/sevenzip
cpu: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
BenchmarkNaiveReader
BenchmarkNaiveReader-12                  	       2	31077542628 ns/op
BenchmarkOptimisedReader
BenchmarkOptimisedReader-12              	     434	 164854747 ns/op
BenchmarkNaiveParallelReader
BenchmarkNaiveParallelReader-12          	     240	 361869339 ns/op
BenchmarkNaiveSingleParallelReader
BenchmarkNaiveSingleParallelReader-12    	     412	 171027895 ns/op
BenchmarkParallelReader
BenchmarkParallelReader-12               	     636	 112551812 ns/op
PASS
ok  	github.com/bodgit/sevenzip	472.251s
```
The archive used here is just the reference LZMA SDK archive, which is only 1 MiB in size but does contain 630+ files split across three compression streams.
The only difference between `BenchmarkNaiveReader` and the rest is the lack of a call to `rc.Close()` between files, so the stream reuse optimisation doesn't take effect.

Don't blindly throw goroutines at the problem either, as this can also undo the optimisation. A naive implementation that uses a pool of multiple goroutines to extract each file gets through nearly 50% fewer iterations than the optimised sequential reader in the benchmark above, which works out to more than twice the time per operation, and even a pool of one goroutine can end up being less efficient.
The optimal way to employ goroutines is to make use of the `sevenzip.FileHeader.Stream` field; extract files with the same value using the same goroutine.
This gets through nearly 50% more iterations in the same time with the LZMA SDK archive, but it very much depends on how many streams there are in the archive.

In general, don't extract the files in a different order from their natural order within the archive, as that will also undo the optimisation.
The worst scenario would likely be to extract the archive in reverse order.