---
layout: post
title:  "Go: append a file to a TAR archive"
date:   2021-08-10 14:46:10 +0200
author: Vladimir Markelov
categories: golang archive tar
---

## The problem

AIStore supports a whole gamut of "archival" operations for reading, writing, and listing archives such as .tar, .tgz, and .zip. When we started working on **appending** content to existing archives, we quickly discovered that, surprisingly, open-source code for this appeared to be missing. Standard Go packages (e.g., [archive/tar](https://pkg.go.dev/archive/tar)) fully support creating and reading archives, but _not_ appending to an existing one.

Searching the Internet did not help either: the open-source snippets we could find either did not work or worked only under certain restricted conditions.

In this text, we show how to append a file to an existing TAR archive. GitHub references are included at the end.

## First attempts

The first idea was to open the archive for appending and write the new data at the end.
It did not work: the new file was missing from the archive listing, and its contents were inaccessible.
The TAR specification states:

> A tar archive consists of a series of 512-byte records.
> Each file system object requires a header record which stores basic metadata (pathname, owner, permissions, etc.) and zero or more records containing any file data.
> The end of the archive is indicated by two records consisting entirely of zero bytes.

Every TAR archive ends with an end-of-archive marker (a trailer): two zero records at the end.
Any information written after the trailer is ignored by readers.
This made it clear that the header and data of a new file had to overwrite the trailing zero records.
Since the trailer is 2 records long, it seemed sufficient to start writing the new data 1 KiB (2 × 512 bytes) before the end of the archive.
A solution found on the Internet [employed this idea](https://stackoverflow.com/questions/18323995/golang-append-file-to-an-existing-tar-archive):
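
```go
package main

import (
	"archive/tar"
	"io"
	"log"
	"os"
)

const recordSize = 512

func main() {
	data := []byte("new file contents") // contents to append (placeholder)
	f, err := os.OpenFile("test.tar", os.O_RDWR, os.ModePerm)
	if err != nil {
		log.Fatalln(err)
	}
	// Step back over the two zero records (1 KiB) that Go's tar.Writer emits as the trailer.
	if _, err = f.Seek(-2*recordSize, io.SeekEnd); err != nil {
		log.Fatalln(err)
	}
	tw := tar.NewWriter(f)
	hdr := &tar.Header{
		Name: "new_file",
		Mode: 0o644,
		Size: int64(len(data)),
	}
	if err := tw.WriteHeader(hdr); err != nil {
		log.Fatalln(err)
	}
	if _, err := tw.Write(data); err != nil {
		log.Fatalln(err)
	}
	// Close pads the data to a whole record and writes a fresh trailer.
	if err := tw.Close(); err != nil {
		log.Fatalln(err)
	}
	if err := f.Close(); err != nil {
		log.Fatalln(err)
	}
}
```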

But the story did not end here: this worked only with TARs created by the Go standard library.
When I tried to append a new file to an archive created with the system `tar` utility, it failed:
the appended file was missing again.
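
The failure can be reproduced without the system `tar` utility. The sketch below is an emulation, not the original experiment: it pads a Go-written archive with zero records up to a multiple of 10240 bytes (GNU tar's default blocking), then performs the same "overwrite the last 1 KiB" append on both versions in memory. The helpers `writeEntry`, `listEntries`, and `naiveAppend` are hypothetical names.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"log"
)

const (
	recordSize = 512
	// GNU tar's default blocking factor (20 records = 10240 bytes); used here
	// only to emulate a padded archive, not to reproduce a specific tar build.
	defaultBlocking = 20 * recordSize
)

// writeEntry returns a TAR stream with one entry followed by Go's 1 KiB trailer.
func writeEntry(name string, body []byte) []byte {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	if err := tw.WriteHeader(&tar.Header{Name: name, Mode: 0o644, Size: int64(len(body))}); err != nil {
		log.Fatalln(err)
	}
	if _, err := tw.Write(body); err != nil {
		log.Fatalln(err)
	}
	if err := tw.Close(); err != nil {
		log.Fatalln(err)
	}
	return buf.Bytes()
}

// listEntries returns the entry names a tar.Reader sees before it reports io.EOF.
func listEntries(archive []byte) []string {
	var names []string
	tr := tar.NewReader(bytes.NewReader(archive))
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return names
		}
		if err != nil {
			log.Fatalln(err)
		}
		names = append(names, hdr.Name)
	}
}

// naiveAppend mimics "seek 1 KiB back from the end and write a new entry".
// The new stream is always longer than 1 KiB, so concatenating after the
// first len-1024 bytes is equivalent to overwriting in place.
func naiveAppend(archive []byte, name string, body []byte) []byte {
	out := append([]byte{}, archive[:len(archive)-2*recordSize]...)
	return append(out, writeEntry(name, body)...)
}

func main() {
	goArchive := writeEntry("a.txt", []byte("first"))

	// Emulate a longer trailer: pad with zero records to a multiple of 10240 bytes.
	padded := append([]byte{}, goArchive...)
	if rem := len(padded) % defaultBlocking; rem != 0 {
		padded = append(padded, make([]byte, defaultBlocking-rem)...)
	}

	fmt.Println(listEntries(naiveAppend(goArchive, "b.txt", []byte("second"))))
	fmt.Println(listEntries(naiveAppend(padded, "b.txt", []byte("second"))))
}
```

It should print `[a.txt b.txt]` for the unpadded archive and `[a.txt]` for the padded one: in the padded case the reader meets two consecutive zero records before the new header and stops there.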

## The solution

Digging into the problem, I discovered that the number of zero records in the archive trailer depends on the TAR implementation, its version, and its defaults.
The Go package writes only 1 KiB of zeros, but the archive created with the system `tar` had more than 4 KiB of zeros at the end (GNU tar, for instance, by default pads the whole archive to a multiple of 20 × 512 = 10240 bytes).
That was why the first approach did not work with an arbitrary TAR archive.
The format does not seem to store the trailer size anywhere, so I had to calculate it.
My final solution is inefficient for archives with many files, because it has to read every header, yet it is reliable and works with any TAR archive:

1. Open the archive.
2. Pass its file handle to a TAR reader.
3. Iterate through all files inside the archive until `io.EOF` is reached.
4. For each file, the TAR reader reports the file size, and the current file offset is the position where the file's data starts.
5. When the TAR reader returns `io.EOF`, the file offset has already moved past (at least part of) the zero trailer, so we use the numbers from the previous iteration to calculate where the archive data ends.

One tricky detail: the next archive entry must start at a position aligned to the TAR record boundary (512 bytes).
So the size of the last file must be rounded up to the nearest multiple of the TAR record size.

```go
package main

import (
	"archive/tar"
	"io"
	"log"
	"os"
)

const recordSize = 512

func main() {
	data := []byte("new file contents") // contents to append (placeholder)
	fh, err := os.OpenFile("test.tar", os.O_RDWR, os.ModePerm)
	if err != nil {
		log.Fatalln(err)
	}
	// Walk all entries, remembering where the data of the last one starts
	// and how large it is.
	var lastPos, lastSize int64
	twr := tar.NewReader(fh)
	for {
		st, err := twr.Next()
		if err != nil {
			if err == io.EOF {
				break
			}
			log.Fatalln(err)
		}
		// Right after Next, the file offset is at the first byte of this entry's data.
		if lastPos, err = fh.Seek(0, io.SeekCurrent); err != nil {
			log.Fatalln(err)
		}
		lastSize = st.Size
	}
	// Round up the size of the last file to a multiple of recordSize;
	// a zero-size entry occupies no data records, so it stays 0.
	paddedSize := (lastSize + recordSize - 1) / recordSize * recordSize
	if _, err = fh.Seek(lastPos+paddedSize, io.SeekStart); err != nil {
		log.Fatalln(err)
	}

	// Overwrite the old trailer with the new entry; Close writes a new trailer.
	tw := tar.NewWriter(fh)
	hdr := &tar.Header{
		Name: "new_file",
		Mode: 0o644,
		Size: int64(len(data)),
	}
	if err = tw.WriteHeader(hdr); err != nil {
		log.Fatalln(err)
	}
	if _, err = tw.Write(data); err != nil {
		log.Fatalln(err)
	}
	if err = tw.Close(); err != nil {
		log.Fatalln(err)
	}
	if err = fh.Close(); err != nil {
		log.Fatalln(err)
	}
}
```

## References

For the latest code, please see:

- The function `OpenTarForAppend` in the ["cos" package](https://github.com/NVIDIA/aistore/blob/main/cmn/cos/archive.go).
- An example of how to use `OpenTarForAppend` in the implementation of the function `appendToArch` in the [core package](https://github.com/NVIDIA/aistore/blob/main/ais/tgtobj.go).