---
layout: post
title: "Go: append a file to a TAR archive"
date: 2021-08-10 14:46:10 +0200
author: Vladimir Markelov
categories: golang archive tar
---

## The problem

AIStore supports a whole gamut of "archival" operations that read, write, and list archives such as .tar, .tgz, and .zip. When we started working on **appending** content to existing archives, we quickly discovered that, surprisingly, open-source code for this appears to be missing. Standard Go packages - e.g., [archive/tar](https://pkg.go.dev/archive/tar) - fully support creating and reading archives but _not_ appending to an existing one.

Searching the Internet did not help: the code snippets we could find either did not work or worked only under certain restricted conditions.

In this text, we show how to append a file to an existing TAR. GitHub references are included below.

## First attempts

The first idea was to open an archive for appending and write new data at the end.
It did not work: the new file was missing from the archive listing and the appended file was inaccessible.
The TAR specification states:

> A tar archive consists of a series of 512-byte records.
> Each file system object requires a header record which stores basic metadata (pathname, owner, permissions, etc.) and zero or more records containing any file data.
> The end of the archive is indicated by two records consisting entirely of zero bytes.

Every TAR archive ends with an end-of-archive marker (a trailer): two zero blocks.
Any information written after the trailer is ignored.
This made it clear that the header and data of a new file had to overwrite the trailing zero blocks.
As the trailer size was 2 records, it seemed sufficient to start writing the new data at a 1 KiB offset from the end of the archive.
A solution found on the Internet [employed this idea](https://stackoverflow.com/questions/18323995/golang-append-file-to-an-existing-tar-archive):

```go
package main

import (
	"archive/tar"
	"io"
	"log"
	"os"
)

const recordSize = 512

func main() {
	var data []byte // content of the file to append (left empty here)
	f, err := os.OpenFile("test.tar", os.O_RDWR, os.ModePerm)
	if err != nil {
		log.Fatalln(err)
	}
	// Position the writer over the assumed 2-record trailer.
	if _, err = f.Seek(-2*recordSize, io.SeekEnd); err != nil {
		log.Fatalln(err)
	}
	tw := tar.NewWriter(f)
	hdr := &tar.Header{
		Name: "new_file",
		Size: int64(len(data)),
	}
	if err := tw.WriteHeader(hdr); err != nil {
		log.Fatalln(err)
	}
	if _, err := tw.Write(data); err != nil {
		log.Fatalln(err)
	}
	// Close pads the last entry and writes a fresh 2-record trailer.
	if err := tw.Close(); err != nil {
		log.Fatalln(err)
	}
	if err := f.Close(); err != nil {
		log.Fatalln(err)
	}
}
```

But the story did not end there. This worked only with TARs created by the Go standard library.
When I tried to append a new file to an archive created with the system `tar` utility, it failed:
the appended file was missing again.

## The solution

Digging into the problem, I discovered that the number of zero blocks in the archive trailer depended on the TAR version and its defaults.
The Go package added only 1 KiB of zeros, but the archive created with the system `tar` had more than 4 KiB of zeros at the end.
That was why the first approach did not work with an arbitrary TAR archive.
TAR did not seem to store the trailer size anywhere, so I had to calculate it somehow.
My final solution was inefficient for archives with a lot of files, yet it was reliable and it worked with any TAR archive:

1. Open an archive.
2. Pass its file handle to a TAR reader.
3. Iterate through all files inside the archive until `io.EOF` is reached.
4. For each file, the TAR reader reports the file size, and the current position of the file handle gives the offset at which the file's data starts.
5. When the TAR reader returns `io.EOF`, the file pointer has already moved into or past the zero trailer. So we have to use the numbers from the previous iteration to calculate the end of the archive data.
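
Step 5 cannot rely on a fixed offset because the trailer length varies between archives. As a rough illustration, the sketch below counts the all-zero 512-byte records at the end of an archive held in memory. The helper names `goArchive` and `trailingZeroRecords`, the entry name, and the padding to 10 KiB (meant to imitate the default blocking of GNU `tar`) are assumptions made for this example and are not part of AIStore. The count is only a diagnostic: a last file whose data ends in a full record of zeros would inflate it, which is why the reliable approach iterates over the entries instead.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"log"
)

const recordSize = 512

// goArchive builds a one-file TAR in memory with the Go standard library.
// The entry name and mode are arbitrary values chosen for this example.
func goArchive(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	hdr := &tar.Header{Name: "a.txt", Mode: 0o644, Size: int64(len(data))}
	if err := tw.WriteHeader(hdr); err != nil {
		return nil, err
	}
	if _, err := tw.Write(data); err != nil {
		return nil, err
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// trailingZeroRecords counts the all-zero records at the end of raw.
func trailingZeroRecords(raw []byte) int {
	zero := make([]byte, recordSize)
	n := 0
	for end := len(raw); end >= recordSize; end -= recordSize {
		if !bytes.Equal(raw[end-recordSize:end], zero) {
			break
		}
		n++
	}
	return n
}

func main() {
	raw, err := goArchive([]byte("hello"))
	if err != nil {
		log.Fatalln(err)
	}
	fmt.Println(trailingZeroRecords(raw)) // 2: the trailer written by archive/tar

	// Assumption: pad to a multiple of 10 KiB, imitating GNU tar's default blocking.
	const blocking = 20 * recordSize
	pad := (blocking - len(raw)%blocking) % blocking
	padded := append(raw, make([]byte, pad)...)
	fmt.Println(trailingZeroRecords(padded)) // 18: same data, much longer trailer
}
```

Both archives hold identical data, yet seeking back 1 KiB from the end lands on the first trailer record only in the first one.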

One tricky detail is that the next archive entry must start at a position aligned to the TAR record boundary - 512 bytes.
So the size of the last file must be rounded up to the nearest multiple of the TAR record size.

```go
package main

import (
	"archive/tar"
	"io"
	"log"
	"os"
)

const recordSize = 512

func main() {
	var data []byte // content of the file to append (left empty here)
	fh, err := os.OpenFile("test.tar", os.O_RDWR, os.ModePerm)
	if err != nil {
		log.Fatalln(err)
	}
	// Data offset and size of the most recently read entry.
	var lastPos, lastSize int64
	twr := tar.NewReader(fh)
	for {
		st, err := twr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatalln(err)
		}
		// After Next, the file position is where this entry's data begins.
		if lastPos, err = fh.Seek(0, io.SeekCurrent); err != nil {
			log.Fatalln(err)
		}
		lastSize = st.Size
	}
	// Round up the size of the last file to a multiple of recordSize
	// (a zero-size entry occupies no data records).
	paddedSize := (lastSize + recordSize - 1) / recordSize * recordSize
	if _, err = fh.Seek(lastPos+paddedSize, io.SeekStart); err != nil {
		log.Fatalln(err)
	}

	tw := tar.NewWriter(fh)
	hdr := &tar.Header{
		Name: "new_file",
		Size: int64(len(data)),
	}
	if err = tw.WriteHeader(hdr); err != nil {
		log.Fatalln(err)
	}
	if _, err = tw.Write(data); err != nil {
		log.Fatalln(err)
	}
	// Close pads the new entry and writes a fresh 2-record trailer.
	if err = tw.Close(); err != nil {
		log.Fatalln(err)
	}
	if err = fh.Close(); err != nil {
		log.Fatalln(err)
	}
}
```

## References

For the latest code, please see:

- The function `OpenTarForAppend` in the ["cos" package](https://github.com/NVIDIA/aistore/blob/main/cmn/cos/archive.go).
- An example of how to use `OpenTarForAppend`: the implementation of the function `appendToArch` in the [core package](https://github.com/NVIDIA/aistore/blob/main/ais/tgtobj.go).