# Lassie

> Fetches from Filecoin, every time

## Table of Contents

* [Overview](#overview)
* [Installation](#installation)
* [Methods of Retrieval](#methods-of-retrieval)
  * [Command Line Interface](#command-line-interface)
    * [Extracting Content from a CAR](#extracting-content-from-a-car)
    * [Fetch Example](#fetch-example)
  * [HTTP API](#http-api)
    * [Daemon Example](#daemon-example)
  * [Golang Library](#golang-library)
  * [Roots, pieces and payloads](#roots-pieces-and-payloads)
* [Contribute](#contribute)
* [License](#license)

## Overview

Lassie is a simple retrieval client for Filecoin. It finds and fetches your data over the best retrieval protocols available. Lassie makes Filecoin retrieval.

## Installation

Download the [lassie binary from the latest release](https://github.com/filecoin-project/lassie/releases/latest) based on your system architecture, or download and install the [lassie](https://github.com/filecoin-project/lassie) package using the Go package manager:

```bash
$ go install github.com/filecoin-project/lassie/cmd/lassie@latest

go: downloading github.com/filecoin-project/lassie v0.3.1
go: downloading github.com/libp2p/go-libp2p v0.23.2
go: downloading github.com/filecoin-project/go-state-types v0.9.9

...
```

Optionally, download the [go-car binary from the latest release](https://github.com/ipld/go-car/releases/latest) based on your system architecture, or install the [go-car](https://github.com/ipld/go-car) package using the Go package manager:

```bash
$ go install github.com/ipld/go-car/cmd/car@latest

go: downloading github.com/ipld/go-car v0.6.0
go: downloading github.com/ipld/go-car/cmd v0.0.0-20230215023242-a2a8d2f9f60f
go: downloading github.com/ipld/go-codec-dagpb v1.6.0

...
```

The go-car package makes it easier to work with files in the content-addressed archive (CAR) format, which is what Lassie uses to return the content it fetches. For the lassie use-case, go-car will be used to extract the contents of the CAR into usable files.

## Methods of Retrieval

### Command Line Interface

The lassie command line interface (CLI) is the simplest way to retrieve content from the Filecoin/IPFS network. The CLI is best suited to fetching content from the network on an ad-hoc basis. It is also useful for testing and debugging, such as making sure that a CID is retrievable from the network or from a specific provider.

The CLI can be used to retrieve content from the network by passing a CID to the `lassie fetch` command:

```bash
$ lassie fetch [-o <output file>] [-t <timeout>] <CID>[/path/to/content]
```

The `lassie fetch` command will write the content of the CID to a file in the current working directory named `<CID>.car`. If the `-o` output flag is used, the content will be written to the specified file instead. If the `-t` timeout flag is used, the timeout will be set to the specified value. The default timeout is 20 seconds.

`fetch` will also take as input [IPFS Trustless Gateway](https://specs.ipfs.tech/http-gateways/trustless-gateway/) style paths. If the CID is prefixed with `/ipfs/`, the remainder will be interpreted as a URL query, accepting query parameters that the Trustless Gateway spec accepts, including `dag-scope=` and `entity-bytes=`. For example, `lassie fetch '/ipfs/<CID>/path/to/content?dag-scope=all'` will fetch the CID, the blocks required to navigate the path, and all the content at the terminus of the path.

More information about available flags can be found by running `lassie fetch --help`.

#### Extracting Content from a CAR

The go-car package can be used to extract the contents of the CAR file into usable files.
For example, if the content of the CID is a video, the go-car package can be used to extract the video into a file on the local filesystem.

```bash
$ car extract -f <CID>.car
```

The `-f` flag is used to specify the CAR file to extract the contents from. The contents of the CAR will be extracted into the current working directory.

#### Fetch Example

Let's grab some content from the Filecoin/IPFS network using the `lassie fetch` command:

```bash
$ lassie fetch -o fetch-example.car -p bafybeic56z3yccnla3cutmvqsn5zy3g24muupcsjtoyp3pu5pm5amurjx4
```

This will fetch the `bafybeic56z3yccnla3cutmvqsn5zy3g24muupcsjtoyp3pu5pm5amurjx4` CID from the network and save it to a file named `fetch-example.car` in our current working directory.

The `-p` progress flag is used to get more detailed information about the state of the retrieval.

_Note: If you hit a timeout, try using the `-t` flag to increase the timeout to something longer than 20 seconds. Retrievability of some CIDs varies widely with local network characteristics._

_Note: For the internet cautious out there, the `bafybeic56z3yccnla3cutmvqsn5zy3g24muupcsjtoyp3pu5pm5amurjx4` CID is a directory that has a video titled `birb.mp4`, which is a video of a bird bouncing to the song "Around the World" by Daft Punk. We've been using it internally during the development of Lassie to test with._

To extract the contents of the `fetch-example.car` file we created in the previous example, we would run:

```bash
$ car extract -f fetch-example.car
```

To fetch and extract at the same time, we can use the `lassie fetch` command and pipe the output to the `car extract` command:

```bash
$ lassie fetch -o - -p bafybeic56z3yccnla3cutmvqsn5zy3g24muupcsjtoyp3pu5pm5amurjx4 | car extract
```

The `-o` output flag is used with the `-` character to specify that the output should be written to `stdout`.
The `car extract` command reads input via `stdin` by default, so the output of the `lassie fetch` command is piped to the `car extract` command.

You should now have a `birb.mp4` file in your current working directory. Feel free to play it with your favorite video player!

### HTTP API

The lassie HTTP API allows one to run a web server that can be used to retrieve content from the Filecoin/IPFS network via HTTP requests, whether from a browser or from a programmatic tool like `curl`. We will be using `curl` for the following examples, but any HTTP client can be used, including a web browser. Curl-specific behavior will be noted when applicable.

The API server can be started with the `lassie daemon` command:

```bash
$ lassie daemon

Lassie daemon listening on address 127.0.0.1:41443
Hit CTRL-C to stop the daemon
```

The port can be changed by using the `-p` port flag. Any available port will be used by default.

More information about available flags can be found by running `lassie daemon --help`.

To fetch content using the HTTP API, make a `GET` request to the `/ipfs/<CID>[/path/to/content]` endpoint:

```bash
$ curl 'http://127.0.0.1:41443/ipfs/<CID>[/path/to/content]'
```

By default, this will output the contents of the CID to `stdout`.

To save the output to a file, use the `filename` query parameter:

```bash
$ curl 'http://127.0.0.1:41443/ipfs/<CID>[/path/to/content]?filename=<filename>' --output <filename>
```

_CURL Note: With curl we need to also specify the `--output <filename>` option.
However, putting the above URL into a browser will download the file with the given filename parameter value upon a successful fetch._

More information about HTTP API requests and responses, as well as the numerous request parameters that can be used to control fetch behavior on a per-request basis, can be found in the [HTTP Specification](./docs/HTTP_SPEC.md) document.

#### Daemon Example

We can start the lassie daemon by running:

```bash
$ lassie daemon

Lassie daemon listening on address 127.0.0.1:41443
Hit CTRL-C to stop the daemon
```

We can now fetch the same content we did in the [CLI example](#fetch-example) by running:

```bash
$ curl 'http://127.0.0.1:41443/ipfs/bafybeic56z3yccnla3cutmvqsn5zy3g24muupcsjtoyp3pu5pm5amurjx4?filename=daemon-example.car' --output daemon-example.car
```

_CURL Note: With curl we need to also specify the `--output <filename>` option. However, putting the above URL into a browser will download the file with the given filename parameter value upon a successful fetch._

To extract the contents of the `daemon-example.car` file we created in the above example, we would run:

```bash
$ car extract -f daemon-example.car
```

### Golang Library

The lassie library allows one to integrate lassie into their own Go programs. The library is best used when needing to retrieve content from the network programmatically.
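For comparison, a Go program that only needs to talk to an already-running daemon can call the HTTP API with nothing but the standard library. The following is a minimal sketch, not part of lassie itself: `buildFetchURL` and `fetchToFile` are hypothetical helper names, the address is the one printed by `lassie daemon` in the example above, and the `Accept: application/vnd.ipld.car` header is an assumption based on the Trustless Gateway CAR media type (curl sends `Accept: */*` on its own, Go's client sends nothing).

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"time"
)

// buildFetchURL assembles http://<addr>/ipfs/<CID>[/path][?filename=<filename>].
// Hypothetical helper for illustration; path must be empty or start with "/".
func buildFetchURL(addr, cid, path, filename string) string {
	u := url.URL{Scheme: "http", Host: addr, Path: "/ipfs/" + cid + path}
	if filename != "" {
		q := url.Values{}
		q.Set("filename", filename)
		u.RawQuery = q.Encode()
	}
	return u.String()
}

// fetchToFile streams the CAR response body into dest and returns the byte count.
func fetchToFile(client *http.Client, fetchURL, dest string) (int64, error) {
	req, err := http.NewRequest(http.MethodGet, fetchURL, nil)
	if err != nil {
		return 0, err
	}
	// Assumption: ask for a CAR response explicitly.
	req.Header.Set("Accept", "application/vnd.ipld.car")

	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("unexpected status: %s", resp.Status)
	}

	f, err := os.Create(dest)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	return io.Copy(f, resp.Body)
}

func main() {
	// Address taken from the daemon example output; yours may differ.
	fetchURL := buildFetchURL("127.0.0.1:41443",
		"bafybeic56z3yccnla3cutmvqsn5zy3g24muupcsjtoyp3pu5pm5amurjx4", "", "daemon-example.car")

	client := &http.Client{Timeout: 2 * time.Minute}
	n, err := fetchToFile(client, fetchURL, "daemon-example.car")
	if err != nil {
		fmt.Fprintln(os.Stderr, "fetch failed:", err)
		return
	}
	fmt.Printf("wrote %d bytes to daemon-example.car\n", n)
}
```

The resulting file can be unpacked with `car extract -f daemon-example.car` as before. Embedding the library, described next, avoids running a separate daemon process.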

The lassie dependency can be added to a project with the following command:

```bash
$ go get github.com/filecoin-project/lassie@latest
```

The lassie library can then be imported into a project with the following import statement:

```go
import "github.com/filecoin-project/lassie/pkg/lassie"
```

The following code shows a small example of how to use the lassie library to fetch a CID:

```go
package main

import (
	"context"
	"fmt"
	"os"

	"github.com/filecoin-project/lassie/pkg/lassie"
	"github.com/filecoin-project/lassie/pkg/storage"
	"github.com/filecoin-project/lassie/pkg/types"
	"github.com/ipfs/go-cid"
	trustlessutils "github.com/ipld/go-trustless-utils"
)

// main creates a default lassie instance and fetches a CID
func main() {
	ctx := context.Background()

	// Create a default lassie instance
	lassie, err := lassie.NewLassie(ctx)
	if err != nil {
		panic(err)
	}

	// Prepare the fetch
	rootCid := cid.MustParse("bafybeic56z3yccnla3cutmvqsn5zy3g24muupcsjtoyp3pu5pm5amurjx4") // The CID to fetch
	store := storage.NewDeferredStorageCar(os.TempDir(), rootCid)                             // The place to put the CAR file
	request, err := types.NewRequestForPath(store, rootCid, "", trustlessutils.DagScopeAll, nil) // The fetch request
	if err != nil {
		panic(err)
	}

	// Fetch the CID
	stats, err := lassie.Fetch(ctx, request)
	if err != nil {
		panic(err)
	}

	// Print the stats
	fmt.Printf("Fetched %d blocks in %d bytes\n", stats.Blocks, stats.Size)
}
```

Let's break down the above code.

First, we create a default lassie instance:

```go
ctx := context.Background()

// Create a default lassie instance
lassie, err := lassie.NewLassie(ctx)
if err != nil {
	panic(err)
}
```

The `NewLassie` function creates a new lassie instance with default settings, taking a `context.Context`. The context is used to control the lifecycle of the lassie instance. The function returns a `*Lassie` instance and an `error`. The `*Lassie` instance is used to make fetch requests. The `error` indicates whether creating the lassie instance failed.

Additionally, the `NewLassie` function takes a variable number of `LassieOption`s. These options can be used to customize the lassie instance. For example, the `WithGlobalTimeout` option can be used to set a global timeout for all fetch requests made with the lassie instance. More information about the available options can be found in the [lassie.go](https://pkg.go.dev/github.com/filecoin-project/lassie/pkg/lassie) file.

Next, we prepare the fetch request:

```go
// Prepare the fetch
rootCid := cid.MustParse("bafybeic56z3yccnla3cutmvqsn5zy3g24muupcsjtoyp3pu5pm5amurjx4") // The CID to fetch
store := storage.NewDeferredStorageCar(os.TempDir(), rootCid)                             // The place to put the CAR file
request, err := types.NewRequestForPath(store, rootCid, "", trustlessutils.DagScopeAll, nil) // The fetch request
if err != nil {
	panic(err)
}
```

The `rootCid` is the CID we want to fetch. The `store` is where we want to write the CAR file. In this case we are choosing to store it in the OS's temp directory. The `request` is the resulting fetch request that we'll hand to the `lassie.Fetch` function.

The `request` is created using the `NewRequestForPath` function. The only arguments to this function that we haven't discussed yet are the `path` and the `dagScope`.
The `path` is an optional path string to a file in the CID being requested. In this case we don't have a path, so we pass an empty string. The `dagScope` controls traversal: it describes the shape of the DAG fetched at the terminus of the specified path. The blocks of that DAG are included in the returned CAR file after the blocks required to traverse the path segments. More information on `dagScope` can be found in the [dag-scope HTTP Specification](./docs/HTTP_SPEC.md#dag-scope-request-query-parameter) section. In this case we use `trustlessutils.DagScopeAll` to specify we want everything from the root CID onward.

The function returns a `*types.Request` and an `error`. The `*types.Request` is the resulting fetch request we'll pass to `lassie.Fetch`, and the `error` indicates whether creating the fetch request failed.

Finally, we fetch the CID:

```go
// Fetch the CID
stats, err := lassie.Fetch(ctx, request)
if err != nil {
	panic(err)
}
```

The `Fetch` function takes a `context.Context`, a `*types.Request`, and a `*types.FetchOptions`. The `context.Context` is used to control the lifecycle of the fetch. The `*types.Request` is the fetch request we made above. The `*types.FetchOptions` is used to control the behavior of the fetch; it is variadic, so we can omit it here. The function returns a `*types.FetchStats` and an `error`. The `*types.FetchStats` holds statistics about the fetch, such as the block count and byte size printed in the full example. The `error` indicates whether fetching the CID failed.

### Roots, pieces and payloads

Lassie uses the term **Root** to refer to the head block of a potential graph (DAG) of IPLD blocks. This is typically the block you request, using its CID, when you perform a _fetch_ with Lassie.
Of course a root could also be a sub-root of a larger graph, but when performing a retrieval with Lassie, you are focusing on the graph underneath the block you are fetching, and considerations of larger DAGs are not relevant.

In the Filecoin ecosystem, there exists terminology related to "pieces" and "payloads", and there may be confusion between the way lassie uses the term "root CID" and some of the language used in Filecoin. A **Piece** is a Filecoin storage deal unit, typically containing user data organized into a CAR, then padded to size to form a portion of a Filecoin sector. Filecoin pieces have their own CIDs, and it is possible to retrieve a whole, raw piece from Filecoin. This can lead to terminology such as "piece root CID". Lassie currently does not perform whole-piece retrievals, and is not intended to be able to handle piece CIDs. Additionally, in Filecoin the term **Payload** is sometimes used in reference to the IPLD data inside a piece when performing a storage or retrieval deal. This is closer to the way Lassie uses the term **Root**, and historical Lassie code contains some references to "payloads" that are actually referring to the root CID of a graph.

## Contribute

It's early days, and PRs are welcome!

## License

This library is dual-licensed under Apache 2.0 and MIT terms.

Copyright 2022. Protocol Labs, Inc.