# Lassie

> Fetches from Filecoin, every time

## Table of Contents

* [Overview](#overview)
* [Installation](#installation)
* [Methods of Retrieval](#methods-of-retrieval)
	* [Command Line Interface](#command-line-interface)
		* [Extracting Content from a CAR](#extracting-content-from-a-car)
		* [Fetch Example](#fetch-example)
	* [HTTP API](#http-api)
		* [Daemon Example](#daemon-example)
	* [Golang Library](#golang-library)
	* [Roots, pieces and payloads](#roots-pieces-and-payloads)
* [Contribute](#contribute)
* [License](#license)

## Overview

Lassie is a simple retrieval client for Filecoin. It finds and fetches your data over the best retrieval protocols available. Lassie makes Filecoin retrieval easy.

## Installation

Download the [lassie binary from the latest release](https://github.com/filecoin-project/lassie/releases/latest) for your system architecture, or download and install the [lassie](https://github.com/filecoin-project/lassie) package using the Go package manager:

```bash
$ go install github.com/filecoin-project/lassie/cmd/lassie@latest

go: downloading github.com/filecoin-project/lassie v0.3.1
go: downloading github.com/libp2p/go-libp2p v0.23.2
go: downloading github.com/filecoin-project/go-state-types v0.9.9

...
```

Optionally, download the [go-car binary from the latest release](https://github.com/ipld/go-car/releases/latest) for your system architecture, or install the [go-car](https://github.com/ipld/go-car) package using the Go package manager:

```bash
$ go install github.com/ipld/go-car/cmd/car@latest

go: downloading github.com/ipld/go-car v0.6.0
go: downloading github.com/ipld/go-car/cmd v0.0.0-20230215023242-a2a8d2f9f60f
go: downloading github.com/ipld/go-codec-dagpb v1.6.0

...
```

The go-car package makes it easier to work with files in the content-addressed archive (CAR) format, which is the format Lassie uses to return the content it fetches. With Lassie, go-car is used to extract the contents of a CAR into usable files.

## Methods of Retrieval

### Command Line Interface

The lassie command line interface (CLI) is the simplest way to retrieve content from the Filecoin/IPFS network. It is best suited to fetching content on an ad-hoc basis, and it is also useful for testing and debugging, such as checking that a CID is retrievable from the network or from a specific provider.

To retrieve content from the network, pass a CID to the `lassie fetch` command:

```bash
$ lassie fetch [-o <output file>] [-t <timeout>] <CID>[/path/to/content]
```

The `lassie fetch` command writes the content of the CID to a file named `<CID>.car` in the current working directory. If the `-o` output flag is used, the content is written to the specified file instead. The `-t` timeout flag sets the timeout to the specified value; the default timeout is 20 seconds.

`fetch` also accepts [IPFS Trustless Gateway](https://specs.ipfs.tech/http-gateways/trustless-gateway/) style paths. If the CID is prefixed with `/ipfs/`, the argument is interpreted as a URL path with an optional query string, accepting the query parameters that the Trustless Gateway spec accepts, including `dag-scope=` and `entity-bytes=`. For example, `lassie fetch '/ipfs/<CID>/path/to/content?dag-scope=all'` will fetch the CID, the blocks required to navigate the path, and all the content at the terminus of the path.
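
The path form described above can also be assembled programmatically. The following is a minimal Go sketch, using only the standard library, that builds a `/ipfs/<CID>[/path]` string with optional `dag-scope` and `entity-bytes` query parameters. The helper name `gatewayPath` and its parameters are hypothetical and chosen for illustration; they are not part of lassie.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// gatewayPath is a hypothetical helper (not part of lassie) that builds a
// Trustless Gateway style path such as /ipfs/<CID>/path?dag-scope=all.
// Empty dagScope or entityBytes values are left out of the query string.
func gatewayPath(cid, path, dagScope, entityBytes string) string {
	p := "/ipfs/" + cid
	if path != "" {
		// Accept the path with or without a leading slash.
		p += "/" + strings.TrimPrefix(path, "/")
	}

	q := url.Values{}
	if dagScope != "" {
		q.Set("dag-scope", dagScope)
	}
	if entityBytes != "" {
		q.Set("entity-bytes", entityBytes)
	}

	// url.URL takes care of escaping the path and the query values.
	u := url.URL{Path: p, RawQuery: q.Encode()}
	return u.String()
}

func main() {
	cid := "bafybeic56z3yccnla3cutmvqsn5zy3g24muupcsjtoyp3pu5pm5amurjx4"
	// The printed value can be passed, quoted, as the argument to `lassie fetch`.
	fmt.Println(gatewayPath(cid, "birb.mp4", "all", ""))
}
```

Quoting the resulting string on the command line keeps the shell from interpreting the `?` and `&` characters.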

More information about available flags can be found by running `lassie fetch --help`.

#### Extracting Content from a CAR

The go-car package can be used to extract the contents of the CAR file into usable files. For example, if the content of the CID is a video, go-car can extract the video into a file on the local filesystem.

```bash
$ car extract -f <CID>.car
```

The `-f` flag specifies the CAR file to extract from. The contents of the CAR are extracted into the current working directory.

#### Fetch Example

Let's grab some content from the Filecoin/IPFS network using the `lassie fetch` command:

```bash
$ lassie fetch -o fetch-example.car -p bafybeic56z3yccnla3cutmvqsn5zy3g24muupcsjtoyp3pu5pm5amurjx4
```

This fetches the `bafybeic56z3yccnla3cutmvqsn5zy3g24muupcsjtoyp3pu5pm5amurjx4` CID from the network and saves it to a file named `fetch-example.car` in the current working directory.

The `-p` progress flag prints more detailed information about the state of the retrieval.

_Note: If the fetch times out, try using the `-t` flag to raise the timeout above the default 20 seconds. Retrievability of some CIDs depends heavily on local network characteristics._

_Note: For the internet cautious out there, the `bafybeic56z3yccnla3cutmvqsn5zy3g24muupcsjtoyp3pu5pm5amurjx4` CID is a directory that has a video titled `birb.mp4`, which is a video of a bird bouncing to the song "Around the World" by Daft Punk. We've been using it internally during the development of Lassie to test with._

To extract the contents of the `fetch-example.car` file we created in the previous example, we would run:

```bash
$ car extract -f fetch-example.car
```

To fetch and extract at the same time, we can pipe the output of the `lassie fetch` command to the `car extract` command:

```bash
$ lassie fetch -o - -p bafybeic56z3yccnla3cutmvqsn5zy3g24muupcsjtoyp3pu5pm5amurjx4 | car extract
```

Passing `-` to the `-o` output flag writes the output to `stdout`. The `car extract` command reads from `stdin` by default, so it consumes the CAR stream produced by `lassie fetch` directly.

You should now have a `birb.mp4` file in your current working directory. Feel free to play it with your favorite video player!

### HTTP API

The lassie HTTP API runs a web server that retrieves content from the Filecoin/IPFS network in response to HTTP requests. It is best suited to cases where content needs to be retrieved over HTTP, whether from a browser or from a programmatic tool like `curl`. The following examples use `curl`, but any HTTP client, including a web browser, will work. curl-specific behavior is noted where applicable.

The API server can be started with the `lassie daemon` command:

```bash
$ lassie daemon

Lassie daemon listening on address 127.0.0.1:41443
Hit CTRL-C to stop the daemon
```

The port can be changed with the `-p` port flag. By default, any available port is used.

More information about available flags can be found by running `lassie daemon --help`.

To fetch content using the HTTP API, make a `GET` request to the `/ipfs/<CID>[/path/to/content]` endpoint:

```bash
$ curl http://127.0.0.1:41443/ipfs/<CID>[/path/to/content]
```

By default, this will output the contents of the CID to `stdout`.

To save the output to a file, use the `filename` query parameter:

```bash
$ curl "http://127.0.0.1:41443/ipfs/<CID>[/path/to/content]?filename=<filename>" --output <filename>
```

_CURL Note: With curl we need to also specify the `--output <filename>` option. However, putting the above URL into a browser will download the file with the given filename parameter value upon a successful fetch._

More information about HTTP API requests and responses, as well as the numerous request parameters that can be used to control fetch behavior on a per-request basis, can be found in the [HTTP Specification](./docs/HTTP_SPEC.md) document.

#### Daemon Example

We can start the lassie daemon by running:

```bash
$ lassie daemon

Lassie daemon listening on address 127.0.0.1:41443
Hit CTRL-C to stop the daemon
```

We can now fetch the same content we did in the [CLI example](#fetch-example) by running:

```bash
$ curl "http://127.0.0.1:41443/ipfs/bafybeic56z3yccnla3cutmvqsn5zy3g24muupcsjtoyp3pu5pm5amurjx4?filename=daemon-example.car" --output daemon-example.car
```

_CURL Note: With curl we need to also specify the `--output <filename>` option. However, putting the above URL into a browser will download the file with the given filename parameter value upon a successful fetch._

To extract the contents of the `daemon-example.car` file we created in the above example, we would run:

```bash
$ car extract -f daemon-example.car
```

### Golang Library

The lassie library lets you integrate lassie into your own Go programs. It is best used when content needs to be retrieved from the network programmatically.

The lassie dependency can be added to a project with the following command:

```bash
$ go get github.com/filecoin-project/lassie@latest
```

The lassie library can then be imported into a project with the following import statement:

```go
import "github.com/filecoin-project/lassie/pkg/lassie"
```

The following code shows a small example of how to use the lassie library to fetch a CID:

```go
package main

import (
	"context"
	"fmt"
	"os"

	"github.com/filecoin-project/lassie/pkg/lassie"
	"github.com/filecoin-project/lassie/pkg/storage"
	"github.com/filecoin-project/lassie/pkg/types"
	"github.com/ipfs/go-cid"
	trustlessutils "github.com/ipld/go-trustless-utils"
)

// main creates a default lassie instance and fetches a CID
func main() {
	ctx := context.Background()

	// Create a default lassie instance
	lassie, err := lassie.NewLassie(ctx)
	if err != nil {
		panic(err)
	}

	// Prepare the fetch
	rootCid := cid.MustParse("bafybeic56z3yccnla3cutmvqsn5zy3g24muupcsjtoyp3pu5pm5amurjx4")       // The CID to fetch
	store := storage.NewDeferredStorageCar(os.TempDir(), rootCid)                                 // The place to put the CAR file
	request, err := types.NewRequestForPath(store, rootCid, "", trustlessutils.DagScopeAll, nil)  // The fetch request
	if err != nil {
		panic(err)
	}

	// Fetch the CID
	stats, err := lassie.Fetch(ctx, request)
	if err != nil {
		panic(err)
	}

	// Print the stats
	fmt.Printf("Fetched %d blocks in %d bytes\n", stats.Blocks, stats.Size)
}
```

Let's break down the above code.

First, we create a default lassie instance:

```go
ctx := context.Background()

// Create a default lassie instance
lassie, err := lassie.NewLassie(ctx)
if err != nil {
	panic(err)
}
```

The `NewLassie` function takes a `context.Context` and creates a new lassie instance with default settings. The context controls the lifecycle of the lassie instance. The function returns a `*Lassie` instance, which is used to make fetch requests, and an `error`, which indicates whether creating the instance failed.

Additionally, the `NewLassie` function takes a variable number of `LassieOption`s. These options can be used to customize the lassie instance. For example, the `WithGlobalTimeout` option sets a global timeout for all fetch requests made with the lassie instance. More information about the available options can be found in the [lassie.go](https://pkg.go.dev/github.com/filecoin-project/lassie/pkg/lassie) file.

Next, we prepare the fetch request:

```go
// Prepare the fetch
rootCid := cid.MustParse("bafybeic56z3yccnla3cutmvqsn5zy3g24muupcsjtoyp3pu5pm5amurjx4")       // The CID to fetch
store := storage.NewDeferredStorageCar(os.TempDir(), rootCid)                                 // The place to put the CAR file
request, err := types.NewRequestForPath(store, rootCid, "", trustlessutils.DagScopeAll, nil)  // The fetch request
if err != nil {
	panic(err)
}
```

The `rootCid` is the CID we want to fetch. The `store` is where we want to write the CAR file. In this case we are choosing to store it in the OS's temp directory. The `request` is the resulting fetch request that we'll hand to the `lassie.Fetch` function.

The `request` is created using the `NewRequestForPath` function. The only parameters we haven't discussed yet are the `path` and the `dagScope`. The `path` is an optional path string to a file within the CID being requested. In this case we don't have a path, so we pass an empty string. The `dagScope` controls traversal: it describes the shape of the DAG fetched at the terminus of the specified path. The blocks of that DAG are included in the returned CAR file after the blocks required to traverse the path segments. More information on `dagScope` can be found in the [dag-scope HTTP Specification](./docs/HTTP_SPEC.md#dag-scope-request-query-parameter) section. In this case we use `trustlessutils.DagScopeAll` to specify that we want everything from the root CID onward.

The function returns a `*types.Request` and an `error`. The `*types.Request` is the resulting fetch request we'll pass to `lassie.Fetch`, and the `error` indicates whether creating the fetch request failed.

Finally, we fetch the CID:

```go
// Fetch the CID
stats, err := lassie.Fetch(ctx, request)
if err != nil {
	panic(err)
}
```

The `Fetch` function takes a `context.Context`, a `*types.Request`, and a `*types.FetchOptions`. The `context.Context` controls the lifecycle of the fetch. The `*types.Request` is the fetch request we made above. The `*types.FetchOptions` controls the behavior of the fetch; because that parameter is variadic, we can omit it here. The function returns a `*types.FetchStats`, which holds the statistics for the fetch, and an `error`, which indicates whether fetching the CID failed.

### Roots, pieces and payloads

Lassie uses the term **Root** to refer to the head block of a potential graph (DAG) of IPLD blocks. This is typically the block you request, using its CID, when you perform a _fetch_ with Lassie. Of course, a root could also be a sub-root of a larger graph, but when performing a retrieval with Lassie, you are focusing on the graph underneath the block you are fetching, and considerations of larger DAGs are not relevant.

In the Filecoin ecosystem, there exists terminology related to "pieces" and "payloads", and there may be confusion between the way lassie uses the term "root CID" and some of the language used in Filecoin. A **Piece** is a Filecoin storage deal unit, typically containing user data organized into a CAR and then padded to size to form a portion of a Filecoin sector. Filecoin pieces have their own CIDs, and it is possible to retrieve a whole, raw piece from Filecoin. This can lead to terminology such as "piece root CID". Lassie currently does not perform whole-piece retrievals, and is not intended to be able to handle piece CIDs. Additionally, in Filecoin the term **Payload** is sometimes used in reference to the IPLD data inside a piece when performing a storage or retrieval deal. This is closer to the way Lassie uses the term **Root**, and historical Lassie code contains some references to "payloads" that are actually referring to the root CID of a graph.

## Contribute

It's early days, and PRs are welcome!

## License

This library is dual-licensed under Apache 2.0 and MIT terms.

Copyright 2022. Protocol Labs, Inc.