     1  
     2  # Stellar ETL
     3  The Stellar-ETL is a data pipeline that allows users to extract data from the history of the Stellar network.
     4  
     5  ## **Table of Contents**
     6  
- [Exporting the Ledger Chain](#exporting-the-ledger-chain)
  - [Command Reference](#command-reference)
    - [Bucket List Commands](#bucket-list-commands)
      - [export_accounts](#export_accounts)
      - [export_offers](#export_offers)
      - [export_trustlines](#export_trustlines)
      - [export_claimable_balances](#export_claimable_balances)
      - [export_pools](#export_pools)
      - [export_signers](#export_signers)
      - [export_contract_data (futurenet, testnet)](#export_contract_data)
      - [export_contract_code (futurenet, testnet)](#export_contract_code)
      - [export_config_settings (futurenet, testnet)](#export_config_settings)
      - [export_ttl (futurenet, testnet)](#export_ttl)
    - [History Archive Commands](#history-archive-commands)
      - [export_ledgers](#export_ledgers)
      - [export_transactions](#export_transactions)
      - [export_operations](#export_operations)
      - [export_effects](#export_effects)
      - [export_assets](#export_assets)
      - [export_trades](#export_trades)
      - [export_diagnostic_events (futurenet, testnet)](#export_diagnostic_events)
    - [Stellar Core Commands](#stellar-core-commands)
      - [export_ledger_entry_changes](#export_ledger_entry_changes)
      - [export_orderbooks (unsupported)](#export_orderbooks-unsupported)
    - [Utility Commands](#utility-commands)
      - [get_ledger_range_from_times](#get_ledger_range_from_times)
- [Schemas](#schemas)
- [Extensions](#extensions)
  - [Adding New Commands](#adding-new-commands)
    36  <br>
    37  <br>
    38  
    39  
    40  # Exporting the Ledger Chain
    41  
    42  ## **Docker**
1. Download the latest version of [Docker](https://www.docker.com/get-started).
2. Pull the stellar-etl Docker image: `docker pull stellar/stellar-etl`
3. Run the Docker image with the desired stellar-etl command: `docker run stellar/stellar-etl stellar-etl [etl-command] [etl-command arguments]`
    46  
    47  ## **Manual Installation**
    48  1. Install Golang v1.19.0 or later: https://golang.org/dl/
    49  
    50  2. Ensure that your Go bin has been added to the PATH env variable: `export PATH=$PATH:$(go env GOPATH)/bin`
    51  3. Download and install Stellar-Core v19.0.0 or later: https://github.com/stellar/stellar-core/blob/master/INSTALL.md
    52  
4. Run `go install github.com/stellar/stellar-etl@latest` to install the ETL. (Since Go 1.18, `go get` no longer builds or installs binaries.)
    54  
5. Run export commands to export information about the ledger
    56  
    57  ## **Command Reference**
- [Bucket List Commands](#bucket-list-commands)
   - [export_accounts](#export_accounts)
   - [export_offers](#export_offers)
   - [export_trustlines](#export_trustlines)
   - [export_claimable_balances](#export_claimable_balances)
   - [export_pools](#export_pools)
   - [export_signers](#export_signers)
   - [export_contract_data](#export_contract_data)
   - [export_contract_code](#export_contract_code)
   - [export_config_settings](#export_config_settings)
   - [export_ttl](#export_ttl)
- [History Archive Commands](#history-archive-commands)
   - [export_ledgers](#export_ledgers)
   - [export_transactions](#export_transactions)
   - [export_operations](#export_operations)
   - [export_effects](#export_effects)
   - [export_assets](#export_assets)
   - [export_trades](#export_trades)
   - [export_diagnostic_events](#export_diagnostic_events)
- [Stellar Core Commands](#stellar-core-commands)
   - [export_ledger_entry_changes](#export_ledger_entry_changes)
   - [export_orderbooks (unsupported)](#export_orderbooks-unsupported)
- [Utility Commands](#utility-commands)
   - [get_ledger_range_from_times](#get_ledger_range_from_times)
    81  
    82  Every command accepts a `-h` parameter, which provides a help screen containing information about the command, its usage, and its flags.
    83  
By default, commands read from mainnet. Pass the `--testnet` flag to read from testnet, or the `--futurenet` flag to read from futurenet.
> *_NOTE:_* If both flags are passed, the command uses testnet. Each stellar-etl command can only read from one network at a time.
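
As a minimal sketch of the network rule above, the helper below maps a network name to the matching flag so that a script never passes both flags at once. The `network_flag` function and its network names are hypothetical conveniences, not part of stellar-etl, and the last command only prints the invocation instead of running it.

```bash
# network_flag [NETWORK]
# Prints the stellar-etl flag for NETWORK (hypothetical helper, not part of the ETL).
network_flag() {
  case "${1:-mainnet}" in
    mainnet)   ;;                               # default network: no flag needed
    testnet)   printf '%s\n' '--testnet' ;;
    futurenet) printf '%s\n' '--futurenet' ;;
    *)         echo "unknown network: $1" >&2; return 1 ;;
  esac
}

# Dry run: print the command instead of executing it.
echo stellar-etl export_ledgers $(network_flag testnet) \
  --start-ledger 1000 --end-ledger 2000 --output exported_ledgers.txt
```

Remove the leading `echo` to run the command for real once stellar-etl is installed.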
    86  
    87  <br>
    88  
    89  ***
    90  
    91  ## **Bucket List Commands**
    92  
These commands use the bucket list to ingest large amounts of data from the history of the Stellar ledger. If you need to read large amounts of information to catch up to the current state of the ledger, these commands are a quick way to do so. However, they don't allow custom start-ledger values. To update within a user-defined range, see the Stellar Core commands.
    94  
    95  > *_NOTE:_* In order to get information within a specified ledger range for bucket list commands, see the export_ledger_entry_changes command.
    96  
    97  <br>
    98  
    99  ### **export_accounts**
   100  
   101  ```bash
   102  > stellar-etl export_accounts --end-ledger 500000 --output exported_accounts.txt
   103  ```
   104  
   105  Exports historical account data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get account information within a specified ledger range, see the export_ledger_entry_changes command.
   106  
   107  <br>
   108  
   109  ### **export_offers**
   110  
   111  ```bash
   112  > stellar-etl export_offers --end-ledger 500000 --output exported_offers.txt
   113  ```
   114  
   115  Exports historical offer data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get offer information within a specified ledger range, see the export_ledger_entry_changes command.
   116  
   117  <br>
   118  
   119  ### **export_trustlines**
   120  
   121  ```bash
   122  > stellar-etl export_trustlines --end-ledger 500000 --output exported_trustlines.txt
   123  ```
   124  
   125  Exports historical trustline data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get trustline information within a specified ledger range, see the export_ledger_entry_changes command.
   126  
   127  <br>
   128  
   129  ### **export_claimable_balances**
   130  
   131  ```bash
   132  > stellar-etl export_claimable_balances --end-ledger 500000 --output exported_claimable_balances.txt
   133  ```
   134  
   135  Exports claimable balances data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get claimable balances information within a specified ledger range, see the export_ledger_entry_changes command.
   136  
   137  <br>
   138  
   139  ### **export_pools**
   140  
   141  ```bash
   142  > stellar-etl export_pools --end-ledger 500000 --output exported_pools.txt
   143  ```
   144  
   145  Exports historical liquidity pools data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get liquidity pools information within a specified ledger range, see the export_ledger_entry_changes command.
   146  
   147  <br>
   148  
   149  ### **export_signers**
   150  
   151  ```bash
   152  > stellar-etl export_signers --end-ledger 500000 --output exported_signers.txt
   153  ```
   154  
   155  Exports historical account signers data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get account signers information within a specified ledger range, see the export_ledger_entry_changes command.
   156  
   157  <br>
   158  
   159  ### **export_contract_data**
   160  
   161  ```bash
   162  > stellar-etl export_contract_data --end-ledger 500000 --output export_contract_data.txt
   163  ```
   164  
Exports historical contract data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get contract data information within a specified ledger range, see the export_ledger_entry_changes command.
   166  
   167  <br>
   168  
   169  ### **export_contract_code**
   170  
   171  ```bash
   172  > stellar-etl export_contract_code --end-ledger 500000 --output export_contract_code.txt
   173  ```
   174  
   175  Exports historical contract code data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get contract code information within a specified ledger range, see the export_ledger_entry_changes command.
   176  
   177  <br>
   178  
   179  ### **export_config_settings**
   180  
   181  ```bash
   182  > stellar-etl export_config_settings --end-ledger 500000 --output export_config_settings.txt
   183  ```
   184  
Exports historical config settings data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get config settings information within a specified ledger range, see the export_ledger_entry_changes command.
   186  
   187  <br>
   188  
   189  ### **export_ttl**
   190  
   191  ```bash
   192  > stellar-etl export_ttl --end-ledger 500000 --output export_ttl.txt
   193  ```
   194  
   195  Exports historical expiration data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get expiration information within a specified ledger range, see the export_ledger_entry_changes command.
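
The bucket list commands above all take the same two flags, so an initial data dump can be scripted as a loop. The sketch below is a dry run that only prints the invocations. `bucket_export_cmds` is a hypothetical helper, and the output file names simply follow the `exported_*.txt` pattern used in the examples. The contract-related commands are left out here, but they follow the same shape.

```bash
# bucket_export_cmds END_LEDGER
# Prints one stellar-etl invocation per bucket list command (dry run, hypothetical helper).
bucket_export_cmds() {
  local end_ledger=$1 entry
  for entry in accounts offers trustlines claimable_balances pools signers; do
    echo "stellar-etl export_${entry} --end-ledger ${end_ledger} --output exported_${entry}.txt"
  done
}

bucket_export_cmds 500000
```

Piping the output to `sh` would execute the printed commands one after another.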
   196  
   197  <br>
   198  
   199  ***
   200  
   201  ## **History Archive Commands**
   202  
   203  These commands export information using the history archives. This allows users to provide a start and end ledger range. The commands in this category export a list of everything that occurred within the provided range. All of the ranges are inclusive.
   204  
> *_NOTE:_* All commands in this category except `export_ledgers` and `export_assets` also require Captive Core to export data.
   206  
   207  <br>
   208  
   209  ### **export_ledgers**
   210  
   211  ```bash
   212  > stellar-etl export_ledgers --start-ledger 1000 \
   213  --end-ledger 500000 --output exported_ledgers.txt
   214  ```
   215  
   216  This command exports ledgers within the provided range. 
   217  
   218  <br>
   219  
   220  ### **export_transactions**
   221  
   222  ```bash
   223  > stellar-etl export_transactions --start-ledger 1000 \
   224  --end-ledger 500000 --output exported_transactions.txt
   225  ```
   226  
   227  This command exports transactions within the provided range.
   228  
   229  <br>
   230  
   231  ### **export_operations**
   232  
   233  ```bash
   234  > stellar-etl export_operations --start-ledger 1000 \
   235  --end-ledger 500000 --output exported_operations.txt
   236  ```
   237  
   238  This command exports operations within the provided range.
   239  
   240  <br>
   241  
   242  ### **export_effects**
   243  
   244  ```bash
   245  > stellar-etl export_effects --start-ledger 1000 \
   246  --end-ledger 500000 --output exported_effects.txt
   247  ```
   248  
   249  This command exports effects within the provided range.
   250  
   251  <br>
   252  
   253  ### **export_assets**
   254  ```bash
   255  > stellar-etl export_assets \
   256  --start-ledger 1000 \
   257  --end-ledger 500000 --output exported_assets.txt
   258  ```
   259  
   260  Exports the assets that are created from payment operations over a specified ledger range.
   261  
   262  <br>
   263  
   264  ### **export_trades**
   265  ```bash
   266  > stellar-etl export_trades \
   267  --start-ledger 1000 \
   268  --end-ledger 500000 --output exported_trades.txt
   269  ```
   270  
Exports trade data within the specified range to an output file.
   272  
   273  <br>
   274  
   275  ### **export_diagnostic_events**
   276  ```bash
   277  > stellar-etl export_diagnostic_events \
   278  --start-ledger 1000 \
   279  --end-ledger 500000 --output export_diagnostic_events.txt
   280  ```
   281  
Exports diagnostic events data within the specified range to an output file.
   283  
   284  <br>
   285  
   286  ***
   287  
   288  ## **Stellar Core Commands**
   289  
   290  These commands require a Stellar Core instance that is v19.0.0 or later. The commands use the Core instance to retrieve information about changes from the ledger. These changes can be in the form of accounts, offers, trustlines, claimable balances, liquidity pools, or account signers.
   291  
As the Stellar network grows, the Stellar Core instance has to catch up on an increasingly large amount of information. This catch-up process can add some overhead to the commands in this category. To avoid this overhead, prefer processing larger ranges instead of many small ones, or use unbounded mode.
   293  
   294  <br>
   295  
   296  ### **export_ledger_entry_changes**
   297  
   298  ```bash
   299  > stellar-etl export_ledger_entry_changes --start-ledger 1000 \
   300  --end-ledger 500000 --output exported_changes_folder/
   301  ```
   302  
   303  This command exports ledger changes within the provided ledger range. Flags can filter which ledger entry types are exported. If no data type flags are set, then by default all types are exported. If any are set, it is assumed that the others should not be exported.
   304  
   305  Changes are exported in batches of a size defined by the `batch-size` flag. By default, the batch-size parameter is set to 64 ledgers, which corresponds to a five minute period of time. This batch size is convenient because checkpoint ledgers are created every 64 ledgers. Checkpoint ledgers act as anchoring points for the nodes on the network, so it is beneficial to export in multiples of 64.
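
To make the batching arithmetic concrete, here is a minimal sketch that splits an inclusive ledger range into consecutive batches. `batch_ranges` is a hypothetical helper. How stellar-etl itself aligns batch boundaries is not described here, so treat the output as an illustration of the arithmetic only.

```bash
# batch_ranges START END [SIZE]
# Prints "first last" for each consecutive batch covering START..END inclusive.
# SIZE defaults to 64, the default batch size mentioned above.
batch_ranges() {
  local start=$1 end=$2 size=${3:-64}
  local lo=$start hi
  while [ "$lo" -le "$end" ]; do
    hi=$((lo + size - 1))
    if [ "$hi" -gt "$end" ]; then
      hi=$end                     # the final batch may be shorter than SIZE
    fi
    echo "$lo $hi"
    lo=$((hi + 1))
  done
}

batch_ranges 1000 1200
```

For the range 1000 to 1200, this prints three full batches of 64 ledgers followed by a shorter batch ending at ledger 1200.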
   306  
   307  This command has two modes: bounded and unbounded.
   308  
   309  #### **Bounded**
   310   If both a start and end ledger are provided, then the command runs in a bounded mode. This means that once all the ledgers in the range are processed and exported, the command shuts down.
   311   
   312  #### **Unbounded**
   313  If only a start ledger is provided, then the command runs in an unbounded fashion starting from the provided ledger. In this mode, the Stellar Core connects to the Stellar network and processes new changes as they occur on the network. Since the changes are continually exported in batches, this process can be continually run in the background in order to avoid the overhead of closing and starting new Stellar Core instances.
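
The difference between the two modes comes down to whether `--end-ledger` is passed. The sketch below builds both invocations as a dry run. `changes_cmd` is a hypothetical helper, and the output folder name is taken from the example above.

```bash
# changes_cmd START_LEDGER [END_LEDGER]
# Prints a bounded invocation when END_LEDGER is given, an unbounded one otherwise.
changes_cmd() {
  local start=$1 end=${2:-}
  local cmd="stellar-etl export_ledger_entry_changes --start-ledger ${start}"
  if [ -n "$end" ]; then
    cmd="${cmd} --end-ledger ${end}"   # bounded: stops after END_LEDGER
  fi
  echo "${cmd} --output exported_changes_folder/"
}

changes_cmd 1000 500000   # bounded
changes_cmd 1000          # unbounded: keeps following the network
```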
   314  
   315  <br>
   316  
   317  ### **export_orderbooks (unsupported)**
   318  
   319  ```bash
   320  > stellar-etl export_orderbooks --start-ledger 1000 \
   321  --end-ledger 500000 --output exported_orderbooks_folder/
   322  ```
   323  
> *_NOTE:_* This is an experimental feature and is currently unsupported.

This command exports orderbooks within the provided ledger range. Since exporting complete orderbooks at every single ledger would require an excessive amount of storage space, the output is normalized. Each batch that is exported contains multiple files, namely: `dimAccounts.txt`, `dimOffers.txt`, `dimMarkets.txt`, and `factEvents.txt`. The dim files relate a data structure to an ID. `dimMarkets`, for example, contains the buying and selling assets of a market, as well as the ID for that market. That ID is used in other places as a replacement for the full market information. This normalization process saves a significant amount of space (roughly 90% in our benchmarks). The `factEvents` file connects ledger numbers to the offer IDs that were present at that ledger.
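
The dim/fact split can be illustrated with a toy example. The sketch below is not the ETL's implementation and does not reproduce its file layout: the input format (`ledger,selling_asset,buying_asset`), the output columns, and the `normalize_markets` helper are all assumptions made for illustration. It stores each distinct market once in a dim file and refers to it by ID in the fact file.

```bash
# normalize_markets DIM_FILE FACT_FILE
# Reads "ledger,selling_asset,buying_asset" rows on stdin (hypothetical format).
# Writes "id,selling_asset,buying_asset" to DIM_FILE once per distinct market,
# and "ledger,id" to FACT_FILE for every input row.
normalize_markets() {
  awk -F, -v dim="$1" -v fact="$2" '
    {
      key = $2 "/" $3
      if (!(key in id)) {            # first time this market is seen
        id[key] = ++n
        print id[key] "," $2 "," $3 > dim
      }
      print $1 "," id[key] > fact    # the fact row carries only the short ID
    }'
}

tmp=$(mktemp -d)
printf '100,XLM,USD\n101,XLM,USD\n101,XLM,EUR\n' |
  normalize_markets "$tmp/dim.txt" "$tmp/fact.txt"
cat "$tmp/dim.txt" "$tmp/fact.txt"
```

The repeated `XLM,USD` market appears only once in the dim file, which is where the space saving comes from when the same structures recur across many ledgers.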
   327  
Orderbooks are exported in batches of a size defined by the `batch-size` flag. By default, the batch-size parameter is set to 64 ledgers, which corresponds to a five-minute period of time. This batch size is convenient because checkpoint ledgers are created every 64 ledgers. Checkpoint ledgers act as anchoring points: once a checkpoint ledger is available, so are the previous 63 ledgers. It is therefore beneficial to export in multiples of 64.
   329  
   330  This command has two modes: bounded and unbounded.
   331  
   332  #### **Bounded**
   333   If both a start and end ledger are provided, then the command runs in a bounded mode. This means that once all the ledgers in the range are processed and exported, the command shuts down.
   334   
   335  #### **Unbounded**
   336  If only a start ledger is provided, then the command runs in an unbounded fashion starting from the provided ledger. In this mode, the Stellar Core connects to the Stellar network and processes new orderbooks as they occur on the network. Since the changes are continually exported in batches, this process can be continually run in the background in order to avoid the overhead of closing and starting new Stellar Core instances.
   337  
   338  <br>
   339  
   340  ***
   341  
   342  ## **Utility Commands**
   343  
   344  ### **get_ledger_range_from_times**
   345  ```bash
   346  > stellar-etl get_ledger_range_from_times \
   347  --start-time 2019-09-13T23:00:00+00:00 \
   348  --end-time 2019-09-14T13:35:10+00:00 --output exported_range.txt
   349  ```
   350  
This command takes a start time and an end time and converts them to a ledger range. The ledger range that is returned will be the smallest possible ledger range that completely covers the provided time period.
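
When the time window is computed by a script, the timestamps need to be produced in the same format as the example above. The sketch below formats Unix timestamps in UTC and prints an invocation for the last hour as a dry run. `to_etl_time` is a hypothetical helper, and it assumes a `date` implementation that accepts `-d @SECONDS`, as GNU date does.

```bash
# to_etl_time EPOCH_SECONDS
# Formats a Unix timestamp in UTC like the times in the example above.
# Assumes `date -d @SECONDS` is supported (true for GNU date).
to_etl_time() {
  date -u -d "@$1" '+%Y-%m-%dT%H:%M:%S+00:00'
}

end=$(date -u +%s)
start=$((end - 3600))               # one hour before now

# Dry run: print the command instead of executing it.
echo stellar-etl get_ledger_range_from_times \
  --start-time "$(to_etl_time "$start")" \
  --end-time "$(to_etl_time "$end")" --output exported_range.txt
```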
   352  
   353  <br>
   354  <br>
   355  
   356  # Schemas
   357  
   358  See https://github.com/stellar/stellar-etl/blob/master/internal/transform/schema.go for the schemas of the data structures that are outputted by the ETL.
   359  
   360  <br>
   361  <br>
   362  
   363  # Extensions
   364  This section covers some possible extensions or further work that can be done.
   365  
   366  ## **Adding New Commands**
   367  In general, in order to add new commands, you need to add these files:
   368  
- `export_new_data_structure.go` in the `cmd` folder
  - This file can be generated with cobra by calling: `cobra add {command}`
  - This file will parse flags, create output files, get the transformed data from the input package, and then export the data.
- `export_new_data_structure_test.go` in the `cmd` folder
  - This file will contain some tests for the newly added command. The `runCLI` function does most of the heavy lifting. All the tests need is the command arguments to test and the desired output.
  - Test data should be stored in the `testdata/new_data_structure` folder
- `new_data_structure.go` in the `internal/input` folder
  - This file will contain the methods needed to extract the new data structure from wherever it is located. This may be the history archives, the bucket list, or a captive core instance.
  - This file should extract the data, transform it, and return the transformed data.
  - If working with captive core, the methods need to work in the background. There should be methods that export batches of data and send them to a channel. There should be other methods that read from the channel and transform the data so it can be exported.
- `new_data_structure.go` in the `internal/transform` folder
  - This file will contain the methods needed to transform the extracted data into a form that is suitable for BigQuery.
  - The struct definition for the transformed object should be stored in `schema.go` in the `internal/transform` folder.
   382  
   383  A good number of common methods are already written and stored in the `util` package.
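
As a quick checklist, the file layout listed above can be printed for a given data structure name. `new_command_files` is a hypothetical helper that only echoes the paths exactly as this section writes them. It does not create files or run cobra.

```bash
# new_command_files NAME
# Prints the paths this section asks for when adding an export command for NAME.
new_command_files() {
  local name=$1
  printf '%s\n' \
    "cmd/export_${name}.go" \
    "cmd/export_${name}_test.go" \
    "testdata/${name}/" \
    "internal/input/${name}.go" \
    "internal/transform/${name}.go"
}

new_command_files new_data_structure
```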