## chifra scrape

The `chifra scrape` application creates TrueBlocks' chunked index of address appearances, the
fundamental data structure of the entire system. It also, optionally, pins each chunk of the index
to IPFS.

`chifra scrape` is a long-running process, so we advise running it as a service or in a terminal
multiplexer such as `tmux`. You may start and stop `chifra scrape` as needed, but while it is stopped
the scraper does not keep up with the front of the blockchain. The next time it starts, it will
have to catch up to the chain, a process that may take several hours depending on how long ago it
was last run. See the sections below and the "Papers" section of our website for more information
on how the scraping process works and the prerequisites for its proper operation.

You may adjust the speed of the index creation with the `--sleep` and `--block_cnt` options. On
some machines, or when running against some EVM node software, the scraper may overburden the
hardware. Slowing things down will ensure proper operation. Finally, you may optionally `--pin`
each new chunk to IPFS, which naturally shards the database among all users. By default, pinning
is against a locally running IPFS node, but the `--remote` option allows pinning to an IPFS
pinning service such as Pinata.

```[plaintext]
Purpose:
  Scan the chain and update the TrueBlocks index of appearances.

Usage:
  chifra scrape [flags]

Flags:
  -n, --block_cnt uint   maximum number of blocks to process per pass (default 2000)
  -s, --sleep float      seconds to sleep between scraper passes (default 14)
  -l, --touch uint       first block to visit when scraping (snapped back to most recent snap_to_grid mark)
  -u, --run_count uint   run the scraper this many times, then quit
  -d, --dry_run          show the configuration that would be applied if run, no changes are made
  -o, --notify           enable the notify feature
  -v, --verbose          enable verbose output
  -h, --help             display this help screen

Notes:
  - The --touch option may only be used for blocks after the latest scraped block (if any). It will be snapped back to the latest snap_to block.
  - This command requires your RPC to provide trace data. See the README for more information.
  - The --notify option requires proper configuration. Additionally, IPFS must be running locally. See the README.md file.
```

Data models produced by this tool:

- [chunkrecord](/data-model/admin/#chunkrecord)
- [manifest](/data-model/admin/#manifest)
- [message](/data-model/other/#message)

### configuration

Each of the following additional configurable command line options is available.

**Configuration file:** `trueBlocks.toml`
**Configuration group:** `[scrape.<chain>]`

| Item         | Type   | Default | Description                                                                                                               |
| ------------ | ------ | ------- | ------------------------------------------------------------------------------------------------------------------------- |
| appsPerChunk | uint64 | 2000000 | the number of appearances to build into a chunk before consolidating it                                                   |
| snapToGrid   | blknum | 250000  | an override to apps_per_chunk to snap-to-grid at every modulo of this value; this allows easier corrections to the index  |
| firstSnap    | blknum | 2000000 | the first block at which snap_to_grid is enabled                                                                          |
| unripeDist   | blknum | 28      | the distance (in blocks) from the front of the chain under which (inclusive) a block is considered unripe                 |
| channelCount | uint64 | 20      | number of concurrent processing channels                                                                                  |
| allowMissing | bool   | false   | do not report errors for blockchains that contain blocks with zero addresses                                              |

Note that for Ethereum mainnet, the default values for appsPerChunk and firstSnap are 2,000,000 and 2,300,000 respectively. See the specification for a justification of these values.

These items may be set in three ways, each overriding the preceding method:

- in the above configuration file under the `[scrape.<chain>]` group,
- in the environment by exporting the configuration item in upper case (with underbars removed), prefixed with `TB_SCRAPE_CHAIN` (with the underbars included), or
- on the command line using the configuration item with leading dashes and in snake case (i.e., `--snake_case`).

### further information

Each time `chifra scrape` runs, it begins at the last block it completed processing (plus one). With
each pass, the scraper descends into each block's complete data. (This is why TrueBlocks requires
a `--tracing` node.) As the scraper encounters appearances of addresses in the
block's data, it adds those appearances to a growing index. Periodically (after processing the
block that contains the 2,000,000th appearance), the system consolidates an **index chunk**.

An **index chunk** is a portion of the index containing approximately 2,000,000 records (although
this number is adjustable for different chains). As part of the consolidation, the scraper creates
a Bloom filter representing the set membership in the associated index portion. The Bloom filters
are an order of magnitude smaller than the index chunks. The system then pushes both the index
chunk and the Bloom filter to IPFS. In this way, TrueBlocks creates an immutable, uncapturable
index of appearances that can be used not only by TrueBlocks, but by any member of the community who
needs it. (Hint: We all need it.)

Users of any of the TrueBlocks applications (or anyone else's applications) may subsequently
download the Bloom filters, query them to determine which **index chunks** need to be downloaded,
and thereby build a historical list of transactions for a given address. This is accomplished
while imposing a minimal resource requirement on the end user's machine.

Recently, we enabled the ability for the end user to pin these downloaded index chunks and blooms
on their own machines. The user needs the data for the software to operate, and sharing it requires
minimal effort and makes the data available to other people. Everyone is better off. A
naturally occurring network effect.

### tracing

The `chifra scrape` command requires your node to provide the `trace_block` (and related) RPC endpoints. Please see the
README file for the `chifra traces` command for more information.

### prerequisites

`chifra scrape` works with any EVM-based blockchain, but does not currently work without a "tracing,
archive" RPC endpoint. The Erigon and Reth blockchain nodes, given their minimal disk footprint for an
archive node and their support of the required `trace_` endpoint routines, are recommended.

Please [see this article](https://trueblocks.io/blog/a-long-winded-explanation-of-trueblocks/) for
more information about running the scraper and building and sharing the index of appearances.

### notifications

The `chifra scrape` command provides a notification feature which is used primarily for `trueblocks-key`.
To configure it, you must edit the `trueBlocks.toml` file. You may edit the configuration file with
`chifra config edit`. Add the following configuration items to the `[settings]` group:

```toml
[settings.notify]
url = "http://localhost:5555" # or other
author = "TrueBlocks" # optional
```

In addition, you must enable the feature by adding the `--notify` option to the command line.

### Other Options

All tools accept the following additional flags, although in some cases, they have no meaning.

```[plaintext]
  -v, --version         display the current version of the tool
      --output string   write the results to file 'fn' and return the filename
      --append          for --output command only append to instead of replace contents of file
      --file string     specify multiple sets of command line options in a file
```

**Note:** For the `--file string` option, you may place a series of valid command lines in a file using any
valid flags. In some cases, this may significantly improve performance. A semi-colon at the start
of any line makes it a comment.

**Note:** If you use the `--output` and `--append` options together with the `--file` option, you may not switch
export formats in the command file. For example, a command file with two different commands, one with `--fmt csv`
and the other with `--fmt json`, will produce both invalid CSV and invalid JSON.
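
The note above about mixing `--fmt csv` and `--fmt json` in one appended output file can be illustrated with a minimal Python sketch. It does not call `chifra`; the records, field names, and the `{"data": [...]}` JSON envelope are assumptions made for illustration only, and "valid CSV" is approximated here as "every row has the same number of columns".

```python
import csv
import io
import json

# Hypothetical records standing in for the results of two commands
# listed in the same command file (field names are illustrative only).
RECORDS = [
    {"blockNumber": 100, "address": "0xabc"},
    {"blockNumber": 101, "address": "0xdef"},
]


def as_csv(rows):
    """Render rows the way a CSV export might: a header line, then one line per row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["blockNumber", "address"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def as_json(rows):
    """Render rows as a single JSON document (the envelope is an assumption)."""
    return json.dumps({"data": rows}, indent=2) + "\n"


def is_valid_json(text):
    """True if the whole text parses as exactly one JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def is_rectangular_csv(text):
    """True if every non-empty row has the same number of columns."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    return len({len(row) for row in rows}) == 1


# Simulate appending the CSV output and then the JSON output to one file.
combined = as_csv(RECORDS) + as_json(RECORDS)

print("csv alone ok: ", is_rectangular_csv(as_csv(RECORDS)))
print("json alone ok:", is_valid_json(as_json(RECORDS)))
print("combined json:", is_valid_json(combined))
print("combined csv: ", is_rectangular_csv(combined))
```

Each format parses cleanly on its own, but the concatenation fails both checks, which is why a command file used with `--output` and `--append` should stick to a single export format.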

*Copyright (c) 2024, TrueBlocks, LLC. All rights reserved. Generated with goMaker.*