---
order: 4
---

# Running in production

## Database

By default, CometBFT uses the `syndtr/goleveldb` package for its in-process
key-value database. If you want maximal performance, it may be best to install
the real C implementation of LevelDB and compile CometBFT to use it via
`make build COMETBFT_BUILD_OPTIONS=cleveldb`. See the [install
instructions](../introduction/install.md) for details.
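
A minimal sketch of that build step, assuming the LevelDB C library and its
development headers are already installed on the build machine (see the install
instructions above):

```bash
# Build CometBFT against the C implementation of LevelDB instead of goleveldb.
make build COMETBFT_BUILD_OPTIONS=cleveldb
```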

CometBFT keeps multiple distinct databases in `$CMTHOME/data`:

- `blockstore.db`: Keeps the entire blockchain - stores blocks,
  block commits, and block metadata, each indexed by height. Used to sync new
  peers.
- `evidence.db`: Stores all verified evidence of misbehaviour.
- `state.db`: Stores the current blockchain state (i.e. height, validators,
  consensus params). Only grows if consensus params or validators change. Also
  used to temporarily store intermediate results during block processing.
- `tx_index.db`: Indexes txs (and their results) by tx hash and by DeliverTx result events.

By default, CometBFT will only index txs by their hash and height, not by their DeliverTx
result events. See [indexing transactions](../app-dev/indexing-transactions.md) for
details.
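
For example, on a running node you would expect to see these databases on disk
(a hedged illustration; the exact contents depend on your configuration):

```bash
# List the databases CometBFT maintains under its data directory.
ls "$CMTHOME/data"
# blockstore.db  cs.wal  evidence.db  state.db  tx_index.db  ...
```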

Applications can expose block pruning strategies to the node operator.
Please read the documentation of your application to find out more details.

Applications can use [state sync](./state-sync.md) to help nodes bootstrap quickly.

## Logging

The default logging level (`log_level = "main:info,state:info,statesync:info,*:error"`) should suffice for
normal operation. Read [this
post](https://blog.cosmos.network/one-of-the-exciting-new-features-in-0-10-0-release-is-smart-log-level-flag-e2506b4ab756)
for details on how to configure the `log_level` config variable. Some of the
modules can be found [here](./how-to-read-logs.md#list-of-modules). If
you're trying to debug CometBFT or were asked to provide logs with the debug
logging level, you can do so by running CometBFT with
`--log_level="*:debug"`.
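
For example, to get debug output from the consensus state machine only, while
keeping other modules quiet (a sketch; module names are listed in the link
above):

```bash
# Run the node with per-module log levels: state at debug, everything else at error.
cometbft start --log_level "state:debug,*:error"
```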

## Write Ahead Logs (WAL)

CometBFT uses write-ahead logs for the consensus (`cs.wal`) and the mempool
(`mempool.wal`). Both WALs have a max size of 1GB and are automatically rotated.

### Consensus WAL

The consensus WAL is used to ensure we can recover from a crash at any point
in the consensus state machine.
It writes all consensus messages (timeouts, proposals, block parts, and votes)
to a single file, flushing to disk before processing messages from its own
validator. Since CometBFT validators are expected to never sign a conflicting vote, the
WAL ensures we can always recover deterministically to the latest state of the consensus without
using the network or re-signing any consensus messages.

If your consensus WAL is corrupted, see [below](#wal-corruption).

### Mempool WAL

The `mempool.wal` logs all incoming txs before running CheckTx, but is
otherwise not used in any programmatic way. It serves only as a manual
safeguard. Note that the mempool provides no durability guarantees - a tx sent to one or many nodes
may never make it into the blockchain if those nodes crash before being able to
propose it. Clients must monitor their txs by subscribing over websockets,
polling for them, or using `/broadcast_tx_commit`. In the worst case, txs can be
resent from the mempool WAL manually.

For the above reasons, the `mempool.wal` is disabled by default. To enable it, set
`mempool.wal_dir` to where you want the WAL to be located (e.g.
`data/mempool.wal`).
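
A hedged example of enabling it by editing `config.toml` in place (this assumes
the default `wal_dir = ""` entry under the `[mempool]` section is present):

```bash
# Point the mempool WAL at $CMTHOME/data/mempool.wal, then restart the node.
sed -i 's|^wal_dir = ""|wal_dir = "data/mempool.wal"|' "$CMTHOME/config/config.toml"
```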

## DoS Exposure and Mitigation

Validators are supposed to set up a [Sentry Node Architecture](./validators.md)
to prevent Denial-of-Service attacks.

### P2P

The core of the CometBFT peer-to-peer system is `MConnection`. Each
connection has a maximum packet size (`MaxPacketMsgPayloadSize`) and bounded
send & receive queues. One can impose restrictions on the
send & receive rate per connection (`SendRate`, `RecvRate`).

The number of open P2P connections can become quite large, and hit the operating system's open
file limit (since TCP connections are considered files on UNIX-based systems). Nodes should be
given a sizable open file limit, e.g. 8192, via `ulimit -n 8192` or other deployment-specific
mechanisms.
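
For example, checking and raising the limit for the current shell session:

```bash
ulimit -n        # print the current soft limit on open files
ulimit -Hn       # print the hard limit (the soft limit cannot exceed this)
ulimit -n 8192   # raise the soft limit for this session and its children
```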

### RPC

#### Attack Exposure and Mitigation

**It is generally not recommended for RPC endpoints to be exposed publicly, and
especially so if the node in question is a validator**, as the CometBFT RPC does
not currently provide advanced security features. Public exposure of RPC
endpoints without appropriate protection can make the associated node vulnerable
to a variety of attacks.

It is entirely up to operators to ensure, if nodes' RPC endpoints have to be
exposed publicly, that appropriate measures have been taken to mitigate
attacks. Some examples of mitigation measures include, but are not limited to:

- Never publicly exposing the RPC endpoints of validators (i.e. if the RPC
  endpoints absolutely have to be exposed, ensure you do so only on full nodes
  and with appropriate protection)
- Correct usage of rate-limiting, authentication and caching (e.g. as provided
  by reverse proxies like [nginx](https://nginx.org/) and/or DDoS protection
  services like [Cloudflare](https://www.cloudflare.com))
- Only exposing the specific endpoints absolutely necessary for the relevant use
  cases (configurable via nginx/Cloudflare/etc.)

If no expertise is available to the operator to assist with securing nodes' RPC
endpoints, it is strongly recommended to never expose those endpoints publicly.

**Under no condition should any of the [unsafe RPC endpoints](../rpc/#/Unsafe)
ever be exposed publicly.**

#### Endpoints Returning Multiple Entries

Endpoints returning multiple entries are limited by default to return 30
elements (100 max). See the [RPC Documentation](../rpc/) for more information.

## Debugging CometBFT

If you ever have to debug CometBFT, the first thing you should probably do is
check out the logs. See [How to read logs](./how-to-read-logs.md), where we
explain what certain log statements mean.

If, after skimming through the logs, things are still not clear, the next thing
to try is querying the `/status` RPC endpoint. It provides the necessary info:
whether the node is syncing or not, what height it is at, etc.

```bash
curl http(s)://{ip}:{rpcPort}/status
```

`/dump_consensus_state` will give you a detailed overview of the consensus
state (proposer, latest validators, peer states). From it, you should be able
to figure out why, for example, the network had halted.

```bash
curl http(s)://{ip}:{rpcPort}/dump_consensus_state
```

There is a reduced version of this endpoint - `/consensus_state`, which returns
just the votes seen at the current height.
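
For example:

```bash
curl http(s)://{ip}:{rpcPort}/consensus_state
```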

If, after consulting the logs and the above endpoints, you still have no idea
what's happening, consider using the `cometbft debug kill` subcommand. This
command will scrape all the available info and kill the process. See
[Debugging](../tools/debugging.md) for the exact format.
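
A hedged example invocation (argument order per `cometbft debug kill --help`;
the PID lookup via `pgrep` assumes a single local `cometbft` process):

```bash
# Collect the available debug information into an archive, then kill the node.
cometbft debug kill $(pgrep cometbft) /tmp/cometbft-debug.zip --home "$CMTHOME"
```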

You can inspect the resulting archive yourself or create an issue on
[GitHub](https://github.com/cometbft/cometbft). Before opening an issue,
however, be sure to check whether an [existing
issue](https://github.com/cometbft/cometbft/issues) already covers it.

## Monitoring CometBFT

Each CometBFT instance has a standard `/health` RPC endpoint, which responds
with 200 (OK) if everything is fine and with 500 (or no response) if something
is wrong.
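
This makes it easy to wire into a load balancer or liveness probe, e.g.:

```bash
# -f makes curl exit with a non-zero status on an HTTP 5xx response.
curl -f http(s)://{ip}:{rpcPort}/health || echo "node is unhealthy"
```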

Other useful endpoints include the aforementioned `/status`, as well as
`/net_info` and `/validators`.

CometBFT can also report and serve Prometheus and Pyroscope metrics. See
[Metrics](./metrics.md).
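
For instance, once Prometheus metrics are enabled (`prometheus = true` in
`config.toml`; the listen port below is the documented default, so check your
`prometheus_listen_addr` setting):

```bash
# Scrape the node's Prometheus metrics endpoint.
curl http://{ip}:26660/metrics
```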

The `cometbft debug dump` subcommand can be used to periodically dump useful
information into an archive. See [Debugging](../tools/debugging.md) for more
information.
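
A sketch of its usage (flag names per `cometbft debug dump --help`; the output
directory is arbitrary):

```bash
# Write a debug archive into /tmp/cometbft-dumps every 30 seconds.
cometbft debug dump /tmp/cometbft-dumps --home "$CMTHOME" --frequency 30
```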

## What happens when my app dies

You are supposed to run CometBFT under a [process
supervisor](https://en.wikipedia.org/wiki/Process_supervision) (like
systemd or runit). It will ensure CometBFT is always running (despite
possible errors).

Getting back to the original question, if your application dies,
CometBFT will panic. After a process supervisor restarts your
application, CometBFT should be able to reconnect successfully. The
order in which the processes are restarted does not matter.
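
A minimal sketch of a systemd unit for this (the binary path, user, and home
directory are placeholders for your deployment):

```bash
# Install a unit that restarts CometBFT on failure, then enable and start it.
sudo tee /etc/systemd/system/cometbft.service > /dev/null <<'EOF'
[Unit]
Description=CometBFT node
After=network-online.target

[Service]
User=cometbft
ExecStart=/usr/local/bin/cometbft start --home /home/cometbft/.cometbft
Restart=on-failure
LimitNOFILE=8192

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now cometbft
```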

## Signal handling

We catch SIGINT and SIGTERM and try to clean up nicely. For other
signals we use the default behavior in Go:
[Default behavior of signals in Go programs](https://golang.org/pkg/os/signal/#hdr-Default_behavior_of_signals_in_Go_programs).
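
So for a clean shutdown, send one of the handled signals rather than SIGKILL:

```bash
# SIGTERM triggers CometBFT's cleanup path; SIGKILL (kill -9) bypasses it.
kill -TERM $(pgrep cometbft)
```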

## Corruption

**NOTE:** Make sure you have a backup of the CometBFT data directory.

### Possible causes

Remember that most corruption is caused by hardware issues:

- RAID controllers with faulty / worn out battery backup, and an unexpected power loss
- Hard disk drives with write-back cache enabled, and an unexpected power loss
- Cheap SSDs with insufficient power-loss protection, and an unexpected power loss
- Defective RAM
- Defective or overheating CPU(s)

Other causes can be:

- Database systems configured with fsync=off and an OS crash or power loss
- Filesystems configured to use write barriers plus a storage layer that ignores write barriers. LVM is a particular culprit.
- CometBFT bugs
- Operating system bugs
- Admin error (e.g., directly modifying CometBFT data-directory contents)

(Source: <https://wiki.postgresql.org/wiki/Corruption>)

### WAL Corruption

If the consensus WAL is corrupted at the latest height and you are trying to start
CometBFT, replay will fail with a panic.

Recovering from data corruption can be hard and time-consuming. Here are two approaches you can take:

1. Delete the WAL file and restart CometBFT. It will attempt to sync with other peers.
2. Try to repair the WAL file manually:

1) Create a backup of the corrupted WAL file:

    ```sh
    cp "$CMTHOME/data/cs.wal/wal" /tmp/corrupted_wal_backup
    ```

2) Use `./scripts/wal2json` to create a human-readable version:

    ```sh
    ./scripts/wal2json/wal2json "$CMTHOME/data/cs.wal/wal" > /tmp/corrupted_wal
    ```

3) Search for a "CORRUPTED MESSAGE" line.
4) By looking at the previous message, the message after the corrupted one,
   and the logs, try to rebuild the message. If the subsequent messages are
   marked as corrupted too (this may happen if the length header was
   corrupted or some writes did not make it to the WAL ~ truncation), then
   remove all the lines starting from the corrupted one and restart
   CometBFT.

    ```sh
    $EDITOR /tmp/corrupted_wal
    ```

5) After editing, convert this file back into binary form by running:

    ```sh
    ./scripts/json2wal/json2wal /tmp/corrupted_wal "$CMTHOME/data/cs.wal/wal"
    ```

## Hardware

### Processor and Memory

While actual specs vary depending on the load and the validator count, minimal
requirements are:

- 1GB RAM
- 25GB of disk space
- 1.4 GHz CPU

SSD disks are preferable for applications with high transaction throughput.

Recommended:

- 2GB RAM
- 100GB SSD
- x64 2.0 GHz CPU with 2 vCPUs

While CometBFT currently stores all the history, which may require significant
disk space over time, we are planning to implement state syncing (see [this
issue](https://github.com/cometbft/cometbft/issues/828)), after which storing
all the past blocks will not be necessary.

### Validator signing on 32 bit architectures (or ARM)

Both our `ed25519` and `secp256k1` implementations require constant time
`uint64` multiplication. Non-constant time crypto can leak (and has leaked)
private keys in both `ed25519` and `secp256k1`. Constant time multiplication
does not exist in hardware on 32 bit x86 platforms ([source](https://bearssl.org/ctmul.html)),
so it depends on the compiler to enforce that it is constant time. It's unclear at
this point whether the Go compiler does this correctly for all
implementations.

**We do not support nor recommend running a validator on 32 bit architectures OR
the "VIA Nano 2000 Series", and the architectures in the ARM section rated
"S-".**

### Operating Systems

CometBFT can be compiled for a wide range of operating systems thanks to the Go
language (the list of \$OS/\$ARCH pairs can be found
[here](https://golang.org/doc/install/source#environment)).

While we do not favor any operating system, more secure and stable Linux server
distributions (like CentOS) should be preferred over desktop operating systems
(like macOS).

### Miscellaneous

NOTE: if you are going to use CometBFT in a public network, make sure
you read the [hardware recommendations](https://cosmos.network/validators) for a validator in the
Cosmos network.

## Configuration parameters

- `p2p.flush_throttle_timeout`
- `p2p.max_packet_msg_payload_size`
- `p2p.send_rate`
- `p2p.recv_rate`

If you are going to use CometBFT in a private network and you have a
private high-speed network among your peers, it makes sense to lower
the flush throttle timeout and increase the other params.

```toml
[p2p]

send_rate=20000000 # 20 MB/s
recv_rate=20000000 # 20 MB/s
flush_throttle_timeout=10
max_packet_msg_payload_size=10240 # 10KB
```

- `mempool.recheck`

After every block, CometBFT rechecks every transaction left in the
mempool, because transactions committed in that block may have affected
the application state and invalidated some of the remaining transactions.
If that does not apply to your application, you can disable rechecking by
setting `mempool.recheck=false`.

- `mempool.broadcast`

Setting this to false will stop the mempool from relaying transactions
to other peers until they are included in a block. It means only the
peer you send the tx to will see it until it is included in a block.

- `consensus.skip_timeout_commit`

We want `skip_timeout_commit=false` when there is economics on the line
because proposers should wait to hear more votes. But if you don't
care about that and want the fastest consensus, you can skip it. It will
be kept false by default for public deployments (e.g. [Cosmos
Hub](https://cosmos.network/intro/hub)), while for enterprise
applications, setting it to true is not a problem.

- `consensus.peer_gossip_sleep_duration`

You can try to reduce the time your node sleeps before checking if
there's something to send to its peers.

- `consensus.timeout_commit`

You can also try lowering `timeout_commit` (the time we sleep before
proposing the next block).

- `p2p.addr_book_strict`

By default, CometBFT checks whether a peer's address is routable before
saving it to the address book. The address is considered routable if the IP
is [valid and within allowed ranges](https://github.com/cometbft/cometbft/blob/v0.34.x/p2p/netaddress.go#L258).

This may not be the case for private or local networks, where your IP range is usually
strictly limited and private. In that case, you need to set `addr_book_strict`
to `false` (turn it off).

- `rpc.max_open_connections`

By default, the number of simultaneous connections is limited because most
operating systems give you a limited number of file descriptors.

If you want to accept a greater number of connections, you will need to increase
these limits.

[Sysctls to tune the system to be able to open more connections](https://github.com/satori-com/tcpkali/blob/master/doc/tcpkali.man.md#sysctls-to-tune-the-system-to-be-able-to-open-more-connections)

The process file limits must also be increased, e.g. via `ulimit -n 8192`.

...for N connections, such as 50k:

```md
kern.maxfiles=10000+2*N         # BSD
kern.maxfilesperproc=100+2*N    # BSD
kern.ipc.maxsockets=10000+2*N   # BSD
fs.file-max=10000+2*N           # Linux
net.ipv4.tcp_max_orphans=N      # Linux

# For load-generating clients.
net.ipv4.ip_local_port_range="10000  65535"  # Linux.
net.inet.ip.portrange.first=10000  # BSD/Mac.
net.inet.ip.portrange.last=65535   # (Enough for N < 55535)
net.ipv4.tcp_tw_reuse=1         # Linux
net.inet.tcp.maxtcptw=2*N       # BSD

# If using netfilter on Linux:
net.netfilter.nf_conntrack_max=N
echo $((N/8)) > /sys/module/nf_conntrack/parameters/hashsize
```

A similar option exists for limiting the number of gRPC connections -
`rpc.grpc_max_open_connections`.