<!--startmeta
custom_edit_url: "https://github.com/netdata/go.d.plugin/edit/master/modules/hdfs/README.md"
meta_yaml: "https://github.com/netdata/go.d.plugin/edit/master/modules/hdfs/metadata.yaml"
sidebar_label: "Hadoop Distributed File System (HDFS)"
learn_status: "Published"
learn_rel_path: "Data Collection/Storage, Mount Points and Filesystems"
most_popular: True
message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE"
endmeta-->

# Hadoop Distributed File System (HDFS)

<img src="https://netdata.cloud/img/hadoop.svg" width="150"/>

Plugin: go.d.plugin
Module: hdfs

<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" />

## Overview

This collector monitors HDFS nodes.

Netdata accesses HDFS metrics over `Java Management Extensions` (JMX) through the web interface of an HDFS daemon.

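The daemon's `/jmx` endpoint returns a JSON document containing a list of `beans`, and the collector maps bean attributes to charts. The sketch below parses a trimmed, hypothetical NameNode response; real responses contain many more beans and attributes, and the values shown are illustrative only:

```python
import json

# A trimmed, hypothetical /jmx response; real NameNode/DataNode
# responses contain many more beans and attributes.
sample = """
{
  "beans": [
    {
      "name": "Hadoop:service=NameNode,name=JvmMetrics",
      "MemHeapCommittedM": 1011.25,
      "MemHeapUsedM": 118.8
    }
  ]
}
"""

beans = json.loads(sample)["beans"]

# Locate the JVM metrics bean by name.
jvm = next(b for b in beans if b["name"].endswith("name=JvmMetrics"))

# These heap attributes are already expressed in MiB, matching the
# units of the hdfs.heap_memory chart.
print(f"heap committed={jvm['MemHeapCommittedM']} MiB, used={jvm['MemHeapUsedM']} MiB")
```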

This collector is supported on all platforms.

This collector supports collecting metrics from multiple instances of this integration, including remote instances.

### Default Behavior

#### Auto-Detection

This integration doesn't support auto-detection.

#### Limits

The default configuration for this integration does not impose any limits on data collection.

#### Performance Impact

The default configuration for this integration is not expected to impose a significant performance impact on the system.

## Metrics

Metrics grouped by *scope*.

The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.

### Per Hadoop Distributed File System (HDFS) instance

These metrics refer to the entire monitored application.

This scope has no labels.

Metrics:

| Metric | Dimensions | Unit | DataNode | NameNode |
|:------|:----------|:----|:---:|:---:|
| hdfs.heap_memory | committed, used | MiB | • | • |
| hdfs.gc_count_total | gc | events/s | • | • |
| hdfs.gc_time_total | ms | ms | • | • |
| hdfs.gc_threshold | info, warn | events/s | • | • |
| hdfs.threads | new, runnable, blocked, waiting, timed_waiting, terminated | num | • | • |
| hdfs.logs_total | info, error, warn, fatal | logs/s | • | • |
| hdfs.rpc_bandwidth | received, sent | kilobits/s | • | • |
| hdfs.rpc_calls | calls | calls/s | • | • |
| hdfs.open_connections | open | connections | • | • |
| hdfs.call_queue_length | length | num | • | • |
| hdfs.avg_queue_time | time | ms | • | • |
| hdfs.avg_processing_time | time | ms | • | • |
| hdfs.capacity | remaining, used | KiB |   | • |
| hdfs.used_capacity | dfs, non_dfs | KiB |   | • |
| hdfs.load | load | load |   | • |
| hdfs.volume_failures_total | failures | events/s |   | • |
| hdfs.files_total | files | num |   | • |
| hdfs.blocks_total | blocks | num |   | • |
| hdfs.blocks | corrupt, missing, under_replicated | num |   | • |
| hdfs.data_nodes | live, dead, stale | num |   | • |
| hdfs.datanode_capacity | remaining, used | KiB | • |   |
| hdfs.datanode_used_capacity | dfs, non_dfs | KiB | • |   |
| hdfs.datanode_failed_volumes | failed volumes | num | • |   |
| hdfs.datanode_bandwidth | reads, writes | KiB/s | • |   |


## Alerts

The following alerts are available:

| Alert name  | On metric | Description |
|:------------|:----------|:------------|
| [ hdfs_capacity_usage ](https://github.com/netdata/netdata/blob/master/health/health.d/hdfs.conf) | hdfs.capacity | summary of DataNodes' space capacity utilization |
| [ hdfs_missing_blocks ](https://github.com/netdata/netdata/blob/master/health/health.d/hdfs.conf) | hdfs.blocks | number of missing blocks |
| [ hdfs_stale_nodes ](https://github.com/netdata/netdata/blob/master/health/health.d/hdfs.conf) | hdfs.data_nodes | number of DataNodes marked stale due to delayed heartbeat |
| [ hdfs_dead_nodes ](https://github.com/netdata/netdata/blob/master/health/health.d/hdfs.conf) | hdfs.data_nodes | number of DataNodes which are currently dead |
| [ hdfs_num_failed_volumes ](https://github.com/netdata/netdata/blob/master/health/health.d/hdfs.conf) | hdfs.num_failed_volumes | number of failed volumes |

## Setup

### Prerequisites

No action required.

### Configuration

#### File

The configuration file name for this integration is `go.d/hdfs.conf`.

You can edit the configuration file using the `edit-config` script from the
Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory).

```bash
cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
sudo ./edit-config go.d/hdfs.conf
```

#### Options

The following options can be defined globally: `update_every`, `autodetection_retry`.

<details><summary>Config options</summary>

| Name | Description | Default | Required |
|:----|:-----------|:-------|:--------:|
| update_every | Data collection frequency. | 1 | no |
| autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. | 0 | no |
| url | Server URL. | http://127.0.0.1:9870/jmx | yes |
| timeout | HTTP request timeout. | 1 | no |
| username | Username for basic HTTP authentication. |  | no |
| password | Password for basic HTTP authentication. |  | no |
| proxy_url | Proxy URL. |  | no |
| proxy_username | Username for proxy basic HTTP authentication. |  | no |
| proxy_password | Password for proxy basic HTTP authentication. |  | no |
| method | HTTP request method. | GET | no |
| body | HTTP request body. |  | no |
| headers | HTTP request headers. |  | no |
| not_follow_redirects | Redirect handling policy. Controls whether the client follows redirects. | no | no |
| tls_skip_verify | Server certificate chain and hostname validation policy. Controls whether the client performs this check. | no | no |
| tls_ca | Certification authority that the client uses when verifying the server's certificates. |  | no |
| tls_cert | Client TLS certificate. |  | no |
| tls_key | Client TLS key. |  | no |

</details>

#### Examples

##### Basic

A basic example configuration.

```yaml
jobs:
  - name: local
    url: http://127.0.0.1:9870/jmx
```

##### HTTP authentication

Basic HTTP authentication.

<details><summary>Config</summary>

```yaml
jobs:
  - name: local
    url: http://127.0.0.1:9870/jmx
    username: username
    password: password
```

</details>

##### HTTPS with self-signed certificate

Do not validate server certificate chain and hostname.

<details><summary>Config</summary>

```yaml
jobs:
  - name: local
    url: https://127.0.0.1:9870/jmx
    tls_skip_verify: yes
```

</details>

##### Multi-instance

> **Note**: When you define multiple jobs, their names must be unique.

Collecting metrics from local and remote instances.

<details><summary>Config</summary>

```yaml
jobs:
  - name: local
    url: http://127.0.0.1:9870/jmx

  - name: remote
    url: http://192.0.2.1:9870/jmx
```

</details>
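
##### Combined options

The global and HTTP options from the table above can be combined in a single job. A sketch with illustrative values (adjust to your environment):

<details><summary>Config</summary>

```yaml
jobs:
  - name: local
    url: http://127.0.0.1:9870/jmx
    update_every: 5   # collect every 5 seconds instead of the default 1
    timeout: 2        # HTTP request timeout in seconds
    username: username
    password: password
```

</details>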


## Troubleshooting

### Debug Mode

To troubleshoot issues with the `hdfs` collector, run the `go.d.plugin` with the debug option enabled. The output
should give you clues as to why the collector isn't working.

- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on
  your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.

  ```bash
  cd /usr/libexec/netdata/plugins.d/
  ```

- Switch to the `netdata` user.

  ```bash
  sudo -u netdata -s
  ```

- Run the `go.d.plugin` to debug the collector:

  ```bash
  ./go.d.plugin -d -m hdfs
  ```