---
title: "Sampling scrape targets"
menuTitle: "Sampling targets"
description: "Sampling scrape targets with Grafana Alloy"
weight: 30
---

# Sampling scrape targets

Applications often have many instances deployed.
While Pyroscope is designed to handle large amounts of profiling data, you may want to scrape only a subset of an application's instances.

For example, the volume of profiling data your application generates may make it impractical to profile every instance, or you may want to reduce costs.

By configuring the Grafana Alloy collector, you can sample scrape targets so that only a subset of instances is profiled.

## Before you begin

Make sure you understand how to configure the collector to scrape targets and are familiar with the component configuration language.
Alloy configuration files use the Alloy [configuration syntax](https://grafana.com/docs/alloy/latest/concepts/configuration-syntax/).

## Configuration

To enable sampling, use the `hashmod` action with its `modulus` argument to shard targets by one or more labels, then use a `keep` action to select a subset of the shards.
For more information, refer to the [rule block documentation](https://grafana.com/docs/alloy/<ALLOY_VERSION>/reference/components/discovery/discovery.relabel/#rule-block).
In short, `hashmod` computes an MD5 hash of the concatenated source label values, takes that hash modulo `modulus`, and writes the result to `target_label`.

To change the sample size, change the value of `modulus` in the `hashmod` action and the `regex` argument in the `keep` action.
The `modulus` value defines the number of shards, while the `regex` value selects a subset of the shards.
On average, the fraction of targets kept is the number of selected shards divided by `modulus`.

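As a rough illustration of how the two values interact, the following pair of rule blocks would keep about one in ten targets by hashing them into 10 shards and keeping only shard `0`. This is a sketch, not a recommended configuration: it assumes the rules sit inside a `discovery.relabel` component and that `__address__` is unique for each target.

```alloy
// Sketch only: place these rules inside a discovery.relabel component.

// Assign each target to one of 10 shards, numbered 0 through 9.
// Assumption: __address__ is unique for each target.
rule {
  action        = "hashmod"
  source_labels = ["__address__"]
  modulus       = 10
  target_label  = "__tmp_hashmod"
}

// Keep shard 0 only, which is 1 of the 10 shards.
rule {
  action        = "keep"
  source_labels = ["__tmp_hashmod"]
  regex         = "0"
}
```

Widening the `regex` to `[0-2]` would select three of the ten shards, or about 30% of targets.
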
![Workflow for sampling scrape targets](../sample.svg)

{{< admonition type="note" >}}
Choose the source labels for the `hashmod` action carefully. Together, they must uniquely identify each scrape target, or `hashmod` can't shard the targets uniformly.
{{< /admonition >}}

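If no single label identifies a target, one option is to hash several labels together. The sketch below assumes the targets come from Kubernetes pod discovery, where the namespace and pod name together identify a pod. The source label values are joined with the rule's `separator`, which is `;` by default, before they are hashed.

```alloy
// Sketch only: place this rule inside a discovery.relabel component.
// Assumption: namespace plus pod name uniquely identifies each pod.
rule {
  action        = "hashmod"
  source_labels = ["__meta_kubernetes_namespace", "__meta_kubernetes_pod_name"]
  modulus       = 100
  target_label  = "__tmp_hashmod"
}
```

Because the hash input is the same for every target that belongs to a given pod, all of that pod's targets land in the same shard.
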
For example, consider an application deployed on Kubernetes with 100 pod replicas, each uniquely identified by the label `pod_hash`.
The following configuration samples approximately 15% of the pods by hashing them into 100 shards and keeping shards 0 through 14:

```alloy
discovery.kubernetes "profile_pods" {
  role = "pod"
}

discovery.relabel "profile_pods" {
  targets = concat(discovery.kubernetes.profile_pods.targets)

  // Other rule blocks ...

  // Hash the pod_hash label into one of 100 shards, numbered 0 through 99.
  rule {
    action        = "hashmod"
    source_labels = ["pod_hash"]
    modulus       = 100
    target_label  = "__tmp_hashmod"
  }

  // Keep only the targets in shards 0 through 14, which is 15 of the 100 shards.
  rule {
    action        = "keep"
    source_labels = ["__tmp_hashmod"]
    regex         = "^([0-9]|1[0-4])$"
  }

  // Other rule blocks ...
}
```

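To show where the sampled targets go next, here is a minimal sketch that passes the output of the `discovery.relabel "profile_pods"` component to a scraper. The component labels `sampled_pods` and `default` and the endpoint URL are placeholders chosen for illustration, not values from this guide.

```alloy
pyroscope.scrape "sampled_pods" {
  // Only the targets that passed the keep rule are scraped.
  targets    = discovery.relabel.profile_pods.output
  forward_to = [pyroscope.write.default.receiver]
}

pyroscope.write "default" {
  endpoint {
    // Placeholder address: replace it with the URL of your Pyroscope server.
    url = "http://pyroscope:4040"
  }
}
```
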
## Considerations

This strategy doesn't guarantee precise sampling.
Because shard assignment depends on an MD5 hash of the label values, scrape targets aren't distributed perfectly evenly across shards, so the number of targets kept can differ from the configured percentage.
The larger the number of scrape targets, the closer the sampled fraction gets to the configured value.

If the value of the hashed label is stable for a given target, that target always lands in the same shard, so the same set of targets is sampled every time.
Similarly, if the value of the hashed label changes over time for the same target, the set of sampled targets changes with it.