# Leakybuckets

## Bucket concepts

The Leakybucket is used for decision making. Under certain conditions,
enriched events are poured into these buckets. When these buckets are
full, we raise a new event. After this event is raised, the bucket is
destroyed. There are many types of buckets, and we welcome any new
useful design of buckets.

Usually, a single bucket configuration leads to the creation of many
bucket instances. They are differentiated by a field called stackkey. When
two events arrive with the same stackkey, they go into the same matching
bucket.

The very purpose of these buckets is to detect clients that exceed a
certain rate of attempts to do something (SSH connections, HTTP
authentication failures, etc.). Thus, the most commonly used stackkey
field is the source_ip.

## Standard leaky buckets

Default buckets have two main configuration options:

 * capacity: number of events the bucket can hold. When the capacity
   is reached and a new event is poured, a new event is raised. We
   call this type of event an overflow. This is an int.

 * leakspeed: duration needed for an event to leak. When an event
   leaks, it disappears from the bucket.
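
As an illustration, here is a minimal sketch of a standard leaky bucket
(the names and values are hypothetical; the fields themselves are detailed
later in this document):

```
# hypothetical sketch: the 6th event poured from the same source_ip
# within the leak window raises an overflow; one event leaks every 10s
- type: leaky
  name: example_rate_limit
  filter: "Meta.log_type == 'example_failed_action'"
  capacity: 5
  leakspeed: "10s"
  stackkey: "source_ip"
```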

## Trigger

A Trigger is a special type of bucket with a capacity of zero. Thus, when an
event is poured into a trigger, it always raises an overflow.
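
For example, a hypothetical trigger could look like the sketch below
(capacity and leakspeed are irrelevant here, since the first matching
event always overflows):

```
# hypothetical sketch: the first event seen for a given source_ip
# immediately raises an overflow
- type: trigger
  name: example_first_seen
  filter: "Meta.log_type == 'example_event'"
  stackkey: "source_ip"
```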

## Uniq

A Uniq bucket works like the standard leaky bucket except for one
thing: a filter returns a property for each event, and only one
occurrence of this property is allowed in the bucket; hence the bucket
is called uniq.
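
A minimal sketch, with hypothetical names (the uniq_filter field is
described later in this document):

```
# hypothetical sketch: repeated usernames from the same source_ip are
# dismissed, so only distinct usernames count towards the capacity
- type: uniq
  name: example_distinct_usernames
  filter: "Meta.log_type == 'example_failed_auth'"
  uniq_filter: "Meta.username"
  capacity: 5
  leakspeed: "10s"
  stackkey: "source_ip"
```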

## Counter

A Counter is a special type of bucket with an infinite capacity and an
infinite leakspeed (it never overflows, nor leaks). Nevertheless, an
event is raised after a fixed duration. The corresponding option is
called duration.

## Bayesian

A Bayesian is a special bucket that runs Bayesian inference instead of
counting events. Each event must have its likelihoods specified in the
YAML file under `prob_given_benign` and `prob_given_evil`. The bucket
will keep evaluating events until the posterior goes above the
threshold (triggering the overflow) or the duration (specified by leakspeed)
expires.

## Available configuration options for buckets

### Fields for standard buckets

* type: mandatory field. Must be one of "leaky", "trigger", "uniq",
  "counter" or "bayesian"

* name: mandatory field, but the value is totally open. Nevertheless,
  this value will tag the events raised by the bucket.

* filter: mandatory field. It's a filter that is run to decide whether
  an event matches the bucket or not. The filter has to return
  a boolean. As a filter implementation we use
  https://github.com/antonmedv/expr

* capacity: [mandatory for now, shouldn't be mandatory in the final
  version] it's the size of the bucket. When pouring into a bucket
  that already holds capacity events, it overflows.

* leakspeed: leakspeed is a time duration (it has to be parsed by
  https://golang.org/pkg/time/#ParseDuration). After each interval, an
  event is leaked from the bucket.

* stackkey: mandatory field. This field is used to decide into which
  instance of the bucket the matching events will be poured.
  When an unknown stackkey value is seen in an event, a new bucket is created.

* on_overflow: optional field that tells what to do when the
  bucket returns the overflow event. As of today, the possibilities
  are "ban,1h", "Reprocess" or "Delete".
  Reprocess is used to send the raised event back to the event pool to
  be matched against buckets.
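
Putting these fields together, a hypothetical standard bucket (names and
values invented for illustration) could be declared as follows:

```
# hypothetical sketch combining the standard fields described above
- type: leaky                                    # one of the types listed above
  name: example_http_bruteforce                  # tags the raised events
  filter: "Meta.log_type == 'http_failed-auth'"  # expr filter, must return a boolean
  capacity: 10                                   # overflow past 10 held events
  leakspeed: "30s"                               # one event leaks every 30 seconds
  stackkey: "source_ip"                          # one bucket instance per source_ip
  on_overflow: ban,1h                            # or Reprocess / Delete
```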

### Fields for special buckets

#### Uniq

 * uniq_filter: an expression that must comply with the syntax defined
   in https://github.com/antonmedv/expr and must return a string.
   All strings returned by this filter in the same bucket have to be different.
   Thus, if a string is seen twice, the event is dismissed.

#### Trigger

Capacity and leakspeed are not relevant for this kind of bucket.

#### Counter

 * duration: the Counter will be destroyed after this interval
   has elapsed since its creation. The duration must be parsed
   by https://golang.org/pkg/time/#ParseDuration.
   Nevertheless, this kind of bucket is often used with an infinite
   leakspeed and an infinite capacity [capacity set to -1 for now].

#### Bayesian

 * bayesian_prior: The prior to start with.
 * bayesian_threshold: The threshold for the posterior to trigger the overflow.
 * bayesian_conditions: List of Bayesian conditions with likelihoods.

Bayesian conditions are built from:
 * condition: The expr expression for this specific condition to be true.
 * prob_given_evil: The likelihood that an IP satisfies the condition given
   that it is a malicious IP.
 * prob_given_benign: The likelihood that an IP satisfies the condition given
   that it is a benign IP.
 * guillotine: Bool to stop the condition from being evaluated again once it
   has evaluated to true. This should be used if evaluating the condition is
   computationally expensive.
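
As an illustration, a hypothetical Bayesian bucket might look like the
sketch below (the condition expressions, probabilities and other values
are invented for the example):

```
# hypothetical sketch: the posterior starts at bayesian_prior, is updated
# with each condition's likelihoods, and the bucket overflows once the
# posterior exceeds bayesian_threshold (or expires after the leakspeed)
- type: bayesian
  name: example_bayesian_bruteforce
  filter: "Meta.log_type == 'ssh_failed-auth'"
  stackkey: "source_ip"
  leakspeed: "10m"
  bayesian_prior: 0.5
  bayesian_threshold: 0.95
  bayesian_conditions:
    - condition: "Meta.username == 'root'"
      prob_given_evil: 0.8
      prob_given_benign: 0.2
    - condition: "Meta.username in ['admin', 'test']"
      prob_given_evil: 0.6
      prob_given_benign: 0.1
      guillotine: true
```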

## Examples

```
# ssh bruteforce
- type: leaky
  name: ssh_bruteforce
  filter: "Meta.log_type == 'ssh_failed-auth'"
  leakspeed: "10s"
  capacity: 5
  stackkey: "source_ip"
  on_overflow: ban,1h

# reporting of src_ip,dest_port seen
- type: counter
  name: counter
  filter: "Meta.service == 'tcp' && Event.new_connection == 'true'"
  distinct: "Meta.source_ip + ':' + Meta.dest_port"
  duration: 5m
  capacity: -1

- type: trigger
  name: "New connection"
  filter: "Meta.service == 'tcp' && Event.new_connection == 'true'"
  on_overflow: Reprocess
```

# Note on leakybuckets implementation

[This is not dry enough to have many details here, but:]

The bucket code is triggered by `runPour` in pour.go, which calls the
`leaky.PourItemToHolders` function.
There is one struct called buckets, which is for now a
`map[string]interface{}` that holds all buckets. The key of this map
is derived from the filter configured for the bucket and its
stackkey. This looks complicated, but it allows us to use
only one struct. This is done in buckets.go.

On top of that, the implementation defines only the standard leaky
bucket. A goroutine is launched for every bucket (`bucket.go`); this
goroutine manages the life of the bucket.

For special buckets, hooks are defined at initialization time in
manager.go. Hooks are called when relevant by the bucket goroutine
when events are poured and/or when a bucket overflows.