# Heapster Oldtimer

## Overview

Prior to the Heapster refactor, the Heapster model presented aggregations of
metrics over certain time periods (the last hour and day). Post-refactor, the
concern of presenting an interface for historical metrics was to be split into
a separate Heapster component: Oldtimer.

Oldtimer will run as part of the main Heapster executable. It will present
common interfaces for retrieving historical metrics over longer periods of time
than the Heapster model, and will allow fetching aggregations of metrics (e.g.
averages, 95th percentiles, etc.) over different periods of time. It will do
this by querying the sink in which Heapster stores its metrics.

Note: even though we are retrieving metrics, this document refers to the
metrics storage locations as "sinks" to be consistent with the rest
of Heapster.

## Motivation

There are two major motivations for exposing historical metrics information:

1. Using aggregated historical data to make size-related decisions
   (for example, idling requires looking for traffic over a long time period)

2. Providing a common interface for users to view historical metrics

Before the Heapster refactoring (see the
[Heapster Long Term Vision Proposal](https://github.com/kubernetes/heapster/blob/master/docs/proposals/vision.md)),
Heapster supported querying metrics aggregated over certain extended time
periods (the last hour and day) via the Heapster model.

However, since the Heapster model is stored in memory and not persisted to
disk, this historical data would be "lost" whenever Heapster was restarted.
This made it unreliable for use by system components which need a historical
view.

Since we already persist metrics into a sink, it does not make sense for
Heapster itself to persist long-term metrics to disk. Instead, we can
just query the sink directly.

## API

Oldtimer will present an API somewhat similar to the normal Heapster model.
The structure of the URLs is designed to mirror those exposed by the model API.
When used simply to retrieve historical data points, Oldtimer will return the
same types as the model API. When used to retrieve aggregations, Oldtimer
will return special data types detailed under the "Return Types" section.

### Paths

`/api/v1/historical{prefix}/metrics/`: Returns a list of all available
metrics.

`/api/v1/historical{prefix}/metrics/{metric-name}?start=X&end=Y`: Returns a set
of (Timestamp, Value) pairs for the requested {prefix}-level metric, over the
given time range.

`/api/v1/historical{prefix}/metrics-aggregated/{aggregations}/{metric-name}?start=X&end=Y&bucket=B`:
Returns the requested {prefix}-level metric, aggregated with the given
aggregation over the requested time period (potentially split into several
buckets of duration `B`). `{aggregations}` may be a comma-separated
list of aggregations to retrieve multiple at once.

Here, `{prefix}` is normally either empty (cluster-level),
`/namespaces/{namespace}` (namespace-level),
`/namespaces/{namespace}/pods/{pod-name}` (pod-level),
`/namespaces/{namespace}/pod-list/{pod-list}` (multi-pod-level), or
`/namespaces/{namespace}/pods/{pod-name}/containers/{container-name}`
(container-level).

Additionally, since pod names are not temporally unique (i.e.
it is possible to
delete a pod, and then create a new, completely different pod with the same
name), `{prefix}` may also be `/pod-id/{pod-id}` (pod-level metrics),
`/pod-id-list/{pod-id-list}` (multi-pod-level), or
`/pod-id/{pod-id}/containers/{container-name}` (container-level metrics).

In addition, when `{prefix}` is not empty, there will be a URL of the form
`/api/v1/historical{prefix-without-final-element}` which allows fetching the
list of available nodes/namespaces/pods/containers.

Note that queries by pod name will return metrics from the latest pod with the
given name. In some cases this may require an extra query to the sink in
order to determine which pod ID that actually is. For this reason, if a
component knows the pod IDs for which it is querying, using these is preferred
to using the pod names. The pod-name-based API is retained for the sake of
easy queries and to match up with the model API.

### Parameter Types

The `start` and `end` parameters are defined the same way as for the model:
each should be a timestamp formatted according to RFC 3339. If no start time is
specified, it defaults to zero in Unix epoch time, and if no end time is
specified, all data after the start time will be considered.

The `bucket` (bucket duration) parameter is a number followed by any of the
following suffixes:

- `ms`: milliseconds
- `s`: seconds
- `m`: minutes
- `h`: hours
- `d`: days

### Return Types

For requests which simply fetch data points or list available objects, the
return format will be the same as that used in the Heapster model API.

In the case of aggregations, a different set of types is used: each bucket is
represented by a `MetricAggregationBucket`, which contains the timestamp for
that bucket (the start of the bucket), the count of entries in that bucket (if
requested) as an unsigned integer, as well as each of the other requested
aggregations, in the form of a `MetricValue` (which holds either an unsigned
integer or a float).

All buckets for a particular metric are grouped together in a
`MetricAggregationResult`, which also holds the bucket size (duration) for the
buckets. If multiple pods are requested, the result will be returned as a
`MetricAggregationResultList`, similarly to the `MetricResultList` for the
model API.

```go
import "time"

// MetricValue holds a single aggregated value as either an unsigned
// integer or a float.
type MetricValue struct {
	IntValue   *uint64
	FloatValue *float64
}

// MetricAggregationBucket holds the requested aggregations for one bucket.
// Aggregations that were not requested are left nil (or empty).
type MetricAggregationBucket struct {
	Timestamp time.Time // start of the bucket
	Count     *uint64

	Average     *MetricValue
	Maximum     *MetricValue
	Minimum     *MetricValue
	Median      *MetricValue
	Percentiles map[uint64]MetricValue // keyed by percentile, e.g. 95
}

// MetricAggregationResult groups all buckets for one metric.
type MetricAggregationResult struct {
	Buckets    []MetricAggregationBucket
	BucketSize time.Duration
}

// MetricAggregationResultList is returned when multiple pods are requested.
type MetricAggregationResultList struct {
	Items []MetricAggregationResult
}
```

### Aggregations

Several different aggregations will be supported. Aggregations should be
performed in the metrics sink. If more aggregations later become supported
across all metrics sinks, the list can be expanded.

- Average (arithmetic mean): `/metrics-aggregated/average`
- Maximum: `/metrics-aggregated/max`
- Minimum: `/metrics-aggregated/min`
- Percentile: `/metrics-aggregated/{number}-perc`
- Median: `/metrics-aggregated/median`
- Count: `/metrics-aggregated/count`

Note: to support all the existing sinks, the supported percentiles will be
limited to 50, 95, and 99.
If additional percentile values later become
supported by other sinks, this list may be expanded (see the Sink Support
section below).

### Example

Suppose that one wanted to retrieve the 95th percentile of CPU usage for a
given pod over the past 30 days, in 1-hour intervals, along with the average
usage for each interval. Call the pod "somepod", in the namespace "somens".
To fetch the results, you'd perform:

```
GET /api/v1/historical/namespaces/somens/pods/somepod/metrics-aggregated/95-perc,average/cpu/usage?start=2016-03-20T10:57:37-04:00&bucket=1h
```

This would then return:

```json
{
  "bucketSize": "3600000000000",
  "buckets": [
    {
      "timestamp": "2016-03-20T10:57:37-04:00",
      "average": "32",
      "percentiles": {
        "95": "27"
      }
    },
    ...
  ]
}
```

## Sink Support and Functionality

When Oldtimer receives a request, it will compose a query to the sink, send the
query to the sink, and then transform the results into the appropriate API
format. Note that Oldtimer is designed to retrieve information that was
originally written by Heapster itself. Any information read by Oldtimer must
have been stored according to the Heapster storage schema.

All computations, filtering, etc., should be performed in the sink. Oldtimer
should only be composing queries. Consequently, the feature set of Oldtimer must
represent the lowest common denominator of features supported by the sinks.
Oldtimer is meant to be an API for performing basic aggregations supported by
all of the sinks, and is not meant to be a general-purpose query tool.

At the time of writing of this proposal, the following sinks were considered:
Hawkular, InfluxDB, GCM, and OpenTSDB. Since the required aggregations are
fairly basic, any sinks added later are likely to support the required
Oldtimer features as well.
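
Because Oldtimer only exposes aggregations that every sink supports, it can reject an unsupported request before composing any sink query. The following Go sketch validates the comma-separated `{aggregations}` path segment against the list from the Aggregations section, including the 50/95/99 percentile limit. `parseAggregations` and `aggregationRequest` are hypothetical names used for illustration, not part of Heapster.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// supportedPercentiles is the lowest-common-denominator set from the proposal.
var supportedPercentiles = map[uint64]bool{50: true, 95: true, 99: true}

// aggregationRequest is a hypothetical parsed form of the {aggregations}
// path segment; its fields mirror MetricAggregationBucket.
type aggregationRequest struct {
	Average, Maximum, Minimum, Median, Count bool

	Percentiles []uint64 // requested percentiles, e.g. 95
}

// parseAggregations validates a segment such as "95-perc,average" and
// rejects any aggregation outside the supported set.
func parseAggregations(segment string) (aggregationRequest, error) {
	var req aggregationRequest
	for _, name := range strings.Split(segment, ",") {
		switch name {
		case "average":
			req.Average = true
		case "max":
			req.Maximum = true
		case "min":
			req.Minimum = true
		case "median":
			req.Median = true
		case "count":
			req.Count = true
		default:
			// Anything else must have the form "{number}-perc".
			if !strings.HasSuffix(name, "-perc") {
				return aggregationRequest{}, fmt.Errorf("unsupported aggregation %q", name)
			}
			p, err := strconv.ParseUint(strings.TrimSuffix(name, "-perc"), 10, 64)
			if err != nil || !supportedPercentiles[p] {
				return aggregationRequest{}, fmt.Errorf("unsupported percentile %q (only 50, 95 and 99)", name)
			}
			req.Percentiles = append(req.Percentiles, p)
		}
	}
	return req, nil
}
```

Rejecting a request such as `90-perc` up front means no sink is ever asked for an aggregation it may not support, which keeps Oldtimer a thin query composer.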

## Scaling and Performance Considerations

Since Oldtimer itself does not store any data, it should have a fairly low
memory footprint. The current plan is to have Oldtimer run as part of the main
Heapster executable. However, in the future it may be advantageous to have the
ability to split Oldtimer out into a separate executable in order to scale it
independently of Heapster.

The metrics sinks themselves should already have clustering support, and thus
can be scaled if needed. Since Oldtimer queries the metrics sinks themselves,
response latency should depend mainly on how quickly the sinks can respond to
queries.