github.com/cloudfoundry/cli@v7.1.0+incompatible/doc/adr/0009-log-cache-clock-issues.md (about) 1 # 9. Log Cache and Clock Issues 2 3 Date: 2020-04-08 4 5 ## Status 6 7 Accepted 8 9 ## Context 10 11 _The issue motivating this decision, and any context that influences or constrains the decision._ 12 13 The CF CLI previously used the Traffic Controller to retrieve application logs. 14 This was implemented as a long-lived WebSocket connection where logs were 15 streamed to the client. 16 17 With the migration to Log Cache we needed to preserve the experience of 18 streaming logs, but built [on an 19 API](https://github.com/cloudfoundry/log-cache-release/blob/f08a3081c029d133300b1d6cb5ea8ebbd2108874/src/README.md) 20 where timestamped log envelopes are retrieved via HTTP requests. 21 22 There were two main questions we needed to answer: 23 24 1. What is the oldest log that should be shown? 25 An application may have logs from a previous push already present in Log Cache 26 that do not relate to the current staging operation. We would want to only 27 show the logs for the current staging operation. 28 2. What is the newest log that can be shown? 29 Log Cache is eventually consistent, so very recent logs may be out of order 30 or incomplete. We would only want to show logs once they have "settled" (i.e. 31 logs that have been in Log Cache long enough to converge). 32 33 We also needed to answer a question regarding `cf logs --recent`: 34 35 3. Do we want to define a range of "settled" logs to show, or do we want to show everything currently in the Log Cache? 36 37 ## Decision 38 39 _The change that we're proposing or have agreed to implement._ 40 41 1. To determine the oldest log that will be shown: we will 'peek' at the latest log within 42 Log Cache for the application and read from that point. 43 2. To determine the newest log that will be shown: we will not show log envelopes where the 44 timestamp is less than two seconds old. 45 3. `cf logs --recent` will always show all logs currently in the Log Cache, regardless of whether they have "settled". 46 47 ### 1. Peeking at the latest log 48 49 The initial CLI implementation of Log Cache started reading from Log Cache at an 50 offset based on the current client clock time. This was flawed because an 51 incorrectly configured client clock would result in some unexpected behaviour: 52 53 * If the client clock was ahead of the server then either the logs would not be 54 shown at all, or there would be a lengthy delay before some of the logs would 55 be shown 56 * If the client clock was behind the server then logs relating to a previous 57 operation might be shown 58 59 In an attempt to decouple ourselves from the client clock time we instead 'peek' 60 at the timestamp of the latest log envelope for the application and use that 61 timestamp as our starting point. 62 63 If there are no envelopes present for the application we will continue to retry 64 until envelopes become available. 65 66 ### 2. Delaying the output of new logs 67 68 By default, the Log Cache client will only return log envelopes that have timestamps 69 more than a second old (see the [code](https://github.com/cloudfoundry/log-cache-release/blob/f08a3081c029d133300b1d6cb5ea8ebbd2108874/src/pkg/client/walk.go#L174-L183)). 70 This filtering mechanism is known as `WalkDelay` and is configurable. 71 72 In our testing we found that the default of one second was not sufficient to allow Log Cache to 73 "settle" and resulted in log loss on foundations with multiple Log Cache nodes. 74 This is because multiple Log Cache nodes may be ingesting events for the same 75 application, but a single Log Cache node hosts the cache for a given 76 application. It's possible for us to see a newer log envelope timestamp and move 77 our timestamp cursor forwards before we have received an earlier log envelope: 78 https://www.pivotaltracker.com/story/show/171759407/comments/212391238. 79 80 We decided to increase the `WalkDelay` to 2 seconds to give Log cache more time 81 to "settle" in a multiple Log Cache node foundation. 82 83 Note: an issue has been filed against the Log Cache client to increase the default 84 walk delay: https://github.com/cloudfoundry/go-log-cache/issues/29 85 86 ### 3. `cf logs --recent` behavior 87 88 We decided to implement `cf logs --recent` using a single request to Log Cache instead of a multiple pass 89 `Walk` implementation. This has better performance but means that there is no `WalkDelay` (i.e. the latest 90 logs returned by the request may not be "settled"). There were three options we considered to address this: 91 92 1. Do nothing. Simply return all of the logs that are currently in Log Cache. 93 2. Only render logs that are more than two seconds old. 94 3. Wait two seconds before showing all logs up to the point that the command was started. 95 96 We decided to go with option 1. Option 2 would most likely break CATs and other automated tooling run against 97 the `cf logs --recent` command due to the lag in logs appearing. Option 3 was a poor UX as it made the command 98 two seconds slower and would also be missing logs that were emitted after the command started. 99 100 See this [comment](https://www.pivotaltracker.com/story/show/172152836/comments/213375953) for more information. 101 102 ## Consequences 103 104 _What becomes easier or more difficult to do and any risks introduced by the change that will need to be mitigated._ 105 106 ### Peeking at the latest log 107 108 * Peeking at the latest envelope for an application removes a dependency on the 109 client clock, making us more tolerant to a situation where the client clock is 110 not closely synchronised to the server 111 * As each component that generates logs is responsible for assigning the 112 timestamp, there is still potential for clock drift within the foundation to 113 cause unexpected behavior. 114 * The code is slightly more involved, but it seems like a good trade-off 115 116 ### Delaying the output of new logs 117 118 * We are momentarily delaying the output of new log envelopes to the user, with 119 the advantage that by delaying we are able to output a more consistent view of 120 the logs 121 * The `WalkDelay` is implemented within the Log Cache client as a comparison of 122 the log envelope timestamps against the client clock. It is therefore 123 vulnerable to any misconfiguration of the client clock. We've considered 124 strategies for removing this dependency on the client clock but have decided 125 for now to wait for feedback from our users before attempting to make ourselves 126 reliable in this situation. An issue has been filed against the Log Cache 127 client: https://github.com/cloudfoundry/go-log-cache/issues/28 128 129 ### `cf logs --recent` behavior 130 131 * The most recent logs returned by this command may not be "settled". There may be some 132 discrepancies between the output of two consecutive runs of this command.