# Time Validation

- Author: Cliff Brake
- PR/Discussion:
- Status: discussion

**Contents**

<!-- toc -->

## Problem

To date, SIOT has been deployed to systems with RTCs and solid network connections,
so time is fairly stable and this has not been a big concern. However, we are looking
to deploy to edge systems, some with cellular modem connections and some without a
battery-backed RTC, so they may boot without a valid time.

SIOT is very dependent on data having valid timestamps. If timestamps are not correct,
the following problems may occur:

- old data may be preferred over newer data in the point CRDT merge algorithm
- data stored in time series databases may have the wrong timestamps

Additionally, there are edge systems that don't have a real-time clock and
power up with an invalid time until an NTP process gets the current time.

We may need some systems to operate (run rules, etc.) without a valid network connection
(offline) and without a valid time.

## Context/Discussion

### Clients affected

- db (InfluxDB driver)
- sync (sends data upstream)
- store (not sure ???)

The db and sync clients should not process points (or should perhaps buffer them) until we
are sure the system has a valid time. How do they get this information? Possibilities
include:

1. Create a broadcast or other special message subject that clients can optionally
   listen to. Perhaps the NTP client can send this message.
   - synchronization may be a problem here if the NTP client sends messages before a
     client has started.
1. Query for system state, with NTP sync status as a field in this state.
   - should this be part of the root device node?
   - or a special hard-coded message?
   - it would be useful to track system state as a standard point so it gets
     synchronized and stored in InfluxDB; therefore, making it part of the root
     node (or perhaps the NTP node) would be useful.

### Offline operation

The system must function when offline without a valid time. Again, for the point merge
algorithm to work correctly, timestamps for new points coming into the store
must be newer than what is currently stored. There are two possible scenarios:

- Problem: the system powers up with an old time, and points in the DB have newer times.
  - Solution: if we don't have a valid NTP time, set the system time to something
    later than the newest point timestamp in the store.
- Problem: NTP sets the time "back" and there are newer points in the DB.
  - Solution: when we get an NTP time sync, verify it is not significantly earlier
    than the latest point timestamp in the system. If it is, update the point
    timestamps in the DB that are newer than the current time to the current time
    minus 1 year. This ensures that settings upstream (which are likely newer than
    the edge device) will update the points in the edge device. This is not
    perfect, but is probably adequate for most systems.

We currently don't queue data when an edge device is offline. This is a different
concern which we will address later.

The SIOT synchronization and point merge algorithms are designed to be simple
and bandwidth efficient (they work over Cat-M/NB-IoT modems). There are design
trade-offs. It is not a full-blown replicated, log-based database that will work
correctly in every situation. It is designed so that changes can be made in
multiple locations while disconnected and, when a connection is resumed, that
data is merged intelligently. Typically, configuration changes are made at the
portal, and sensor data is generated at the edge, so this works well in practice.
When in doubt, we prioritize changes made on the upstream (typically a cloud
instance), as that is the most user-accessible system and is where most
configuration changes will be made. Sensor data is updated periodically, so it
will automatically get refreshed, typically within 15 minutes max. The system
works best when we have a valid time at every location, so we advise ensuring
reliable network connections for every device and, at a minimum, a reliable
battery-backed RTC in every device.

### Tracking the latest point timestamp

It may make sense to write the latest point timestamp to the store meta table.

### Syncing time from Modem or GPS

We will consider this in the future. For now, assume a valid network connection
to an NTP server.

### Tracking events where time is not correct

It would be very useful to track events at edge devices where the time is not
correct and requires a big jump to be corrected.

TODO: how can we determine this? From systemd-timedated logs?

This information could be used to diagnose when an RTC battery needs to be
replaced, etc.

### Verify time matches between synchronized instances

A final check that may be useful is to verify that the times of synchronized
instances are relatively close. This ensures the sync algorithm does not wreak
havoc between systems, even if NTP is lying.

## Reference/Research

### NTP

- https://wiki.archlinux.org/title/systemd-timesyncd
- `timedatectl status` produces the following output:

```
               Local time: Thu 2023-06-01 18:22:23 EDT
           Universal time: Thu 2023-06-01 22:22:23 UTC
                 RTC time: Thu 2023-06-01 22:22:23
                Time zone: US/Eastern (EDT, -0400)
System clock synchronized: yes
              NTP service: active
          RTC in local TZ: no
```

There is a [systemd-timedated](https://www.freedesktop.org/software/systemd/man/org.freedesktop.timedate1.html) D-Bus API.
## Decision

what was decided.

objections/concerns

## Consequences

what is the impact, both negative and positive.

## Additional Notes/Reference