github.com/simpleiot/simpleiot@v0.18.3/docs/adr/5-time-validation.md

     1  # Time Validation
     2  
     3  - Author: Cliff Brake
     4  - PR/Discussion:
     5  - Status: discussion
     6  
     7  **Contents**
     8  
     9  <!-- toc -->
    10  
    11  ## Problem
    12  
    13  To date, SIOT has been deployed to systems with RTCs and solid network connections,
    14  so time has been fairly stable and has not been a big concern. However, we are looking
    15  to deploy to edge systems, some with cellular modem connections and some without a
    16  battery-backed RTC, so they may boot without a valid time.
    17  
    18  SIOT is very dependent on data having valid timestamps. If timestamps are not correct,
    19  the following problems may occur:
    20  
    21  - old data may be preferred over newer data in the point CRDT merge algorithm
    22  - data stored in time series databases may have the wrong timestamps
    23  
    24  Additionally, there are edge systems that don't have a real-time clock and
    25  power up with an invalid time until an NTP process gets the current time.
    26  
    27  We may need some systems to operate (run rules, etc.) without a valid network connection
    28  (offline) and without a valid time.
    29  
    30  ## Context/Discussion
    31  
    32  ### Clients affected
    33  
    34  - db (InfluxDB driver)
    35  - sync (sends data upstream)
    36  - store (not sure ???)
    37  
    38  The db and sync clients should not process points (or should perhaps buffer them) until we
    39  are sure the system has a valid time. How does a client get this information? Possibilities
    40  include:
    41  
    42  1. creating a broadcast or other special message subject that clients can optionally 
    43     listen to. Perhaps the NTP client can send this message.
    44    - synchronization may be a problem here if the NTP client sends messages before a
    45      client has started.
    46  1. query for system state, and NTP sync status could be a field in this state.
    47    - should this be part of the root device node?
    48    - or a special hard-coded message?
    49    - it would be useful to track system state as a standard point so it gets
    50      synchronized and stored in InfluxDB, so making it part of the root node,
    51      or perhaps the NTP node, would be useful.
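
Whichever notification mechanism is chosen, the "buffer until time is valid" behavior for the db and sync clients could look roughly like the following. This is a minimal sketch, not SIOT code: the `Point` type is a simplified stand-in, and `TimeGate`, `NewTimeGate`, `Add`, and `SetValid` are hypothetical names. It also leaves open whether buffered points should be restamped once a valid time is known.

```go
package main

import (
	"fmt"
	"time"
)

// Point is a simplified, hypothetical stand-in for a SIOT point.
type Point struct {
	Type  string
	Value float64
	Time  time.Time
}

// TimeGate holds points back until the system time is known to be valid.
type TimeGate struct {
	valid   bool
	pending []Point
	max     int // maximum number of buffered points; 0 means unbounded
}

// NewTimeGate returns a gate that buffers at most max points.
func NewTimeGate(max int) *TimeGate {
	return &TimeGate{max: max}
}

// Add returns the points that may be processed now. While time is not
// valid, the point is buffered (dropping the oldest if full) and nil is returned.
func (g *TimeGate) Add(p Point) []Point {
	if g.valid {
		return []Point{p}
	}
	if g.max > 0 && len(g.pending) >= g.max {
		g.pending = g.pending[1:]
	}
	g.pending = append(g.pending, p)
	return nil
}

// SetValid marks the time as valid and releases all buffered points.
func (g *TimeGate) SetValid() []Point {
	g.valid = true
	out := g.pending
	g.pending = nil
	return out
}

func main() {
	gate := NewTimeGate(100)
	gate.Add(Point{Type: "temp", Value: 21.5, Time: time.Now()})
	gate.Add(Point{Type: "temp", Value: 21.7, Time: time.Now()})

	// Called when, for example, the NTP client reports a successful sync.
	released := gate.SetValid()
	fmt.Println("released points:", len(released))
}
```

A bounded buffer is used here because an edge device may stay without valid time for a long while; dropping the oldest points is only one possible policy.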
    52  
    53  ### Offline operation
    54  
    55  The system must function when offline without a valid time. Again, for the point merge
    56  algorithm to work correctly, timestamps for new points coming into the store
    57  must be newer than what is currently stored. There are two possible scenarios:
    58  
    59  - Problem: system powers up with old time, and points in DB have newer time.
    60    - Solution: if we don't have a valid NTP time, then set system time to something
    61      later than the newest point timestamp in the store.
    62  - Problem: NTP sets the time "back" and there are newer points in the DB.
    63    - Solution: when we get an NTP time sync, verify it is not significantly earlier
    64      than the latest point timestamp in the system. If it is, update the point
    65      timestamps in the DB that are newer than the current time to the current time - 1yr.
    66      This ensures that settings upstream (which are likely newer than those on the
    67      edge device) will update the points in the edge device. This is not perfect,
    68      but is probably adequate for most systems.
    69  
    70  We currently don't queue data when an edge device is offline. This is a different
    71  concern which we will address later.
    72  
    73  The SIOT synchronization and point merge algorithms are designed to be simple
    74  and bandwidth efficient (they work over Cat-M/NB-IoT modems). There are design
    75  trade-offs: SIOT is not a full-blown
    76  replicated, log-based database that will work correctly in every situation. It is designed
    77  so that changes can be made in multiple locations while disconnected and, when
    78  a connection is resumed, the data is merged intelligently. Typically, configuration
    79  changes are made at the portal, and sensor data is generated at the edge, so this
    80  works well in practice.
    81  When in doubt,
    82  we prioritize changes made on the upstream (typically the cloud instance), as that
    83  is the most user-accessible system and is where most configuration changes will be
    84  made. Sensor data is updated periodically, so it is automatically refreshed,
    85  typically within 15 minutes at most. The system works best when we have a valid time at
    86  every location, so we advise ensuring reliable network connections for every device and,
    87  at a minimum, a reliable battery-backed RTC in every device.
    88  
    89  ### Tracking the latest point timestamp
    90  
    91  It may make sense to write the latest point timestamp to the store meta table.
    92  
    93  ### Syncing time from Modem or GPS
    94  
    95  We will consider this in the future. For now, assume a valid network connection to an NTP server.
    96  
    97  ### Tracking events where time is not correct
    98  
    99  It would be very useful to track events at edge devices where the time is not
   100  correct and requires a big jump to be corrected.
   101  
   102  TODO: how can we determine this? From systemd-timedated logs?
   103  
   104  This information could be used to diagnose when an RTC battery needs to be replaced, etc.
   105  
   106  ### Verify time matches between synchronized instances
   107  
   108  A final check that may be useful is to verify that the times on synchronized instances are
   109  relatively close. This ensures the sync algorithm does not wreak havoc
   110  between systems, even if NTP is lying.
   111  
   112  ## Reference/Research
   113  
   114  ### NTP
   115  
   116  - https://wiki.archlinux.org/title/systemd-timesyncd
   117  - `timedatectl status` produces the following output:
   118  
   119  ```
   120                 Local time: Thu 2023-06-01 18:22:23 EDT
   121             Universal time: Thu 2023-06-01 22:22:23 UTC
   122                   RTC time: Thu 2023-06-01 22:22:23
   123                  Time zone: US/Eastern (EDT, -0400)
   124  System clock synchronized: yes
   125                NTP service: active
   126            RTC in local TZ: no
   127  ```
   128  
   129  There is a [systemd-timedated](https://www.freedesktop.org/software/systemd/man/org.freedesktop.timedate1.html) D-Bus API.
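
As a minimal sketch of how a client might read the sync state from the `timedatectl status` output shown above, the function below looks for the `System clock synchronized` line. `clockSynchronized` is a hypothetical helper; the status text is human-oriented and may differ between systemd versions, so the D-Bus API is likely the more robust option for real use.

```go
package main

import (
	"fmt"
	"strings"
)

// clockSynchronized scans timedatectl status output for the
// "System clock synchronized" field. found is false if the field is absent.
func clockSynchronized(status string) (synced bool, found bool) {
	for _, line := range strings.Split(status, "\n") {
		parts := strings.SplitN(line, ":", 2)
		if len(parts) != 2 {
			continue
		}
		if strings.TrimSpace(parts[0]) == "System clock synchronized" {
			return strings.TrimSpace(parts[1]) == "yes", true
		}
	}
	return false, false
}

func main() {
	// Sample taken from the output listed above.
	sample := `               Local time: Thu 2023-06-01 18:22:23 EDT
           Universal time: Thu 2023-06-01 22:22:23 UTC
System clock synchronized: yes
              NTP service: active`

	synced, found := clockSynchronized(sample)
	fmt.Println("found:", found, "synced:", synced)
}
```

In practice the input string would come from running the command or, preferably, the equivalent property would be read over D-Bus.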
   130  
   131  ## Decision
   132  
   133  what was decided.
   134  
   135  objections/concerns
   136  
   137  ## Consequences
   138  
   139  what is the impact, both negative and positive.
   140  
   141  ## Additional Notes/Reference