# CockroachDB Datastore

CockroachDB is a Spanner-like datastore supporting global, immediate consistency, with the mantra "no stale reads."
The CockroachDB implementation should be used when your SpiceDB service runs in multiple geographic regions, and Google's Cloud Spanner is unavailable (e.g. AWS, Azure, bare metal).

## Implementation Caveats

In order to prevent the new-enemy problem, we need to make related transactions overlap.
We do this by choosing a common database key and writing to that key with all relationships that may overlap.
This tradeoff is cataloged in our blog post [The One Crucial Difference Between Spanner and CockroachDB](https://authzed.com/blog/prevent-newenemy-cockroachdb/).

## Overlap Strategies

There are four transaction overlap strategies:

- `insecure`, which does not protect against the new enemy problem
- `static`, which protects all writes from the new enemy problem
- `request`, which protects all writes with the same [request metadata key](https://github.com/authzed/authzed-go/blob/d97cfb41027742d347391f583dd9c6d1d03ae32b/pkg/requestmeta/requestmeta.go#L26-L30) from the new enemy problem
- `prefix`, which protects all writes with the same object prefix from the new enemy problem

Depending on your application, `insecure` may be acceptable, and it avoids the performance cost associated with the `static` and `prefix` options.

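As a rough illustration of how the strategies differ, the sketch below derives the shared key that overlapping transactions would all write to. It is not SpiceDB's implementation: the function name `overlapKey`, the constant key `"static"`, and the assumption that an object prefix is the text before the first `/` in the object type are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// overlapKey returns the shared key a write would also touch so that related
// transactions conflict. Hypothetical helper, for illustration only.
// ok == false means no overlap is forced for this write.
func overlapKey(strategy, requestKey, objectType string) (key string, ok bool) {
	switch strategy {
	case "static":
		// Every write touches the same key, so all writes overlap.
		return "static", true
	case "request":
		// Writes overlap only with writes that carry the same request key.
		return requestKey, requestKey != ""
	case "prefix":
		// Assumption: the prefix is the text before the first "/" in the type.
		// Simplification: unprefixed types get no overlap key in this sketch.
		prefix, _, found := strings.Cut(objectType, "/")
		if !found {
			return "", false
		}
		return prefix, true
	default: // "insecure"
		return "", false
	}
}

func main() {
	for _, s := range []string{"insecure", "static", "request", "prefix"} {
		key, ok := overlapKey(s, "tenant-42", "docs/document")
		fmt.Printf("%-8s key=%q overlap=%v\n", s, key, ok)
	}
}
```

The narrower the key, the fewer unrelated transactions are forced to contend on it, which is the performance tradeoff described above.
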
## When is `insecure` overlap a problem?

Using `insecure` overlap strategy for SpiceDB with CockroachDB means that it is _possible_ that timestamps for two subsequent writes will be out of order.
When this happens, it's _possible_ for the [New Enemy Problem](https://authzed.com/blog/prevent-newenemy-cockroachdb/) to occur.

Let's look at how likely this is, and what the impact might actually be for your workload.

## When can timestamps be reversed?

Before we look at how this can impact an application, let's first understand when and how timestamps can be reversed in the first place.

- When two writes are made in short succession against CockroachDB
- And those two writes hit two different gateway nodes
- And the CRDB gateway node clocks have a delta `D`
- And the writes touch disjoint sets of relationships
- And those two writes are sent within the time delta `D` between the gateway nodes
- And the writes land in ranges whose followers are disjoint sets of nodes
- And other independent CockroachDB processes (heartbeats, etc.) haven't coincidentally synced the gateway node clocks during the writes.

Then it's possible that the second write will be assigned a timestamp earlier than the first write. In the next section we'll look at whether that matters for your application, but for now let's look at what makes the above conditions more or less likely:

- **Clock skew**. A larger clock skew gives a bigger window in which timestamps can be reversed. But note that CRDB enforces a max offset between clocks, and getting within some fraction of that max offset will kick the node from the cluster.
- **Network congestion**, or anything that interferes with node heartbeating. This increases the length of time that clocks can be desynchronized before CockroachDB notices and syncs them back up.
- **Cluster size**. When there are many nodes, it is more likely that a write to one range will not have follower nodes that overlap with the followers of a write to another range. It also makes it more likely that the two writes will have different gateway nodes. On the other side, a 3 node cluster with `replicas: 3` means that all writes will sync clocks on all nodes.
- **Write rate**. If the write rate is high, it's more likely that two writes will hit the conditions to have reversed timestamps. If writes only happen once every max offset period for the cluster, it's impossible for their timestamps to be reversed.

The likelihood of a timestamp reversal is dependent on the CockroachDB cluster and the application's usage patterns.

## When does a timestamp reversal matter?

Now we know when timestamps _could_ be reversed. But when does that matter to your application?

The TL;DR is: only when you care about the New Enemy Problem.

Let's take a look at a couple of examples of how reversed timestamps may be an issue for an application storing permissions in SpiceDB.

### Neglecting ACL Update Order

Two separate `WriteRelationships` calls come in:

- `A`: Alice removes Bob from the `shared` folder
- `B`: Alice adds a new document `not-for-bob.txt` to the `shared` folder

The normal case is that the timestamp for `A` < the timestamp for `B`.

But if those two writes hit the conditions for a timestamp reversal, then `B < A`.

From Alice's perspective, there should be no time at which Bob can ever see `not-for-bob.txt`.
She performed the first write, got a response, and then performed the second write.

But this isn't true when using `MinimizeLatency` or `AtLeastAsFresh` consistency.
If Bob later performs a `Check` request for the `not-for-bob.txt` document, it's possible that SpiceDB will pick an evaluation timestamp `T` such that `B < T < A`, so that the document is in the folder _and_ Bob is allowed to see the contents of the folder.

Note that this is only possible if `A - T < quantization window`: the check has to happen soon enough after the write for `A` that it's possible that SpiceDB picks a timestamp in between them.
The default quantization window is `5s`.

#### Application Mitigations for ACL Update Order

This could be mitigated in your application by:

- Not caring about the problem
- Not allowing the write from `B` within the max_offset time of the CRDB cluster (or the quantization window).
- Not allowing a Check on a resource within max_offset of its ACL modification (or the quantization window).

### Mis-apply Old ACLs to New Content

Two separate API calls come in, followed by an application write:

- `A`: Alice removes Bob as a viewer of document `secret`
- `B`: Alice does a `FullyConsistent` `Check` request to get a ZedToken
- `C`: Alice stores that ZedToken (timestamp `B`) with the document `secret` when she updates it to say `Bob is a fool`.

Same as before, the normal case is that the timestamp for `A` < the timestamp for `B`, but if `A` and `B` hit the conditions for a timestamp reversal, then `B < A`.

Bob later tries to read the document. The application performs an `AtLeastAsFresh` `Check` for Bob to access the document `secret` using the stored ZedToken (which is timestamp `B`).

It's possible that SpiceDB will pick an evaluation timestamp `T` such that `B < T < A`, so that Bob is allowed to read the newest contents of the document, and discover that Alice thinks he is a fool.

Same as before, this is only possible if `A - T < quantization window`: Bob's check has to happen soon enough after the write for `A` that it's possible that SpiceDB picks a timestamp in between `A` and `B`, and the default quantization window is `5s`.

#### Application Mitigations for Misapplying Old ACLs

This could be mitigated in your application by:

- Not caring about the problem
- Waiting for max_offset (or the quantization window) before doing the fully-consistent check.

## When does a timestamp reversal _not_ matter?

There are also some cases when there is no New Enemy Problem even if there are reversed timestamps.

### Non-sensitive domain

Not all authorization problems have a version of the New Enemy Problem, which relies on there being some meaningful
consequence of hitting an incorrect ACL during the small window of time where it's possible.

If the worst thing that happens from out-of-order ACL updates is that some users briefly see some non-sensitive data,
or that a user retains access to something that they already had access to for a few extra seconds, then even though
there could still effectively be a "New Enemy Problem," it's not a meaningful problem to worry about.

### Disjoint SpiceDB Graphs

The examples of the New Enemy Problem above rely on the out-of-order ACLs being part of the same permission graph.
But not all ACLs are part of the same graph, for example:

```haskell
definition user {}

definition blog {
    relation author: user
    permission edit = author
}

definition video {
    relation editor: user
    permission change_tags = editor
}
```

- `A`: Alice is added as an `author` of the Blog entry `new-enemy`
- `B`: Bob is removed from the `editor`s of the `spicedb.mp4` video

If these writes are given reversed timestamps, it is possible that the ACLs will be applied out of order and this would
normally be a New Enemy Problem. But the ACLs themselves aren't shared between any permission computations, and so there
is no actual consequence to reversed timestamps.