# Implementing Auth Package with KV Database

This document describes the entities of the lakeFS auth package and the relationships between them, and offers
a possible representation of them in the KV database.

## Entities

Some entities, such as `User` and `Group`, have an `ID` field, which is a serial integer key managed by Postgres.
It will be migrated to a randomly generated string ID.

### User

- Can be looked up by `ID`, `Username` & `Email`.
- Deleted by `Username`.
- Listed by `Username`.

### Group

- Looked up, deleted and listed by `DisplayName`.

### Policies

- Looked up, deleted and listed by `DisplayName`.

### Credentials

- Has a foreign-key reference to `User.ID`.
    - The `User.ID` is used for looking up the `User.Username`.
    - Almost all actions are in the context of a `User.Username`, i.e. the username is passed with the request.
- The only action without a `User.Username` passed is credentials lookup by `AccessKeyID`.

## Entity Relationships

### Group <-> User

- Add/Remove a `User` to/from a `Group` by `User.Username` & `Group.DisplayName`.
- List all the `Group`s for a `User` by `User.Username`.
- List all the `User`s in a `Group` by `Group.DisplayName`.

### Policy <-> User

- Attach/Detach a `Policy` to/from a `User` by `User.Username` & `Policy.DisplayName`.
- List all `Policies` for a `User` by `User.Username`.
- List all effective `Policies` for a `User` by `User.Username`. This includes the `Policies` of every `Group` the user is a member of.

### Policy <-> Group

- Attach/Detach a `Policy` to/from a `Group` by `Group.DisplayName` & `Policy.DisplayName`.
- List all `Policies` for a `Group` by `Group.DisplayName`.
    - A `Group` cannot be a member of another `Group`, therefore all its effective policies are attached to it directly.

## Representation in the KV world

- All keys are prefixed by the reserved package prefix `auth/`.

- A `User` key will be in the form of `users/<Username>`.
- A `Group` key will be in the form of `groups/<DisplayName>`.
- A `Policy` key will be in the form of `policies/<DisplayName>`.
- A `Credentials` key will be in the form of `users/<Username>/credentials/<AccessKeyID>`.

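As an illustration, the key layout above could be built with a few small helpers. This is a minimal sketch, not the lakeFS implementation: the function names (`UserKey`, `GroupKey`, `PolicyKey`, `CredentialsKey`) and the `authKey` helper are hypothetical, and it assumes that names never contain `/`, since the sketch does no escaping.

```go
package main

import "strings"

// authPrefix is the reserved package prefix shared by all auth keys.
const authPrefix = "auth/"

// authKey joins the given path parts under the reserved prefix.
// Hypothetical helper: parts are assumed not to contain "/".
func authKey(parts ...string) string {
	return authPrefix + strings.Join(parts, "/")
}

// UserKey returns the key of a User entity: users/<Username>.
func UserKey(username string) string {
	return authKey("users", username)
}

// GroupKey returns the key of a Group entity: groups/<DisplayName>.
func GroupKey(displayName string) string {
	return authKey("groups", displayName)
}

// PolicyKey returns the key of a Policy entity: policies/<DisplayName>.
func PolicyKey(displayName string) string {
	return authKey("policies", displayName)
}

// CredentialsKey returns the key of a Credentials entity, nested under its user:
// users/<Username>/credentials/<AccessKeyID>.
func CredentialsKey(username, accessKeyID string) string {
	return authKey("users", username, "credentials", accessKeyID)
}
```

Because credentials are nested under the user, listing a user's credentials is a prefix scan on `auth/users/<Username>/credentials/`, while finding the owner of an `AccessKeyID` is not covered by this layout alone.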
### Handling Secondary Indexes

Below are two options that differ in efficiency and in the complexity of the implementation.

#### Store just the minimum indexes

Keep only the minimum that is required to represent the relationships between the entities:

- A `User` membership of a `Group` under `groups/<DisplayName>/users/<Username>`.
- A `Policy` attached to a `User` under `users/<Username>/policies/<DisplayName>`.
- A `Policy` attached to a `Group` under `groups/<DisplayName>/policies/<DisplayName>`.

Any deletion of an entity would first remove all its secondary indexes, then the entity itself.
For example, deleting a `Policy` would have to list the `Policies` of all `User`s & `Group`s and delete any attachment found.
Only then should it delete the `Policy` entity from the store.

Lookups by anything other than the key require listing all the relevant entities.
Some examples:

1. Finding a `User` by `AccessKeyID` requires listing all the users.
1. Listing a `User`'s effective policies requires listing all entities under `groups/`.

The number of entities in the auth world is small (<10k) and they are cached in the server, so this is unlikely to incur a
notable performance degradation.

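To make the cost of the minimal layout concrete, here is a sketch of listing a user's effective policies with only these indexes. `memStore` is a hypothetical in-memory stand-in for the KV store (a set of keys with a prefix scan), and `EffectivePolicies` and `lastSegment` are illustrative names, not existing lakeFS functions. Names are assumed not to contain `/`.

```go
package main

import (
	"sort"
	"strings"
)

// memStore is a hypothetical in-memory stand-in for the KV store: a set of keys.
type memStore map[string]struct{}

// scan returns all keys that start with prefix, in sorted order.
func (s memStore) scan(prefix string) []string {
	var keys []string
	for k := range s {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

// lastSegment returns the part of key after its final "/".
func lastSegment(key string) string {
	return key[strings.LastIndex(key, "/")+1:]
}

// EffectivePolicies returns the sorted, de-duplicated names of the policies
// attached to the user directly or through any group the user belongs to.
func EffectivePolicies(s memStore, username string) []string {
	seen := map[string]struct{}{}

	// Policies attached directly to the user: a cheap prefix scan.
	for _, k := range s.scan("auth/users/" + username + "/policies/") {
		seen[lastSegment(k)] = struct{}{}
	}

	// There is no users/<Username>/groups/ index in this layout, so every key
	// under groups/ is scanned to find the user's memberships.
	suffix := "/users/" + username
	for _, k := range s.scan("auth/groups/") {
		if !strings.HasSuffix(k, suffix) {
			continue
		}
		group := strings.TrimSuffix(strings.TrimPrefix(k, "auth/groups/"), suffix)
		for _, pk := range s.scan("auth/groups/" + group + "/policies/") {
			seen[lastSegment(pk)] = struct{}{}
		}
	}

	policies := make([]string, 0, len(seen))
	for p := range seen {
		policies = append(policies, p)
	}
	sort.Strings(policies)
	return policies
}
```

The full scan of `auth/groups/` is the part that the second option below trades for extra writes.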
#### Store all secondary indexes

In addition to the indexes in the above suggestion, also store:

- A `User` membership of a `Group` under `users/<Username>/groups/<DisplayName>`.
- `Credentials` attached to a `User` under `credentials/<AccessKeyID>/<Username>`.
- A `Policy` attached to a `User` under `policies/<DisplayName>/users/<Username>`.
- A `Policy` attached to a `Group` under `policies/<DisplayName>/groups/<DisplayName>`.

Every operation that updates a relationship between two entities would have to write every index entry that records it.
The upside is a possible increase in performance ("possible" since we read fewer entities, but perform many more round-trips to the cache/store).

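A rough sketch of what this option implies for writes and reads follows. As before, `memStore` is a hypothetical in-memory stand-in for the KV store, and `AddUserToGroup`, `AddCredentials`, `UserByAccessKeyID` and `GroupsOfUser` are illustrative names rather than lakeFS APIs. Each relationship is written under both sides, and the reverse lookups become single prefix scans.

```go
package main

import (
	"sort"
	"strings"
)

// memStore is a hypothetical in-memory stand-in for the KV store: a set of keys.
type memStore map[string]struct{}

// scan returns all keys that start with prefix, in sorted order.
func (s memStore) scan(prefix string) []string {
	var keys []string
	for k := range s {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

// AddUserToGroup records the membership under both the group and the user.
// These are two independent writes: a failure between them leaves the two
// indexes out of sync.
func AddUserToGroup(s memStore, username, group string) {
	s["auth/groups/"+group+"/users/"+username] = struct{}{}
	s["auth/users/"+username+"/groups/"+group] = struct{}{}
}

// AddCredentials stores the credentials under the user and adds the
// credentials/<AccessKeyID>/<Username> reverse index.
func AddCredentials(s memStore, username, accessKeyID string) {
	s["auth/users/"+username+"/credentials/"+accessKeyID] = struct{}{}
	s["auth/credentials/"+accessKeyID+"/"+username] = struct{}{}
}

// UserByAccessKeyID resolves the owning username with one prefix scan
// instead of listing all the users.
func UserByAccessKeyID(s memStore, accessKeyID string) (string, bool) {
	prefix := "auth/credentials/" + accessKeyID + "/"
	keys := s.scan(prefix)
	if len(keys) == 0 {
		return "", false
	}
	return strings.TrimPrefix(keys[0], prefix), true
}

// GroupsOfUser lists the user's groups without scanning everything under groups/.
func GroupsOfUser(s memStore, username string) []string {
	prefix := "auth/users/" + username + "/groups/"
	var groups []string
	for _, k := range s.scan(prefix) {
		groups = append(groups, strings.TrimPrefix(k, prefix))
	}
	return groups
}
```

Every removal has to mirror these writes, which is where the extra complexity of this option comes from.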
### Working with no locks

In the Postgres era, we rely on Postgres to keep the entities and relationships consistent.
For example, deleting a `Policy` would cascade to the `auth_user_policies` & `auth_group_policies` tables.

In the KV world, we must clean up the secondary indexes first, before deleting the entity itself.
Since the KV is lock-free, we might still get into trouble.
For example, a `Policy` is requested to be deleted. The operation starts by listing all `User`s & `Group`s that have that `Policy` attached.
While iterating, it's possible that the `Policy` would be attached to another `User` that was not listed. Once the `Policy` is deleted,
we would be left with a dangling attachment to it. There are at least two mitigations for this:

1. Do (a) list and delete the secondary indexes, (b) delete the entity, (c) another round of list and delete.
2. Any store access should consider the secondary indexes fragile, and should always retrieve the entities they point to.
   The "truth" is in the entities themselves and not in the secondary index.

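Both mitigations can be sketched against the minimal index layout. This is an illustration under assumptions, not the lakeFS implementation: `memStore` is a hypothetical in-memory stand-in for the KV store, and `removeAttachments`, `DeletePolicy` and `UserPolicies` are made-up names. `DeletePolicy` follows steps (a) to (c) of the first mitigation, and `UserPolicies` applies the second by trusting an attachment only if the `Policy` entity it points to still exists.

```go
package main

import (
	"sort"
	"strings"
)

// memStore is a hypothetical in-memory stand-in for the KV store: a set of keys.
type memStore map[string]struct{}

// scan returns all keys that start with prefix, in sorted order.
func (s memStore) scan(prefix string) []string {
	var keys []string
	for k := range s {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

// removeAttachments deletes every user and group attachment that points at
// the policy and returns how many were removed.
func removeAttachments(s memStore, policy string) int {
	suffix := "/policies/" + policy
	removed := 0
	for _, prefix := range []string{"auth/users/", "auth/groups/"} {
		for _, k := range s.scan(prefix) {
			if strings.HasSuffix(k, suffix) {
				delete(s, k)
				removed++
			}
		}
	}
	return removed
}

// DeletePolicy implements mitigation 1.
func DeletePolicy(s memStore, policy string) {
	removeAttachments(s, policy)       // (a) list and delete the secondary indexes
	delete(s, "auth/policies/"+policy) // (b) delete the entity
	removeAttachments(s, policy)       // (c) catch attachments added during (a)
}

// UserPolicies implements mitigation 2: an attachment is reported only if the
// Policy entity it points to exists, so dangling attachments are ignored.
func UserPolicies(s memStore, username string) []string {
	prefix := "auth/users/" + username + "/policies/"
	var live []string
	for _, k := range s.scan(prefix) {
		name := strings.TrimPrefix(k, prefix)
		if _, ok := s["auth/policies/"+name]; ok {
			live = append(live, name)
		}
	}
	return live
}
```

In this sketch the second sweep narrows the race window rather than closing it, since an attachment written after step (c) would still dangle. Reading through to the entity, as `UserPolicies` does, is what keeps such leftovers harmless.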
### Decision

Given its simplicity, and without clear evidence of lakeFS installations with many users, I believe that option #1 (storing just the minimum indexes) is better.
As OIDC is likely to be introduced soon, it's even likelier that installations with many users will handle the auth part elsewhere.