github.com/treeverse/lakefs@v1.24.1-0.20240520134607-95648127bfb0/design/accepted/aaa.md

# Identity, Authentication, Authorization (and Audit)

## tl;dr.

Here is what we shall implement initially.  For definitions, the current
state, and rationale for why we choose this starting point, see the rest
of the document.

We shall add an authentication flow that uses an LDAP server.  The first
versions will do only that, creating a user upon authentication.  A user
thus added will automatically become a member of a configurable default
group.  Apart from this group membership they are regular users.
Administrators will be able to add them to other groups by using the
existing mechanisms.

Only Simple Authentication (username + password) will be supported.  All
other authentication forms (including 2FA and change password flows) may
be added later, as required.

These users will be able to create access keys if they have the required
IAM privileges.  The user and access key continue to work until an
administrator deletes them.

Synchronizing LDAP and lakeFS users and groups will be performed at some
later point, using the lakeFS API.  This will allow many use-cases, such
as:

* *Propagate* user deletions from LDAP to their access keys.
* *Synchronize* group memberships for lakeFS groups: populate members of
  lakeFS groups from LDAP groups.
* *Synchronize* LDAP groups: ensure that exactly the LDAP groups matching
  a query exist on lakeFS and are populated from LDAP.

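The two *Synchronize* use-cases above differ only in whether the set of
groups itself is managed.  A minimal sketch of the planning step, assuming
both sides have already been read into plain `{group name: set of user
names}` mappings; `SyncPlan`, `plan_sync` and `manage_groups` are
illustrative names, not part of lakeFS or of any LDAP library:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncPlan:
    """Changes needed to bring lakeFS groups in line with LDAP."""
    groups_to_create: frozenset
    groups_to_delete: frozenset
    members_to_add: frozenset     # (group, user) pairs
    members_to_remove: frozenset  # (group, user) pairs


def plan_sync(ldap_groups, lakefs_groups, manage_groups=False):
    """Compute a sync plan from two {group name: set of users} mappings.

    manage_groups=False: only memberships of groups that exist on both
    sides are synchronized; lakeFS-only groups are left untouched.
    manage_groups=True: the set of lakeFS groups is made to match LDAP.
    """
    ldap_names, lakefs_names = set(ldap_groups), set(lakefs_groups)
    if manage_groups:
        create = ldap_names - lakefs_names
        delete = lakefs_names - ldap_names
        synced = ldap_names
    else:
        create, delete = set(), set()
        synced = ldap_names & lakefs_names

    add, remove = set(), set()
    for group in synced:
        wanted = set(ldap_groups[group])
        current = set(lakefs_groups.get(group, ()))
        add |= {(group, user) for user in wanted - current}
        remove |= {(group, user) for user in current - wanted}
    return SyncPlan(frozenset(create), frozenset(delete),
                    frozenset(add), frozenset(remove))
```

Keeping planning separate from applying the plan through the lakeFS API
would let an operator review the changes first, which matters most when
`manage_groups` would delete groups.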
## IAAA defined

In a system, every operation is performed by some *thing*.  We care what
these things are.  The _identity_ is how we know that thing.  Identities
are a business entity and may belong to people or to bots or to machines
(or whatever the business decides).  At the very least an identity gives
some unique unchanging string.  In lakeFS the identity is the user name.

_Authentication_ is the process of identifying.  Typically, the business
needs include some security around authentication.  The thing may or may
not be directly involved in every authentication.  A human authenticates
to a web service using only periodic direct involvement, and can receive
a token (possibly time-limited) for use.  Authenticating lets the system
identify the thing.  A _credential_ is a piece of controlled information
that allows a thing to authenticate.

Not all things are allowed to perform all operations.  Before allowing a
thing to perform an action the system _authorizes_ it.  Often the system
also leaves some _audit_ trail of authentications (and attempts) as well
as some or all authorizations (and attempts).

## Required and existing AAA

Required support for the S3 API means some AAA features must be identical
to those in S3.  Almost all of our users are S3 users, so we prefer to
support the other S3 features they are already familiar with too.

We should provide support for users with other identity providers.  The
most important of these are Active Directory (and other LDAP) sources.
These might also serve as a model for possible future integrations but
that is out of scope of this redesign.

### Identity

S3 supports users and roles.  Users are typically human-shaped; machines
and robots identify as their roles.

lakeFS currently has no support for role identities.  It is not clear if
there is any current user demand for such.

### Authentication and credentials

lakeFS performs these types of authentication:

* lakeFS supports the S3 API.  So it must support S3 authentication with
  no changes, otherwise existing clients cannot connect.  The common form
  is to use a secret access key credential.  So this form must always be
  supported, and is the only form currently supported.

  Other forms of S3 authentication include shorter term tokens.  Another
  protocol for S3 API access uses presigned URLs.

* The lakeFS API requires some authentication.  Currently credentials can
  be the AWS-style secret access key or a JWT.  The CLI uses an access
  key; the GUI uses a JWT.  The GUI gets a JWT by using an access key, but
  it should be possible to change this.

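The exchange of an access key for a JWT described in the last bullet can
be sketched with the standard library alone.  This is an illustration
under stated assumptions, not the lakeFS implementation: the HS256
algorithm, the claim set, the `credentials` mapping and the function
names are all hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, signing_key: bytes) -> bytes:
    return hmac.new(signing_key, signing_input.encode("ascii"),
                    hashlib.sha256).digest()


def issue_jwt(subject, signing_key, ttl_seconds=3600, now=None):
    """Issue a time-limited HS256 token whose subject is the identity."""
    issued = int(time.time() if now is None else now)
    parts = ({"alg": "HS256", "typ": "JWT"},
             {"sub": subject, "iat": issued, "exp": issued + ttl_seconds})
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in parts)
    return signing_input + "." + _b64url(_sign(signing_input, signing_key))


def verify_jwt(token, signing_key, now=None):
    """Return the subject of a valid, unexpired token, else None."""
    current = time.time() if now is None else now
    try:
        header_b64, claims_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(claims_b64))
        signature = _b64url_decode(signature_b64)
        expected = _sign(f"{header_b64}.{claims_b64}", signing_key)
    except ValueError:  # wrong part count, bad base64, bad JSON
        return None
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        return None
    if not hmac.compare_digest(expected, signature):
        return None
    if not isinstance(claims, dict) or not isinstance(claims.get("exp"), int):
        return None
    return claims.get("sub") if current < claims["exp"] else None


def login(access_key_id, secret_access_key, credentials, signing_key, now=None):
    """Exchange an access key for a JWT.

    `credentials` is a hypothetical store mapping an access key ID to a
    (secret access key, user name) pair.
    """
    entry = credentials.get(access_key_id)
    if entry is None:
        return None
    secret, user = entry
    if not hmac.compare_digest(secret.encode("utf-8"),
                               secret_access_key.encode("utf-8")):
        return None
    return issue_jwt(user, signing_key, now=now)
```

Because the token carries the user name rather than the access key ID,
any other authentication method that ends by calling `issue_jwt` could
replace `login` without changing how the token is verified.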
### Authorization

S3 configures authorization using IAM policies.  This is (probably) the
form best known to our users, and existing installations already use it.

There are (many) other kinds of authorization languages we might use but
IAM will probably need to be supported.

On the flip side, the (apparent) lack of extensibility of IAM within AWS
may require parallel policies to be stored in both AWS and lakeFS.  This
is made worse by users of lakeFS not necessarily being the owners of the
AWS IAM policies on their data when it is _not_ stored on lakeFS.

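For readers less familiar with IAM-style policies, the evaluation rule
can be sketched as: deny by default, allow when some statement allows,
and let any matching explicit deny win.  The sketch below assumes
policies are already parsed into dictionaries.  The action and resource
strings used with it are illustrative only, and `fnmatchcase` stands in
for the policy language's wildcard matching.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """Policies may give a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def _matches(name, patterns):
    # fnmatchcase supports * and ?; it also treats [...] as a character
    # class, which real policy languages may not.
    return any(fnmatchcase(name, pattern) for pattern in _as_list(patterns))


def is_allowed(policies, action, resource):
    """Evaluate IAM-style policies: default deny, explicit deny overrides allow."""
    allowed = False
    for policy in policies:
        for statement in policy.get("Statement", []):
            if not _matches(action, statement["Action"]):
                continue
            if not _matches(resource, statement["Resource"]):
                continue
            if statement["Effect"] == "Deny":
                return False  # an explicit deny always wins
            if statement["Effect"] == "Allow":
                allowed = True
    return allowed
```

With groups in place, the `policies` argument would be the union of the
policies attached to the user and to each of the user's groups.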
### Audit

Current lakeFS performs _no_ identity-linked audit.  Some lines at DEBUG
level give the API action but the access key appears only when it has to
be fetched from the database -- and appears on a separate unlinked line:

```log
TRACE  [2021-09-27T09:46:50+03:00]pkg/db/tx.go:93 pkg/db.(*dbTx).Get SQL query executed successfully               args="[AKIAIOSFODNN7EXAMPLE]" query="SELECT * FROM auth_credentials WHERE auth_credentials.access_key_id = $1" took="987.567µs" type=get
...
DEBUG  [2021-09-27T09:46:50+03:00]pkg/api/controller.go:2912 pkg/api.(*Controller).LogAction performing API action                         action=list_policies host="localhost:8000" message_type=action method=GET path="/api/v1/auth/policies?prefix=&after=&amount=100" request_id=ae2fbb59-f54f-4cb4-a25c-7dd1d7280538 service=api_gateway service_name=rest_api
```

A usable audit log would require at least:

1. Identities, not credential IDs.
1. Identities linked to actions.
1. Explicit authentication events linking identities to credentials used
   to authenticate.

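The three requirements above can be illustrated with two kinds of
structured records.  This is a sketch of one possible shape, not an
existing lakeFS log format; the field names and both helper functions
are assumptions.

```python
import json
from datetime import datetime, timezone


def audit_record(event, identity, request_id, now=None, **fields):
    """Build one JSON audit line; every record names the identity."""
    now = now or datetime.now(timezone.utc)
    record = {"time": now.isoformat(), "event": event,
              "identity": identity, "request_id": request_id}
    record.update(fields)
    return json.dumps(record, sort_keys=True)


def authentication_event(identity, credential_id, request_id,
                         success=True, now=None):
    """Explicit authentication event: links an identity to the credential used."""
    return audit_record("authentication", identity, request_id, now=now,
                        credential_id=credential_id, success=success)


def action_event(identity, action, request_id, allowed, now=None):
    """Action event: carries the identity, never the credential ID."""
    return audit_record("action", identity, request_id, now=now,
                        action=action, allowed=allowed)
```

Sharing `request_id` between the two records is what would link them,
in contrast to the separate unlinked lines in the current log.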
## Active Directory and LDAP support

Issue #2058 is to support Active Directory.  This splits naturally along
the 3 A's: use Active Directory to _authenticate_, to _authorize_, or to
receive _audit_ logs of user operations.

This initial design is for authentication (only).

### Authentication

This section will initially apply to GUI clients.  We will have to allow
users to request access keys (or short-term tokens) to allow them to run
programs that use AWS S3 or the lakeFS API.

#### Decision

The initial version will support (only) Simple Authentication on an LDAP
server.  This does not require developing new login screens.

When configured the lakeFS web UI will optionally authenticate users via
Simple Authentication on an LDAP server.  On success it shall ensure the
user exists; if the user does not exist it will create the user, mark
them as created via LDAP, and place them in a configured group.  The user
gets a JWT and the flow continues as today.  Selecting between LDAP or
internal login can be done once during the initial login.  After that the
web app can reuse that method, providing a toggle to use the other one.
That is important to avoid being locked out!

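The decided flow can be sketched as follows.  The LDAP bind is injected
as a callable so that the sketch stays independent of any particular
LDAP client library; `User`, `UserStore`, `ldap_login` and `ldap_bind`
are hypothetical names, and how an existing internal user with the same
name should be treated is left open here, as it is in this design.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    source: str  # "internal" or "ldap": how the user was created
    groups: set = field(default_factory=set)


class UserStore:
    """In-memory stand-in for the lakeFS user database (hypothetical)."""

    def __init__(self):
        self.users = {}


def ldap_login(username, password, ldap_bind, store, default_group):
    """Authenticate via an LDAP simple bind, creating the user on first login.

    `ldap_bind(username, password)` must return True only when the LDAP
    server accepts the bind.  Returns the User on success, else None.
    """
    # A simple bind with an empty password is an "unauthenticated" bind
    # (RFC 4513) that some servers accept, so reject it up front.
    if not username or not password:
        return None
    if not ldap_bind(username, password):
        return None
    user = store.users.get(username)
    if user is None:
        # First successful login: create the user, mark it as created via
        # LDAP, and place it in the configured default group.
        user = User(username, source="ldap", groups={default_group})
        store.users[username] = user
    return user  # the caller would now issue a JWT, as for internal login
```

An existing user keeps whatever groups an administrator has assigned
since the first login; only creation touches group membership.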
### Authorization

There is room for business logic when authorizing.  Typically users have
properties attached during authentication, which some business logic can
connect to attached policies.  This can be quite complex, but we already
have a groups mechanism in place which we might re-use.

Confusingly, LDAP offers multiple ways to query group membership.  While
many installations offer the `memberOf` attribute on users, there may be
limitations on what exactly it contains.  Microsoft Active Directory has
support [with limitations][ms-ad-memberof].  OpenLDAP supports it, if an
appropriate [overlay][open-ldap-memberof] is configured.  In the reverse
direction, if we are willing to perform operations in advance then _all_
LDAP servers should support reading groups.

In practice `memberOf` is less useful than it seems: group membership at
authentication time is not necessarily up-to-date.  Consider a user with
group memberships at authentication time who creates and starts using an
S3 access key, but never logs in again.  If LDAP is a source of truth for
group memberships then refreshes will be required.

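A refresh in the reverse direction would search for the group entries
that list a given user, rather than reading `memberOf` from the user.
The sketch below only builds the search filter, escaping the user's DN
as RFC 4515 requires.  The default object class and attribute
(`groupOfNames`, `member`) are assumptions that vary by directory schema
and would need to be configurable.

```python
def escape_filter_value(value: str) -> str:
    """Escape a value for use inside an LDAP search filter (RFC 4515)."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\x00": r"\00",
    }
    return "".join(replacements.get(char, char) for char in value)


def groups_of_user_filter(user_dn, group_class="groupOfNames",
                          member_attr="member"):
    """Build a filter matching the group entries that list user_dn as a member.

    group_class and member_attr are trusted configuration, not user input,
    so only user_dn is escaped.
    """
    return (f"(&(objectClass={group_class})"
            f"({member_attr}={escape_filter_value(user_dn)}))")
```

Escaping matters here because a DN may legitimately contain parentheses
or asterisks, which would otherwise change the meaning of the filter.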
#### Decision

lakeFS shall use locally-configured groups.  Existing IAM shall continue
to work.

In future we may add external facilities to synchronize groups from LDAP
servers into lakeFS, or to query group memberships for groups defined on
lakeFS.  Which we add will depend on user requirements.

### Audit

We currently have no requirements to audit API key usage in any way.  We
can add these on user request -- not at this time.

<!-- references -->
[ms-ad-memberof]: https://ldapwiki.com/wiki/MemberOf#section-MemberOf-BewareOfMemberOf
[open-ldap-memberof]: https://www.openldap.org/doc/admin24/overlays.html#Reverse%20Group%20Membership%20Maintenance