# Identity, Authentication, Authorization (and Audit)

## tl;dr.

Here is what we shall implement initially. For definitions, the current
state, and the rationale for choosing this starting point, see the rest
of the document.

We shall add an authentication flow that uses an LDAP server. The first
versions will do only that, creating a user upon authentication. A user
thus added will automatically become a member of a configurable default
group. Apart from that group membership, these are regular users.
Administrators will be able to add them to other groups by using the
existing mechanisms.

Only Simple Authentication (username + password) will be supported. All
other authentication forms (including 2FA and change password flows) may
be added later, as required.

These users will be able to create access keys if they have the required
IAM privileges. The user and access key continue to work until deleted
by an administrator.

Synchronizing LDAP and lakeFS users and groups will be performed at some
later point, using the lakeFS API. This will allow many use-cases, such
as:

* *Propagate* user deletions from LDAP to their access keys.
* *Synchronize* group memberships for lakeFS groups: populate members of
  lakeFS groups from LDAP groups.
* *Synchronize* LDAP groups: ensure that exactly the LDAP groups matching
  a query exist on lakeFS and are populated from LDAP.

## IAAA defined

In a system, every operation is performed by some *thing*. We care what
these things are. The _identity_ is how we know that thing. An identity
is a business entity and may belong to a person, a bot, or a machine (or
whatever the business decides). At the very least an identity gives
some unique unchanging string. In lakeFS the identity is the user name.

_Authentication_ is the process of identifying. Typically, the business
needs include some security around authentication. The thing may or may
not be directly involved in every authentication. A human authenticates
to a web service using only periodic direct involvement, and can receive
a token (possibly time-limited) for use. Authenticating lets the system
identify the thing. A _credential_ is a piece of controlled information
that allows a thing to authenticate.

Not all things are allowed to perform all operations. Before allowing a
thing to perform an action the system _authorizes_ it. Often the system
also leaves some _audit_ trail of authentications (and attempts) as well
as some or all authorizations (and attempts).

## Required and existing AAA

Required support for the S3 API means some AAA features must be
identical to those in S3. Almost all of our users are S3 users, so their
familiarity with other S3 features means we prefer to support those
features too.

We should provide support for users with other identity providers. Most
important of these are Active Directory (and other LDAP) sources. These
might also serve as a model for possible future integrations but that is
out of scope of this redesign.

### Identity

S3 supports users and roles. Users are typically human-shaped; machines
and robots identify as their roles.

lakeFS currently has no support for role identities. It is not clear if
there is any current user demand for such.

### Authentication and credentials

lakeFS performs these types of authentication:

* lakeFS supports the S3 API. So it must support S3 authentication with
  no changes; otherwise existing clients cannot connect. The common form
  is to use a secret access key credential. So this form must always be
  supported, and is the only form currently supported.

  Other forms of S3 authentication include shorter term tokens.
  Another protocol for S3 API access uses presigned URLs.

* lakeFS API requires some authentication. Currently credentials can be
  the AWS-style secret access key or a JWT. The CLI uses an access key,
  the GUI uses a JWT. The GUI gets a JWT by using an access key, but it
  should be possible to change this.

### Authorization

S3 configures authorization using IAM policies. This is (probably) most
commonly known to our users, and existing installations already use this
form.

There are (many) other kinds of authorization languages we might use but
IAM will probably need to be supported.

On the flip side, the (apparent) lack of extensibility of IAM within AWS
may require parallel policies to be stored in both AWS and lakeFS. This
is made worse by users of lakeFS not necessarily being the owners of the
AWS IAM policies on their data when _not_ stored on lakeFS.

### Audit

Current lakeFS performs _no_ identity-linked audit. Some lines at DEBUG
level give the API action but the access key appears only when it has to
be fetched from the database -- and appears on a separate unlinked line:

```log
TRACE [2021-09-27T09:46:50+03:00]pkg/db/tx.go:93 pkg/db.(*dbTx).Get SQL query executed successfully args="[AKIAIOSFODNN7EXAMPLE]" query="SELECT * FROM auth_credentials WHERE auth_credentials.access_key_id = $1" took="987.567µs" type=get
...
DEBUG [2021-09-27T09:46:50+03:00]pkg/api/controller.go:2912 pkg/api.(*Controller).LogAction performing API action action=list_policies host="localhost:8000" message_type=action method=GET path="/api/v1/auth/policies?prefix=&after=&amount=100" request_id=ae2fbb59-f54f-4cb4-a25c-7dd1d7280538 service=api_gateway service_name=rest_api
```

A usable audit log would require at least:

1. Identities, not credential IDs.
1. Identities linked to actions.
1. Explicit authentication events linking identities to credentials used
   to authenticate.

## Active Directory and LDAP support

Issue #2058 is to support Active Directory. This splits naturally along
the 3 A's: use Active Directory to _authenticate_, to _authorize_, or to
receive _audit_ logs of user operations.

This initial design is for authentication (only).

### Authentication

This section will initially apply to GUI clients. We will have to allow
users to request access keys (or short-term tokens) to allow them to run
programs that use AWS S3 or the lakeFS API.

#### Decision

The initial version will support (only) Simple Authentication on an LDAP
server. This does not require developing new login screens.

When configured, the lakeFS web UI will optionally authenticate users
via Simple Authentication on an LDAP server. On success it shall ensure
the user exists; if the user does not exist it will create the user,
mark it as created via LDAP, and place it in a configured group. The
user gets a JWT and the flow continues as today. Selecting between LDAP
and internal login can be done once during the initial login. After that
the web app can reuse that method, providing a toggle to use the other
one. That is important to avoid being locked out!

### Authorization

There is room for business logic when authorizing. Typically users have
properties attached during authentication, which some business logic can
connect to attached policies. This can be quite complex, but we already
have a groups mechanism in place which we might re-use.

Confusingly, LDAP offers multiple ways to query group membership. While
many installations offer the `memberOf` attribute on users, there may be
limitations on what exactly it contains. Microsoft Active Directory has
support [with limitations][ms-ad-memberof]. OpenLDAP supports it if an
appropriate [overlay][open-ldap-memberof] is configured. In the reverse
direction, if we are willing to perform operations in advance then _all_
LDAP servers should support reading groups.

In practice `memberOf` is less useful than it seems: group membership at
authentication time is not necessarily up-to-date. Consider a user with
group memberships at authentication time who creates and starts using an
S3 access key, but never logs in again. If LDAP is a source of truth of
group memberships then refreshes will be required.

#### Decision

lakeFS shall use locally-configured groups. Existing IAM shall continue
to work.

In future we may add external facilities to synchronize groups from LDAP
servers into lakeFS, or to query group memberships for groups defined on
lakeFS. Which we add will depend on user requirements.

### Audit

We currently have no requirements to audit API key usage in any way. We
can add these on user request -- not at this time.

<!-- references -->
[ms-ad-memberof]: https://ldapwiki.com/wiki/MemberOf#section-MemberOf-BewareOfMemberOf
[open-ldap-memberof]: https://www.openldap.org/doc/admin24/overlays.html#Reverse%20Group%20Membership%20Maintenance
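
## Appendix: sketch of the LDAP login decision

The authentication decision above (authenticate by Simple Authentication
on LDAP, then ensure the user exists, marked as LDAP-created and placed
in a configured group) could look roughly like the Python sketch below.
It is an illustration under assumptions, not lakeFS code: `User`,
`UserStore`, `ensure_ldap_user`, `login_via_ldap`, and the `simple_bind`
callable (a stand-in for a real LDAP client's bind call) are all
hypothetical names, and the user store is an in-memory dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set


@dataclass
class User:
    name: str
    source: str  # "internal" or "ldap" (hypothetical marker field)
    groups: Set[str] = field(default_factory=set)


class UserStore:
    """Hypothetical in-memory stand-in for the lakeFS user database."""

    def __init__(self) -> None:
        self.users: Dict[str, User] = {}

    def ensure_ldap_user(self, name: str, default_group: str) -> User:
        user = self.users.get(name)
        if user is None:
            # First successful LDAP login: create the user, mark it as
            # LDAP-created, and place it in the configured default group.
            user = User(name=name, source="ldap", groups={default_group})
            self.users[name] = user
        # An existing user is returned unchanged, so groups added later
        # by an administrator are preserved.
        return user


def login_via_ldap(
    username: str,
    password: str,
    simple_bind: Callable[[str, str], bool],
    store: UserStore,
    default_group: str,
) -> Optional[User]:
    """Return the lakeFS user on success, or None if the bind is rejected."""
    # Reject empty passwords up front: an LDAP simple bind with an empty
    # password is an unauthenticated bind, which some servers accept.
    if not password or not simple_bind(username, password):
        return None
    return store.ensure_ldap_user(username, default_group)
```

In this sketch the caller would issue the JWT after `login_via_ldap`
returns a user, so the rest of the flow continues as today. Group
membership is only set at creation time, which matches the decision to
use locally-configured groups rather than reading them from LDAP.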