# Getting S3 Clients in Spark Clients

## Why?

We have 2 clients for Spark that work on lakeFS but also need to access AWS
S3 directly:

<table>
  <tr>
    <th>Client</th><th>Where used</th><th>Why it needs an S3 client</th>
  </tr><tr>
    <td>Spark metadata client</td>
    <td>
      GC (both committed and uncommitted), Spark Export, and also
      available for users to access lakeFS metadata directly.
    </td><td>
      Accesses stored metadata directly on S3 and deletes data objects.
    </td>
  </tr><tr>
    <td>
      lakeFSFS
    </td>
    <td>
      Reading and writing directly on lakeFS.
    </td>
    <td>
      Reads ETags of uploaded objects to put them in lakeFS metadata.  In
      _some_ Hadoop versions, the S3AFileSystem returns FileStatus objects
      with an S3AFileStatus.getETag method.  Otherwise a separate call to S3
      is needed (see the sketch below this table).
    </td>
  </tr>
</table>
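
A minimal sketch of that fallback logic, assuming nothing about the Hadoop
version at compile time; `etagOf` is a hypothetical helper, and the method
lookup is reflective so we need not compile against any particular S3A:

```scala
import org.apache.hadoop.fs.FileStatus

// Fetch an ETag from a FileStatus when this Hadoop version exposes one
// (e.g. S3AFileStatus.getETag); return None otherwise.
def etagOf(status: FileStatus): Option[String] =
  try {
    val getETag = status.getClass.getMethod("getETag")
    Option(getETag.invoke(status).asInstanceOf[String])
  } catch {
    // No getETag on this Hadoop version: the caller must fall back to a
    // separate S3 call to fetch the ETag.
    case _: NoSuchMethodException => None
  }
```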

![David Niven as Sir Charles Litton, The Pink Panther][pink-panther-img]

These Spark clients cannot work without a working S3 client[^1].  Acquiring
that client is currently:

* **Different** between our two clients.

  The Spark metadata client supports _only_ authentication to S3 using
  access keys or STS, while lakeFSFS supports _only_ taking ("_stealing_")
  clients from S3AFileSystem.

* **Brittle**.

  Some users cannot use the authentication methods that we make available to
  them.  The thievery code in lakeFSFS is subtle and depends heavily on an
  assumed underlying implementation; it can break when Databricks introduces
  new features.

* **Uninformative** in the case of system or user error.

  Users receive very poor error reports.  If they get as far as an S3 client
  but it is misconfigured, S3 happily generates "400 Bad Request" messages.
  If client theft fails, it generates a report of the _last_ failure --
  probably not the most _important_ failure.

There are numerous bug reports and user questions about this area.

## What?

We propose to:

1. **Reduce friction.**  When S3A already works on a Spark installation,
   users should typically not have to add _any_ S3-related configuration in
   order to use lakeFS clients.
1. **Unify** S3 client acquisition between the two clients.  Both clients
   will support the same configuration options: clients "stolen" from the
   underlying S3AFileSystem, and explicitly created clients with static
   access keys and STS.  Prefer stealing clients to creating them -- stolen
   clients are the most likely to work.
1. **Improve** error reporting.  Report the stages attempted and how each
   one failed.
1. **Create a more general scheme** for generating clients.  Over time we
   can hope to support more underlying implementations.

## Design principles

Unify client generation code into a single library.  We will be able to test
this library individually on various Spark setups.  This will probably not
be automatic -- there is no automatic source for _new_ Spark setups, and it
is not clear how often _existing_ Spark setups change.  But even being able
to run a single command on a Spark cluster and get useful information will
help investigation, help customers probe their own setups, and support
further development on setups where we currently fail.

This library will define an interface for _client acquisition_: given
various parameters TBD (perhaps a SparkContext or a Hadoop configuration), a
path, and optionally also a FileSystem on that path, a client acquisition
attempt returns a client or a failure message.
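
As a sketch (parameters still TBD as noted, and all names illustrative),
such an interface might look like:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// One acquisition attempt: returns a client or a failure message.  The
// client is typed AnyRef on purpose: its concrete class may belong to a
// different SDK version or classloader (see the reflection notes below).
trait ClientAcquisitionStrategy {
  def name: String
  def acquire(conf: Configuration,
              path: Path,
              fs: Option[FileSystem]): Either[AcquisitionFailure, AnyRef]
}

// A failure carries enough context to build a useful combined report.
final case class AcquisitionFailure(strategy: String,
                                    reason: String,
                                    cause: Option[Throwable] = None)
```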

A future version may well generalize to acquiring a client for other
underlying storage types from other FileSystems.

The library will include code that tries each of a list of strategies, in
order of desirability.  It will return a client or throw an exception with
a detailed message.  And it will report which strategy was actually used to
acquire the client.  To increase performance, the library will cache the
client acquired for each FileSystem.  This will typically mean that the
acquisition code is called just once.
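
A sketch of that loop, building on the illustrative trait above; the
per-FileSystem caching and the aggregation of _all_ failures are the
essential parts:

```scala
import java.util.concurrent.ConcurrentHashMap
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

class ChainedAcquirer(strategies: Seq[ClientAcquisitionStrategy]) {
  // Cache the acquired client per FileSystem, so the (possibly slow)
  // strategy chain typically runs only once.
  private val cache = new ConcurrentHashMap[FileSystem, AnyRef]()

  def acquire(conf: Configuration, path: Path, fs: FileSystem): AnyRef =
    cache.computeIfAbsent(fs, _ => acquireUncached(conf, path, fs))

  private def acquireUncached(conf: Configuration,
                              path: Path,
                              fs: FileSystem): AnyRef = {
    var failures = List.empty[AcquisitionFailure]
    for (strategy <- strategies) {          // in order of desirability
      strategy.acquire(conf, path, Some(fs)) match {
        case Right(client) => return client // log strategy.name as the winner
        case Left(failure) => failures = failure :: failures
      }
    }
    // Report every attempt, not just the last failure.
    throw new IllegalStateException(
      "could not acquire an S3 client; attempts: " +
        failures.reverse.map(f => s"${f.strategy}: ${f.reason}").mkString("; "))
  }
}
```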

The list of strategies will be configurable via a Hadoop property.
Additionally we will create pre-populated lists: one recommended for
no-hassle production use, and another consisting of all (or almost all)
strategies, recommended for debugging.  Users who explicitly wish to use a
single strategy will simply configure that one strategy as the only option.
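
For illustration only -- the property name and list values below are
hypothetical, not a committed interface:

```scala
val hadoopConf = spark.sparkContext.hadoopConfiguration

// Recommended production list: only the strategies most likely to work.
hadoopConf.set("fs.lakefs.s3.client.strategies", "default")

// Debugging list: try (almost) every strategy and report each attempt.
hadoopConf.set("fs.lakefs.s3.client.strategies", "all")

// Force a single strategy, e.g. an explicitly created client with
// static access keys.
hadoopConf.set("fs.lakefs.s3.client.strategies", "static-keys")
```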

One complication is that many FileSystems are _layered_ and strategies to
detect them may require some recursion or at least iteration.  For instance,
while S3A may support `S3AFileSystem.getAmazonS3Client`, on Databricks we
might have to unwrap it from `CredentialScopeFileSystem` using
`CredentialScopeFileSystem.getReadDelegate`, and then try to acquire an S3
client from whatever is returned.
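
A sketch of one unwrapping step.  It has to use reflection: wrapper classes
such as Databricks' `CredentialScopeFileSystem` are not on our compile-time
classpath.  The accessor list is illustrative and would grow over time.

```scala
import org.apache.hadoop.fs.FileSystem

// Known delegate accessors on wrapper FileSystems.
val delegateAccessors = Seq("getReadDelegate", "getWrappedFs")

// Peel one wrapper layer, if any; the caller recurses (or iterates) on the
// result, attempting client acquisition at each layer.
def unwrap(fs: FileSystem): Option[FileSystem] =
  delegateAccessors.flatMap { accessor =>
    try {
      val method = fs.getClass.getMethod(accessor)
      Option(method.invoke(fs)).collect { case inner: FileSystem => inner }
    } catch {
      case _: NoSuchMethodException => None
    }
  }.headOption
```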

The type of the returned client is indeterminate to the caller.  It _is_ an
AmazonS3Client with the desired authentication and the ability to connect to
the bucket.  But it may well be one of a different version or package than
the caller expects, and if the caller so much as attempts to cast it to
AmazonS3Client it will get a nasty ClassCastException.  Similarly for its
*Request and *Response objects.  Everything should be done using reflection,
and the library should also help with such calls.[^2]

[^2]: The current code assumes that the expected Request object has the same
    actual type and is compatible.  This is a bug and will surely break
    somewhere.
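
For example, deleting an object through an acquired client might look like
the sketch below.  It prefers an overload that takes plain `String`s over
one taking an SDK Request object, whose class may mismatch just as badly.

```scala
// Never cast the acquired client to AmazonS3Client: its class may come from
// another package, a shaded jar, or a different classloader.  AWS SDK v1's
// AmazonS3 has deleteObject(String, String), so invoke that reflectively.
def deleteObject(client: AnyRef, bucket: String, key: String): Unit = {
  val method =
    client.getClass.getMethod("deleteObject", classOf[String], classOf[String])
  method.invoke(client, bucket, key)
}
```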

### Example: information a strategy might return

One strategy for acquiring a client is to call `getWrappedFs` _if_ the
FileSystem has such a method, and to recurse on the wrapped FileSystem that
it returns.  When such a strategy fails, it should report:

1. The dynamic type of FileSystem that it received.
1. What failed:
   1. `getWrappedFs`?  For instance, if it received a FileSystem that does
      not have this method.
   1. Acquiring a client from the wrapped instance?  This is a recursive
      attempt, and its failure report will include the nested attempt's
      failures.
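
One possible shape for such a report, nesting the recursive attempt's
failures (all names illustrative):

```scala
final case class StrategyReport(
    strategy: String,                  // e.g. "unwrap-getWrappedFs"
    fsClass: String,                   // dynamic type of the FileSystem received
    reason: String,                    // e.g. "no getWrappedFs method"
    nested: Seq[StrategyReport] = Nil  // failures of the recursive attempt
)
```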

[pink-panther-img]: https://static.wikia.nocookie.net/pinkpanther/images/7/76/David_Niven_-_01.webp/revision/latest?cb=20220531105637

[^1]: lakeFSFS _might_ not need the client, if it can find an ETag on
    returned FileStatus objects.