# Getting S3 Clients in Spark Clients

## Why?

We have 2 clients for Spark that work on lakeFS but also need to access AWS
S3 directly:

<table>
  <tr>
    <th>Client</th><th>Where used</th><th>Why it needs an S3 client</th>
  </tr><tr>
    <td>Spark metadata client</td>
    <td>
      GC (both committed and uncommitted), Spark Export, and also
      available for users to access lakeFS metadata directly.
    </td><td>
      Accesses stored metadata directly on S3 and deletes data objects.
    </td>
  </tr><tr>
    <td>
      lakeFSFS
    </td>
    <td>
      Reading and writing directly on lakeFS.
    </td>
    <td>
      Reads ETags of uploaded objects to put them in lakeFS metadata.  In
      _some_ Hadoop versions, the S3AFileSystem returns FileStatus objects
      with an S3AFileStatus.getETag method.  Otherwise a separate call to S3
      is needed (see the sketch below this table).
    </td>
  </tr>
</table>
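
A minimal sketch of that fallback logic, assuming nothing about the Hadoop
version at compile time; `etagOf` is a hypothetical helper, and the method
lookup is reflective so we need not compile against any particular S3A:

```scala
import org.apache.hadoop.fs.FileStatus

// Fetch an ETag from a FileStatus when this Hadoop version exposes one
// (e.g. S3AFileStatus.getETag); return None otherwise.
def etagOf(status: FileStatus): Option[String] =
  try {
    val getETag = status.getClass.getMethod("getETag")
    Option(getETag.invoke(status).asInstanceOf[String])
  } catch {
    // No getETag on this Hadoop version: the caller must fall back to a
    // separate S3 call to fetch the ETag.
    case _: NoSuchMethodException => None
  }
```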

![David Niven as Sir Charles Litton, The Pink Panther][pink-panther-img]

These Spark clients cannot work without a working S3 client[^1].  Acquiring
that client is currently:

* **Different** between our two clients.

  The Spark metadata client supports _only_ authentication to S3 using
  access keys or STS, while lakeFSFS supports _only_ taking ("_stealing_")
  clients from S3AFileSystem.

* **Brittle**.

  Some users cannot use the authentication methods that we make available to
  them.  The thievery code in lakeFSFS is subtle and depends heavily on an
  assumed underlying implementation; it can break when Databricks introduces
  new features.

* **Uninformative** in the case of system or user error.

  Users receive very poor error reports.  If they get as far as an S3 client
  but it is misconfigured, S3 happily generates "400 Bad Request" messages.
  If client theft fails, it generates a report of the _last_ failure --
  probably not the most _important_ failure.

There are numerous bug reports and user questions about this area.

## What?

We propose to:

1. **Reduce friction.**  When S3A already works on a Spark installation,
   users should typically not have to add _any_ S3-related configuration in
   order to use lakeFS clients.
1. **Unify** S3 client acquisition between the two clients.  Both clients
   will support the same configuration options: clients "stolen" from the
   underlying S3AFileSystem, and explicitly created clients with static
   access keys and STS.  Prefer stealing clients to creating them -- stolen
   clients are the most likely to work.
1. **Improve** error reporting.  Report the stages attempted and how each
   one failed.
1. **Create a more general scheme** for generating clients.  Over time we
   can hope to support more underlying implementations.

## Design principles

Unify client generation code into a single library.  We will be able to test
this library individually on various Spark setups.  This will probably not
be automatic -- there is no automatic source for _new_ Spark setups, and it
is not clear how often _existing_ Spark setups change.  But even being able
to run a single command on a Spark cluster and get useful information will
help investigation, help customers probe their own setups, and support
further development on setups where we currently fail.

This library will define an interface for _client acquisition_: given
various parameters TBD (perhaps a SparkContext or a Hadoop configuration), a
path, and optionally also a FileSystem on that path, a client acquisition
attempt returns a client or a failure message.
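
As a sketch (parameters still TBD as noted, and all names illustrative),
such an interface might look like:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// One acquisition attempt: returns a client or a failure message.  The
// client is typed AnyRef on purpose: its concrete class may belong to a
// different SDK version or classloader (see the reflection notes below).
trait ClientAcquisitionStrategy {
  def name: String
  def acquire(conf: Configuration,
              path: Path,
              fs: Option[FileSystem]): Either[AcquisitionFailure, AnyRef]
}

// A failure carries enough context to build a useful combined report.
final case class AcquisitionFailure(strategy: String,
                                    reason: String,
                                    cause: Option[Throwable] = None)
```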

A future version may well generalize to acquiring a client for other
underlying storage types from other FileSystems.

The library will include code that tries each of a list of strategies, in
order of desirability.  It will return a client or throw an exception with
a detailed message.  And it will report which strategy was actually used to
acquire the client.  To increase performance, the library will cache the
client acquired for each FileSystem.  This will typically mean that the
acquisition code is called just once.
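
A sketch of that loop, building on the illustrative trait above; the
per-FileSystem caching and the aggregation of _all_ failures are the
essential parts:

```scala
import java.util.concurrent.ConcurrentHashMap
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

class ChainedAcquirer(strategies: Seq[ClientAcquisitionStrategy]) {
  // Cache the acquired client per FileSystem, so the (possibly slow)
  // strategy chain typically runs only once.
  private val cache = new ConcurrentHashMap[FileSystem, AnyRef]()

  def acquire(conf: Configuration, path: Path, fs: FileSystem): AnyRef =
    cache.computeIfAbsent(fs, _ => acquireUncached(conf, path, fs))

  private def acquireUncached(conf: Configuration,
                              path: Path,
                              fs: FileSystem): AnyRef = {
    var failures = List.empty[AcquisitionFailure]
    for (strategy <- strategies) {          // in order of desirability
      strategy.acquire(conf, path, Some(fs)) match {
        case Right(client) => return client // log strategy.name as the winner
        case Left(failure) => failures = failure :: failures
      }
    }
    // Report every attempt, not just the last failure.
    throw new IllegalStateException(
      "could not acquire an S3 client; attempts: " +
        failures.reverse.map(f => s"${f.strategy}: ${f.reason}").mkString("; "))
  }
}
```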

The list of strategies will be configurable via a Hadoop property.
Additionally we will create pre-populated lists: one recommended for
no-hassle production use, and another consisting of all (or almost all)
strategies, recommended for debugging.  Users who explicitly wish to use a
single strategy will simply configure that one strategy as the only option.
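
For illustration only -- the property name and list values below are
hypothetical, not a committed interface:

```scala
val hadoopConf = spark.sparkContext.hadoopConfiguration

// Recommended production list: only the strategies most likely to work.
hadoopConf.set("fs.lakefs.s3.client.strategies", "default")

// Debugging list: try (almost) every strategy and report each attempt.
hadoopConf.set("fs.lakefs.s3.client.strategies", "all")

// Force a single strategy, e.g. an explicitly created client with
// static access keys.
hadoopConf.set("fs.lakefs.s3.client.strategies", "static-keys")
```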

One complication is that many FileSystems are _layered_ and strategies to
detect them may require some recursion or at least iteration.  For instance,
while S3A may support `S3AFileSystem.getAmazonS3Client`, on Databricks we
might have to unwrap it from `CredentialScopeFileSystem` using
`CredentialScopeFileSystem.getReadDelegate`, and then try to acquire an S3
client from whatever is returned.
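
A sketch of one unwrapping step.  It has to use reflection: wrapper classes
such as Databricks' `CredentialScopeFileSystem` are not on our compile-time
classpath.  The accessor list is illustrative and would grow over time.

```scala
import org.apache.hadoop.fs.FileSystem

// Known delegate accessors on wrapper FileSystems.
val delegateAccessors = Seq("getReadDelegate", "getWrappedFs")

// Peel one wrapper layer, if any; the caller recurses (or iterates) on the
// result, attempting client acquisition at each layer.
def unwrap(fs: FileSystem): Option[FileSystem] =
  delegateAccessors.flatMap { accessor =>
    try {
      val method = fs.getClass.getMethod(accessor)
      Option(method.invoke(fs)).collect { case inner: FileSystem => inner }
    } catch {
      case _: NoSuchMethodException => None
    }
  }.headOption
```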

The type of the returned client is indeterminate to the caller.  It _is_ an
AmazonS3Client with the desired authentication and the ability to connect to
the bucket.  But it may well be one of a different version or package than
the caller expects, and if the caller so much as attempts to cast it to
AmazonS3Client it will get a nasty ClassCastException.  Similarly for its
*Request and *Response objects.  Everything should be done using reflection,
and the library should also help with such calls.[^2]

[^2]: The current code assumes that the expected Request object has the same
    actual type and is compatible.  This is a bug and will surely break
    somewhere.
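
For example, deleting an object through an acquired client might look like
the sketch below.  It prefers an overload that takes plain `String`s over
one taking an SDK Request object, whose class may mismatch just as badly.

```scala
// Never cast the acquired client to AmazonS3Client: its class may come from
// another package, a shaded jar, or a different classloader.  AWS SDK v1's
// AmazonS3 has deleteObject(String, String), so invoke that reflectively.
def deleteObject(client: AnyRef, bucket: String, key: String): Unit = {
  val method =
    client.getClass.getMethod("deleteObject", classOf[String], classOf[String])
  method.invoke(client, bucket, key)
}
```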

### Example: information a strategy might return

One strategy for acquiring a client is to call `getWrappedFs` _if_ the
FileSystem has such a method, and to recurse on the wrapped FileSystem that
it returns.  When such a strategy fails, it should report:

1. The dynamic type of FileSystem that it received.
1. What failed:
   1. `getWrappedFs`?  For instance, if it received a FileSystem that does
      not have this method.
   1. Acquiring a client from the wrapped instance?  This is a recursive
      attempt, and its failure report will include the nested attempt's
      failures.
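
One possible shape for such a report, nesting the recursive attempt's
failures (all names illustrative):

```scala
final case class StrategyReport(
    strategy: String,                  // e.g. "unwrap-getWrappedFs"
    fsClass: String,                   // dynamic type of the FileSystem received
    reason: String,                    // e.g. "no getWrappedFs method"
    nested: Seq[StrategyReport] = Nil  // failures of the recursive attempt
)
```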

[pink-panther-img]: https://static.wikia.nocookie.net/pinkpanther/images/7/76/David_Niven_-_01.webp/revision/latest?cb=20220531105637

[^1]: lakeFSFS _might_ not need the client, if it can find an ETag on
    returned FileStatus objects.