# lakeFS High-Level Python SDK

The lakeFS High-Level SDK for Python provides developers with the following features:
1. A simpler programming interface that needs less configuration
2. Identity inferred from the environment
3. Better abstractions for common, more complex operations (I/O, transactions, imports)

## Requirements

Python 3.9+

## Installation & Usage

### pip install

```sh
pip install lakefs
```

### Import the package

```python
import lakefs
```

## Getting Started

After completing the [installation procedure](#installation--usage), use the following snippet as a quick start:

```python
import lakefs
from lakefs.client import Client

# The default client authenticates with the lakeFS server using configured credentials,
# read from environment variables or a .lakectl.yaml file if either exists
repo = lakefs.repository(repository_id="my-repo")

# Alternatively, explicitly initialize a Client object and pass it in
clt = Client(username="<lakefs_access_key_id>", password="<lakefs_secret_access_key>", host="<lakefs_endpoint>")
repo = lakefs.Repository(repository_id="my-repo", client=clt)

# From this point on, use the package as described in the documentation
main_branch = repo.create(storage_namespace="<storage_namespace>").branch(branch_id="main")
...
```
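When the default client cannot authenticate, a common cause is that none of the expected credentials are configured. The sketch below checks the environment only; it assumes the SDK reads the same variable names as `lakectl` (`LAKECTL_SERVER_ENDPOINT_URL`, `LAKECTL_CREDENTIALS_ACCESS_KEY_ID`, `LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY`), and the helper `missing_lakectl_env` is hypothetical, not part of the `lakefs` package. A `.lakectl.yaml` file can supply the same settings instead, so a non-empty result is not necessarily an error.

```python
import os
from typing import Mapping

# Assumed variable names, mirroring lakectl's configuration keys
# (server.endpoint_url, credentials.access_key_id, credentials.secret_access_key).
LAKECTL_ENV_VARS = (
    "LAKECTL_SERVER_ENDPOINT_URL",
    "LAKECTL_CREDENTIALS_ACCESS_KEY_ID",
    "LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY",
)


def missing_lakectl_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the assumed lakectl variables that are unset or empty in `env`."""
    return [name for name in LAKECTL_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_lakectl_env()
    if missing:
        print("Not set in the environment:", ", ".join(missing))
```

If variables are missing and there is no `.lakectl.yaml`, passing an explicit `Client` avoids relying on the environment at all.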

## Examples

### Print sizes of all objects in lakefs://repo/main~2

```py
import lakefs

ref = lakefs.Repository("repo").ref("main~2")
for obj in ref.objects():
    print(f"{obj.path}: {obj.size_bytes}")
```
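The loop above only reads `path` and `size_bytes` from each listed object, so the formatting can be factored into a helper that also totals the sizes. This is a minimal sketch: `size_report` and the `FakeObject` stand-in are hypothetical, used so the logic can run without a lakeFS server.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


class HasPathAndSize(Protocol):
    # The only two attributes the example above reads from each object.
    path: str
    size_bytes: int


def size_report(objects: Iterable[HasPathAndSize]) -> tuple[list[str], int]:
    """Format one "path: size" line per object and return them with the total size."""
    lines = []
    total = 0
    for obj in objects:
        lines.append(f"{obj.path}: {obj.size_bytes}")
        total += obj.size_bytes
    return lines, total


# Hypothetical stand-in for the objects yielded by ref.objects().
@dataclass
class FakeObject:
    path: str
    size_bytes: int


lines, total = size_report([FakeObject("a.csv", 10), FakeObject("dir/b.parquet", 32)])
print("\n".join(lines))
print(f"total: {total} bytes")
```

Against a real repository, the same helper could be called as `size_report(lakefs.Repository("repo").ref("main~2").objects())`.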

### Difference between two branches

```py
import lakefs

for change in lakefs.Repository("repo").ref("main").diff("twig"):
    print(change)
```
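Printing every change can be noisy for large diffs. The sketch below counts changes by kind instead; it assumes each item yielded by `diff` exposes a `type` attribute (such as `"added"`, `"removed"`, or `"changed"`) and a `path` attribute. `summarize_changes` and the `FakeChange` stand-in are hypothetical.

```python
from collections import Counter, namedtuple
from typing import Iterable


def summarize_changes(changes: Iterable) -> Counter:
    """Count changes by their assumed `type` attribute."""
    return Counter(change.type for change in changes)


# Hypothetical stand-in for the change records yielded by diff().
FakeChange = namedtuple("FakeChange", ["type", "path"])

changes = [
    FakeChange("added", "new/file.csv"),
    FakeChange("changed", "data/part-0.parquet"),
    FakeChange("added", "new/other.csv"),
]
summary = summarize_changes(changes)
print(dict(summary))
```

With a real repository, this would be `summarize_changes(lakefs.Repository("repo").ref("main").diff("twig"))`.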

You can also pass [ref expressions][lakefs-spec-ref] to `diff`; for instance,
`.diff("main~2")` also works. Ref expressions are the lakeFS analogues of
[how Git specifies revisions][git-spec-rev].
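To illustrate what an expression such as `main~2` selects, here is a toy, in-memory model of the Git revision rules that the linked lakeFS documentation says ref expressions follow: `~n` walks back `n` generations along first parents, and `^n` picks the `n`-th parent of a merge commit. This is not lakeFS code (the server resolves real expressions for you); `PARENTS`, `BRANCHES`, and `resolve` are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical toy commit graph: each commit ID maps to its parent IDs,
# first parent first, as in Git. "c4" is a merge of "c3" and "x1".
PARENTS: Dict[str, List[str]] = {
    "c4": ["c3", "x1"],
    "c3": ["c2"],
    "c2": ["c1"],
    "x1": ["c1"],
    "c1": [],
}
# Hypothetical branch table: branch name -> head commit ID.
BRANCHES: Dict[str, str] = {"main": "c4"}


def _parent(commit: str, index: int) -> str:
    """Return the parent at zero-based `index`, or fail if it does not exist."""
    parents = PARENTS[commit]
    if index >= len(parents):
        raise ValueError(f"{commit} has no parent #{index + 1}")
    return parents[index]


def resolve(expr: str) -> str:
    """Resolve a Git-style expression such as "main~2" or "main^2~1" in the toy graph."""
    match = re.fullmatch(r"([^~^]+)((?:[~^]\d*)*)", expr)
    if match is None:
        raise ValueError(f"not a ref expression: {expr!r}")
    base, modifiers = match.groups()
    commit = BRANCHES.get(base, base)  # a branch name or a raw commit ID
    for op, digits in re.findall(r"([~^])(\d*)", modifiers):
        n = int(digits) if digits else 1  # a bare "~" or "^" means 1
        if op == "~":
            # "~n": walk back n generations, always following the first parent
            for _ in range(n):
                commit = _parent(commit, 0)
        elif n > 0:
            # "^n": select the n-th parent; "^0" leaves the commit unchanged
            commit = _parent(commit, n - 1)
    return commit


print(resolve("main~2"))    # c2
print(resolve("main^2~1"))  # c1
```

Note that `main~2` and `main^2` differ: the first goes two generations back along first parents, while the second picks the merge's second parent.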

### Search a stored object for a string

```py
import lakefs

with lakefs.Repository("repo").ref("main").object("path/to/data").reader(mode="r") as f:
    for line in f:
        if "quick" in line:
            print(line)
```
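Because the text-mode reader is iterated line by line, the search logic works on any iterable of strings and can be tested separately. In this minimal sketch, `matching_lines` is a hypothetical helper and `io.StringIO` stands in for the lakeFS object reader so it runs offline.

```python
import io
from typing import Iterable, Iterator


def matching_lines(lines: Iterable[str], needle: str) -> Iterator[str]:
    """Yield each line containing `needle`, with its trailing newline removed."""
    for line in lines:
        if needle in line:
            yield line.rstrip("\n")


# io.StringIO stands in for the reader returned by .reader(mode="r").
fake_object = io.StringIO("the quick brown fox\njumps over\nthe quick dog\n")
hits = list(matching_lines(fake_object, "quick"))
print(hits)
```

Inside the `with ... .reader(mode="r") as f:` block above, the loop would become `for line in matching_lines(f, "quick"): print(line)`.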

### Upload and commit some data

```py
import lakefs

branch = lakefs.Repository("golden").branch("main")

with branch.object("path/to/new").writer(mode="wb") as f:
    f.write(b"my data")

# Returns a Reference
branch.commit("added my data using lakeFS high-level SDK")

# Prints "my data"
with branch.object("path/to/new").reader(mode="r") as f:
    for line in f:
        print(line)
```

Unlike references, branches are writable. This example would not work with a ref, because a ref is read-only.

## Tests

To run the tests with `pytest`, first clone the lakeFS Git repository:

```sh
git clone https://github.com/treeverse/lakeFS.git
cd lakeFS/clients/python-wrapper
```

### Unit Tests

Inside the `tests` folder, execute `pytest utests` to run the unit tests.

### Integration Tests

See the [testing documentation](https://github.com/treeverse/lakeFS/blob/master/clients/python-wrapper/tests/integration/README.md) for more information.

## Documentation

[lakeFS Python SDK](https://pydocs-lakefs.lakefs.io/)

## Author

services@treeverse.io

[git-spec-rev]: https://git-scm.com/docs/git-rev-parse#_specifying_revisions
[lakefs-spec-ref]: https://docs.lakefs.io/understand/model.html#ref-expressions