# lakeFS High-Level Python SDK

The lakeFS High-Level SDK for Python provides developers with the following features:
1. Simpler programming interface with less configuration
2. Inferring identity from environment
3. Better abstractions for common, more complex operations (I/O, transactions, imports)

## Requirements

Python 3.9+

## Installation & Usage

### pip install

```sh
pip install lakefs
```

### Import the package

```python
import lakefs
```

## Getting Started

Please follow the [installation procedure](#installation--usage), then refer to the following example snippet for a quick start:

```python
import lakefs
from lakefs.client import Client

# The default client attempts to authenticate with the lakeFS server using
# credentials from environment variables or a .lakectl.yaml file, if either exists
repo = lakefs.repository(repository_id="my-repo")

# Or explicitly initialize and provide a Client object
clt = Client(username="<lakefs_access_key_id>", password="<lakefs_secret_access_key>", host="<lakefs_endpoint>")
repo = lakefs.Repository(repository_id="my-repo", client=clt)

# From this point, proceed using the package according to documentation
main_branch = repo.create(storage_namespace="<storage_namespace>").branch(branch_id="main")
...
```

## Examples

### Print sizes of all objects in lakefs://repo/main~2

```py
import lakefs

ref = lakefs.Repository("repo").ref("main~2")
for obj in ref.objects():
    print(f"{obj.path}: {obj.size_bytes}")
```

### Difference between two branches

```py
import lakefs

for change in lakefs.Repository("repo").ref("main").diff("twig"):
    print(change)
```

You can also use [ref expressions][lakefs-spec-ref] here; for instance,
`.diff("main~2")` also works.
Ref expressions are the lakeFS analogues of
[how Git specifies revisions][git-spec-rev].

### Search a stored object for a string

```py
import lakefs

with lakefs.Repository("repo").ref("main").object("path/to/data").reader(mode="r") as f:
    for line in f:
        if "quick" in line:
            print(line)
```

### Upload and commit some data

```py
import lakefs

with lakefs.Repository("golden").branch("main").object("path/to/new").writer(mode="wb") as f:
    f.write(b"my data")

# Returns a Reference
lakefs.Repository("golden").branch("main").commit("added my data using lakeFS high-level SDK")

# Prints "my data"
with lakefs.Repository("golden").branch("main").object("path/to/new").reader(mode="r") as f:
    for line in f:
        print(line)
```

Unlike references, branches are writable. This example couldn't work if we used a ref.

## Tests

To run the tests using `pytest`, first clone the lakeFS git repository:

```sh
git clone https://github.com/treeverse/lakeFS.git
cd lakeFS/clients/python-wrapper
```

### Unit Tests

Inside the `tests` folder, execute `pytest utests` to run the unit tests.

### Integration Tests

See the [testing documentation](https://github.com/treeverse/lakeFS/blob/master/clients/python-wrapper/tests/integration/README.md) for more information.

## Documentation

[lakeFS Python SDK](https://pydocs-lakefs.lakefs.io/)

## Author

services@treeverse.io

[git-spec-rev]: https://git-scm.com/docs/git-rev-parse#_specifying_revisions
[lakefs-spec-ref]: https://docs.lakefs.io/understand/model.html#ref-expressions