---
title: Amazon SageMaker
description: This section explains how to integrate your Amazon SageMaker installation to work with lakeFS.
parent: Integrations
redirect_from: /using/sagemaker.html
---

# Using lakeFS with Amazon SageMaker
[Amazon SageMaker](https://aws.amazon.com/sagemaker/) helps you prepare, build, train, and deploy ML models quickly by bringing together a broad set of capabilities purpose-built for ML.

{% include toc.html %}

## Initializing session and client

Initialize a SageMaker session and an S3 resource that use lakeFS as their endpoint:

```python
import sagemaker
import boto3

endpoint_url = '<LAKEFS_ENDPOINT>'
aws_access_key_id = '<LAKEFS_ACCESS_KEY_ID>'
aws_secret_access_key = '<LAKEFS_SECRET_ACCESS_KEY>'
repo = 'example-repo'

sm = boto3.client('sagemaker',
                  endpoint_url=endpoint_url,
                  aws_access_key_id=aws_access_key_id,
                  aws_secret_access_key=aws_secret_access_key)

s3_resource = boto3.resource('s3',
                             endpoint_url=endpoint_url,
                             aws_access_key_id=aws_access_key_id,
                             aws_secret_access_key=aws_secret_access_key)

session = sagemaker.Session(boto3.Session(), sagemaker_client=sm, default_bucket=repo)
session.s3_resource = s3_resource
```

## Usage Examples

### Upload train and test data

Let's use the created session to upload data to the `main` branch:

```python
prefix = "/prefix-within-branch"
branch = 'main'

# train_data and test_data_no_target are assumed to be pandas DataFrames
# prepared earlier in your workflow.
train_file = 'train_data.csv'
train_data.to_csv(train_file, index=False, header=True)
train_data_s3_path = session.upload_data(path=train_file, key_prefix=branch + prefix + "/train")

test_file = 'test_data.csv'
test_data_no_target.to_csv(test_file, index=False, header=False)
test_data_s3_path = session.upload_data(path=test_file, key_prefix=branch + prefix + "/test")
```

Objects written through the S3 gateway are uncommitted until you commit them; see the commit sketch at the end of this page.

### Download objects

You can use the integration with lakeFS to download only the portion of the data you need:

```python
repo = 'example-repo'
prefix = "/prefix-to-download"
branch = 'main'
localpath = './' + branch

session.download_data(path=localpath, bucket=repo, key_prefix=branch + prefix)
```

**Note:**
Advanced AWS SageMaker features, like Autopilot jobs, are encapsulated and don't provide an option to override the S3 endpoint.
However, it is possible to [export]({% link howto/export.md %}) the required inputs from lakeFS to S3.
<br/>If you're using SageMaker features that aren't supported by lakeFS, we'd love to [hear from you](https://lakefs.io/slack).
{: .note}
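
### Commit uploaded data

Uploads through the S3 gateway land on the branch as uncommitted changes, and the gateway itself has no commit operation. The snippet below is a minimal, non-authoritative sketch that assumes the high-level [lakeFS Python SDK]({% link integrations/python.md %}) (`pip install lakefs`) is installed and configured with your lakeFS credentials; the commit message is illustrative.

```python
import lakefs

# Assumes lakeFS credentials are configured for the SDK (e.g. via
# LAKECTL_* environment variables or ~/.lakectl.yaml).
# 'example-repo' and 'main' match the placeholders used above.
branch = lakefs.repository('example-repo').branch('main')
branch.commit(message='Add train and test data')  # illustrative message
```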
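
### List objects on a branch

To confirm that your resource is talking to lakeFS rather than directly to S3, one quick check is to list objects on a branch through the S3 gateway. This is a minimal sketch that reuses the `s3_resource`, `repo`, and `branch` variables defined above; adjust the prefix to match the data you uploaded.

```python
# List objects on the branch through the lakeFS S3 gateway.
# lakeFS exposes each repository as a bucket, with keys prefixed by branch.
bucket = s3_resource.Bucket(repo)
for obj in bucket.objects.filter(Prefix=branch + '/'):
    print(obj.key, obj.size)
```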
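
In the same way, you can fetch a single object from a branch without downloading an entire prefix. The object key below is hypothetical; replace it with a key that exists in your repository.

```python
# Fetch a single object from the branch; the key shown is hypothetical.
obj = s3_resource.Object(repo, branch + '/prefix-within-branch/train/train_data.csv')
content = obj.get()['Body'].read()
```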