---
layout: post
title: PYTHON SDK
permalink: /docs/python-sdk
redirect_from:
 - /python_sdk.md/
 - /docs/python_sdk.md/
---

AIStore Python SDK is a growing set of client-side objects and methods to access and utilize AIS clusters.

> For PyTorch integration and usage examples, please refer to [AIS Python SDK](https://pypi.org/project/aistore) available via Python Package Index (PyPI), or see [https://github.com/NVIDIA/aistore/tree/main/python/aistore](https://github.com/NVIDIA/aistore/tree/main/python/aistore).

* [client](#client)
  * [Client](#client.Client)
    * [bucket](#client.Client.bucket)
    * [cluster](#client.Client.cluster)
    * [job](#client.Client.job)
    * [etl](#client.Client.etl)
    * [dsort](#client.Client.dsort)
* [cluster](#cluster)
  * [Cluster](#cluster.Cluster)
    * [client](#cluster.Cluster.client)
    * [get\_info](#cluster.Cluster.get_info)
    * [get\_primary\_url](#cluster.Cluster.get_primary_url)
    * [list\_buckets](#cluster.Cluster.list_buckets)
    * [list\_jobs\_status](#cluster.Cluster.list_jobs_status)
    * [list\_running\_jobs](#cluster.Cluster.list_running_jobs)
    * [list\_running\_etls](#cluster.Cluster.list_running_etls)
    * [is\_ready](#cluster.Cluster.is_ready)
    * [get\_performance](#cluster.Cluster.get_performance)
* [bucket](#bucket)
  * [Bucket](#bucket.Bucket)
    * [client](#bucket.Bucket.client)
    * [qparam](#bucket.Bucket.qparam)
    * [provider](#bucket.Bucket.provider)
    * [name](#bucket.Bucket.name)
    * [namespace](#bucket.Bucket.namespace)
    * [list\_urls](#bucket.Bucket.list_urls)
    * [list\_all\_objects\_iter](#bucket.Bucket.list_all_objects_iter)
    * [create](#bucket.Bucket.create)
    * [delete](#bucket.Bucket.delete)
    * [rename](#bucket.Bucket.rename)
    * [evict](#bucket.Bucket.evict)
    * [head](#bucket.Bucket.head)
    * [summary](#bucket.Bucket.summary)
    * [info](#bucket.Bucket.info)
    * [copy](#bucket.Bucket.copy)
    * [list\_objects](#bucket.Bucket.list_objects)
    * [list\_objects\_iter](#bucket.Bucket.list_objects_iter)
    * [list\_all\_objects](#bucket.Bucket.list_all_objects)
    * [transform](#bucket.Bucket.transform)
    * [put\_files](#bucket.Bucket.put_files)
    * [object](#bucket.Bucket.object)
    * [objects](#bucket.Bucket.objects)
    * [make\_request](#bucket.Bucket.make_request)
    * [verify\_cloud\_bucket](#bucket.Bucket.verify_cloud_bucket)
    * [get\_path](#bucket.Bucket.get_path)
    * [as\_model](#bucket.Bucket.as_model)
    * [write\_dataset](#bucket.Bucket.write_dataset)
* [object](#object)
  * [Object](#object.Object)
    * [bucket](#object.Object.bucket)
    * [name](#object.Object.name)
    * [head](#object.Object.head)
    * [get](#object.Object.get)
    * [get\_semantic\_url](#object.Object.get_semantic_url)
    * [get\_url](#object.Object.get_url)
    * [put\_content](#object.Object.put_content)
    * [put\_file](#object.Object.put_file)
    * [promote](#object.Object.promote)
    * [delete](#object.Object.delete)
    * [blob\_download](#object.Object.blob_download)
* [multiobj.object\_group](#multiobj.object_group)
  * [ObjectGroup](#multiobj.object_group.ObjectGroup)
    * [list\_urls](#multiobj.object_group.ObjectGroup.list_urls)
    * [list\_all\_objects\_iter](#multiobj.object_group.ObjectGroup.list_all_objects_iter)
    * [delete](#multiobj.object_group.ObjectGroup.delete)
    * [evict](#multiobj.object_group.ObjectGroup.evict)
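    * [prefetch](#multiobj.object_group.ObjectGroup.prefetch)
    * [copy](#multiobj.object_group.ObjectGroup.copy)
    * [transform](#multiobj.object_group.ObjectGroup.transform)
    * [archive](#multiobj.object_group.ObjectGroup.archive)
    * [list\_names](#multiobj.object_group.ObjectGroup.list_names)
* [multiobj.object\_names](#multiobj.object_names)
  * [ObjectNames](#multiobj.object_names.ObjectNames)
* [multiobj.object\_range](#multiobj.object_range)
  * [ObjectRange](#multiobj.object_range.ObjectRange)
* [multiobj.object\_template](#multiobj.object_template)
  * [ObjectTemplate](#multiobj.object_template.ObjectTemplate)
* [job](#job)
  * [Job](#job.Job)
    * [job\_id](#job.Job.job_id)
    * [job\_kind](#job.Job.job_kind)
    * [status](#job.Job.status)
    * [wait](#job.Job.wait)
    * [wait\_for\_idle](#job.Job.wait_for_idle)
    * [wait\_single\_node](#job.Job.wait_single_node)
    * [start](#job.Job.start)
    * [get\_within\_timeframe](#job.Job.get_within_timeframe)
* [object\_reader](#object_reader)
  * [ObjectReader](#object_reader.ObjectReader)
    * [attributes](#object_reader.ObjectReader.attributes)
    * [read\_all](#object_reader.ObjectReader.read_all)
    * [raw](#object_reader.ObjectReader.raw)
    * [\_\_iter\_\_](#object_reader.ObjectReader.__iter__)
* [object\_iterator](#object_iterator)
  * [ObjectIterator](#object_iterator.ObjectIterator)
* [etl](#etl)
  * [Etl](#etl.Etl)
    * [name](#etl.Etl.name)
    * [init\_spec](#etl.Etl.init_spec)
    * [init\_code](#etl.Etl.init_code)
    * [view](#etl.Etl.view)
    * [start](#etl.Etl.start)
    * [stop](#etl.Etl.stop)
    * [delete](#etl.Etl.delete)

Everything below is reached through a single `Client` instance. A minimal sketch, assuming the `aistore` package is installed and an AIS cluster is reachable at `http://localhost:8080` (the endpoint and bucket name are illustrative):

```python
from aistore.sdk import Client

client = Client("http://localhost:8080")  # endpoint is an assumption -- use your own proxy URL

# The factory constructors below are cheap: they only build local handles
# and do not issue any HTTP requests until you call a method on them.
bucket = client.bucket("my-bck")   # handle to an ais:// bucket
cluster = client.cluster()         # handle to cluster-level APIs
job = client.job()                 # handle to job-related APIs
```
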
<a id="client.Client"></a>

## Class: Client

```python
class Client()
```

AIStore client for managing buckets, objects, and ETL jobs.

**Arguments**:

- `endpoint` _str_ - AIStore endpoint

<a id="client.Client.bucket"></a>

### bucket

```python
def bucket(bck_name: str,
           provider: str = PROVIDER_AIS,
           namespace: Namespace = None)
```

Factory constructor for bucket object.
Does not make any HTTP request, only instantiates a bucket object.

**Arguments**:

- `bck_name` _str_ - Name of bucket
- `provider` _str_ - Provider of bucket, one of "ais", "aws", "gcp", ... (optional, defaults to ais)
- `namespace` _Namespace_ - Namespace of bucket (optional, defaults to None)


**Returns**:

The bucket object created.

<a id="client.Client.cluster"></a>

### cluster

```python
def cluster()
```

Factory constructor for cluster object.
Does not make any HTTP request, only instantiates a cluster object.

**Returns**:

The cluster object created.

<a id="client.Client.job"></a>

### job

```python
def job(job_id: str = "", job_kind: str = "")
```

Factory constructor for job object, which contains job-related functions.
Does not make any HTTP request, only instantiates a job object.

**Arguments**:

- `job_id` _str, optional_ - Optional ID for interacting with a specific job
- `job_kind` _str, optional_ - Optional specific type of job; empty for all kinds


**Returns**:

The job object created.
<a id="client.Client.etl"></a>

### etl

```python
def etl(etl_name: str)
```

Factory constructor for ETL object.
Contains APIs related to AIStore ETL operations.
Does not make any HTTP request, only instantiates an ETL object.

**Arguments**:

- `etl_name` _str_ - Name of the ETL


**Returns**:

The ETL object created.

<a id="client.Client.dsort"></a>

### dsort

```python
def dsort(dsort_id: str = "")
```

Factory constructor for dSort object.
Contains APIs related to AIStore dSort operations.
Does not make any HTTP request, only instantiates a dSort object.

**Arguments**:

- `dsort_id` - ID of the dSort job


**Returns**:

The dSort object created.

<a id="cluster.Cluster"></a>

## Class: Cluster

```python
class Cluster()
```

A class representing a cluster bound to an AIS client.

<a id="cluster.Cluster.client"></a>

### client

```python
@property
def client()
```

Client this cluster uses to make requests

<a id="cluster.Cluster.get_info"></a>

### get\_info

```python
def get_info() -> Smap
```

Returns the state of the AIS cluster, including detailed information about its nodes.

**Returns**:

- `aistore.sdk.types.Smap` - Smap containing cluster information


**Raises**:

- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.ReadTimeout` - Timed out waiting for a response from AIStore

<a id="cluster.Cluster.get_primary_url"></a>

### get\_primary\_url

```python
def get_primary_url() -> str
```

Returns: URL of the primary proxy
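Taken together, `get_info` and `get_primary_url` give a quick snapshot of the cluster. A small sketch (the endpoint, as before, is an assumption):

```python
cluster = client.cluster()

smap = cluster.get_info()  # aistore.sdk.types.Smap with node details
print("primary proxy:", cluster.get_primary_url())
```
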
<a id="cluster.Cluster.list_buckets"></a>

### list\_buckets

```python
def list_buckets(provider: str = PROVIDER_AIS)
```

Returns list of buckets in AIStore cluster.

**Arguments**:

- `provider` _str, optional_ - Name of bucket provider, one of "ais", "aws", "gcp", "az" or "ht".
  Defaults to "ais". Empty provider returns buckets of all providers.


**Returns**:

- `List[BucketModel]` - A list of buckets


**Raises**:

- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.ReadTimeout` - Timed out waiting for a response from AIStore

<a id="cluster.Cluster.list_jobs_status"></a>

### list\_jobs\_status

```python
def list_jobs_status(job_kind="", target_id="") -> List[JobStatus]
```

List the status of jobs on the cluster

**Arguments**:

- `job_kind` _str, optional_ - Only show jobs of a particular type
- `target_id` _str, optional_ - Limit to jobs on a specific target node


**Returns**:

List of JobStatus objects

<a id="cluster.Cluster.list_running_jobs"></a>

### list\_running\_jobs

```python
def list_running_jobs(job_kind="", target_id="") -> List[str]
```

List the currently running jobs on the cluster

**Arguments**:

- `job_kind` _str, optional_ - Only show jobs of a particular type
- `target_id` _str, optional_ - Limit to jobs on a specific target node


**Returns**:

List of jobs in the format job_kind[job_id]

<a id="cluster.Cluster.list_running_etls"></a>

### list\_running\_etls

```python
def list_running_etls() -> List[ETLInfo]
```

Lists all running ETLs.

Note: Does not list ETLs that have been stopped or deleted.

**Returns**:

- `List[ETLInfo]` - A list of details on running ETLs

<a id="cluster.Cluster.is_ready"></a>

### is\_ready

```python
def is_ready() -> bool
```

Checks if the cluster is ready, or still setting up.

**Returns**:

- `bool` - True if the cluster is ready, False if it is still setting up

<a id="cluster.Cluster.get_performance"></a>

### get\_performance

```python
def get_performance(get_throughput: bool = True,
                    get_latency: bool = True,
                    get_counters: bool = True) -> ClusterPerformance
```

Retrieves and calculates the performance metrics for each target node in the AIStore cluster.
It compiles throughput, latency, and various operational counters from each target node,
providing a comprehensive view of the cluster's overall performance.

**Arguments**:

- `get_throughput` _bool, optional_ - get cluster throughput
- `get_latency` _bool, optional_ - get cluster latency
- `get_counters` _bool, optional_ - get cluster counters


**Returns**:

- `ClusterPerformance` - An object encapsulating the detailed performance metrics of the cluster,
  including throughput, latency, and counters for each node


**Raises**:

- `requests.RequestException` - If there's an ambiguous exception while processing the request
- `requests.ConnectionError` - If there's a connection error with the cluster
- `requests.ConnectionTimeout` - If the connection to the cluster times out
- `requests.ReadTimeout` - If the timeout is reached while awaiting a response from the cluster
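The remaining `Cluster` methods combine naturally into a readiness/monitoring check. A sketch; note that the `name` field on `BucketModel` and the exact job-string format are assumptions for illustration:

```python
cluster = client.cluster()

if cluster.is_ready():
    for bck in cluster.list_buckets():      # List[BucketModel]
        print(bck.name)                     # `name` field -- an assumption
    print(cluster.list_running_jobs())      # e.g. ["rebalance[g1]", ...]
    perf = cluster.get_performance(get_counters=False)  # throughput/latency only
```
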
<a id="bucket.Bucket"></a>

## Class: Bucket

```python
class Bucket(AISSource)
```

A class representing a bucket that contains user data.

**Arguments**:

- `client` _RequestClient_ - Client for interfacing with AIS cluster
- `name` _str_ - name of bucket
- `provider` _str, optional_ - Provider of bucket (one of "ais", "aws", "gcp", ...), defaults to "ais"
- `namespace` _Namespace, optional_ - Namespace of bucket, defaults to None

<a id="bucket.Bucket.client"></a>

### client

```python
@property
def client() -> RequestClient
```

The client bound to this bucket.

<a id="bucket.Bucket.qparam"></a>

### qparam

```python
@property
def qparam() -> Dict
```

Default query parameters to use with API calls from this bucket.

<a id="bucket.Bucket.provider"></a>

### provider

```python
@property
def provider() -> str
```

The provider for this bucket.

<a id="bucket.Bucket.name"></a>

### name

```python
@property
def name() -> str
```

The name of this bucket.

<a id="bucket.Bucket.namespace"></a>

### namespace

```python
@property
def namespace() -> Namespace
```

The namespace for this bucket.

<a id="bucket.Bucket.list_urls"></a>

### list\_urls

```python
def list_urls(prefix: str = "", etl_name: str = None) -> Iterable[str]
```

Implementation of the abstract method from AISSource that provides an iterator
of full URLs to every object in this bucket matching the specified prefix

**Arguments**:

- `prefix` _str, optional_ - Limit objects selected by a given string prefix
- `etl_name` _str, optional_ - ETL to include in URLs


**Returns**:

Iterator of full URLs of all objects matching the prefix

<a id="bucket.Bucket.list_all_objects_iter"></a>

### list\_all\_objects\_iter

```python
def list_all_objects_iter(prefix: str = "") -> Iterable[Object]
```

Implementation of the abstract method from AISSource that provides an iterator
of all the objects in this bucket matching the specified prefix

**Arguments**:

- `prefix` _str, optional_ - Limit objects selected by a given string prefix


**Returns**:

Iterator of all objects matching the prefix

<a id="bucket.Bucket.create"></a>

### create

```python
def create(exist_ok=False)
```

Creates a bucket in AIStore cluster.
Can only create a bucket for the AIS provider on the local cluster. Remote cloud buckets do not support creation.

**Arguments**:

- `exist_ok` _bool, optional_ - Ignore error if the cluster already contains this bucket


**Raises**:

- `aistore.sdk.errors.AISError` - All other types of errors with AIStore
- `aistore.sdk.errors.InvalidBckProvider` - Invalid bucket provider for requested operation
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.exceptions.HTTPError` - Service unavailable
- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout` - Timed out receiving a response from AIStore
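For example (bucket name illustrative), `exist_ok` makes creation idempotent:

```python
bucket = client.bucket("my-bck")  # provider defaults to "ais"
bucket.create(exist_ok=True)      # no error if the bucket already exists
```
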
<a id="bucket.Bucket.delete"></a>

### delete

```python
def delete(missing_ok=False)
```

Destroys bucket in AIStore cluster.
In all cases removes both the bucket's content _and_ the bucket's metadata from the cluster.
Note: AIS will _not_ call the remote backend provider to delete the corresponding Cloud bucket
(iff the bucket in question is, in fact, a Cloud bucket).

**Arguments**:

- `missing_ok` _bool, optional_ - Ignore error if bucket does not exist


**Raises**:

- `aistore.sdk.errors.AISError` - All other types of errors with AIStore
- `aistore.sdk.errors.InvalidBckProvider` - Invalid bucket provider for requested operation
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.exceptions.HTTPError` - Service unavailable
- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout` - Timed out receiving a response from AIStore

<a id="bucket.Bucket.rename"></a>

### rename

```python
def rename(to_bck_name: str) -> str
```

Renames bucket in AIStore cluster.
Only works on AIS buckets. Returns job ID that can be used later to check the status of the asynchronous
operation.

**Arguments**:

- `to_bck_name` _str_ - New bucket name for bucket to be renamed as


**Returns**:

Job ID (as str) that can be used to check the status of the operation


**Raises**:

- `aistore.sdk.errors.AISError` - All other types of errors with AIStore
- `aistore.sdk.errors.InvalidBckProvider` - Invalid bucket provider for requested operation
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.exceptions.HTTPError` - Service unavailable
- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout` - Timed out receiving a response from AIStore

<a id="bucket.Bucket.evict"></a>

### evict

```python
def evict(keep_md: bool = False)
```

Evicts bucket in AIStore cluster.
NOTE: only Cloud buckets can be evicted.

**Arguments**:

- `keep_md` _bool, optional_ - If true, evicts objects but keeps the bucket's metadata (i.e., the bucket's name
  and its properties)


**Raises**:

- `aistore.sdk.errors.AISError` - All other types of errors with AIStore
- `aistore.sdk.errors.InvalidBckProvider` - Invalid bucket provider for requested operation
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.exceptions.HTTPError` - Service unavailable
- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout` - Timed out receiving a response from AIStore
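`rename`, like the other bucket-level jobs, returns a job ID that can be handed straight to `client.job(...)`. A sketch:

```python
job_id = bucket.rename("my-bck-v2")  # AIS buckets only; runs asynchronously
client.job(job_id).wait()            # block until the rename completes
```
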
<a id="bucket.Bucket.head"></a>

### head

```python
def head() -> Header
```

Requests bucket properties.

**Returns**:

Response header with the bucket properties


**Raises**:

- `aistore.sdk.errors.AISError` - All other types of errors with AIStore
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.exceptions.HTTPError` - Service unavailable
- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout` - Timed out receiving a response from AIStore

<a id="bucket.Bucket.summary"></a>

### summary

```python
def summary(uuid: str = "",
            prefix: str = "",
            cached: bool = True,
            present: bool = True)
```

Returns bucket summary (starts xaction job and polls for results).

**Arguments**:

- `uuid` _str_ - Identifier for the bucket summary. Defaults to an empty string.
- `prefix` _str_ - Prefix for objects to be included in the bucket summary.
  Defaults to an empty string (all objects).
- `cached` _bool_ - If True, summary entails cached entities. Defaults to True.
- `present` _bool_ - If True, summary entails present entities. Defaults to True.


**Raises**:

- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.exceptions.HTTPError` - Service unavailable
- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout` - Timed out receiving a response from AIStore
- `aistore.sdk.errors.AISError` - All other types of errors with AIStore

<a id="bucket.Bucket.info"></a>

### info

```python
def info(flt_presence: int = 0, bsumm_remote: bool = True)
```

Returns bucket summary and information/properties.

**Arguments**:

- `flt_presence` _int_ - Describes the presence of buckets and objects with respect to their existence
  or non-existence in the AIS cluster. Defaults to 0.

  Expected values are:
  0 - (object | bucket) exists inside and/or outside cluster
  1 - same as 0 but no need to return summary
  2 - bucket: is present | object: present and properly located
  3 - same as 2 but no need to return summary
  4 - objects: present anywhere/anyhow _in_ the cluster as: replica, ec-slices, misplaced
  5 - not present - exists _outside_ cluster
- `bsumm_remote` _bool_ - If True, returned bucket info will include remote objects as well


**Raises**:

- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.exceptions.HTTPError` - Service unavailable
- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout` - Timed out receiving a response from AIStore
- `ValueError` - `flt_presence` is not one of the expected values
- `aistore.sdk.errors.AISError` - All other types of errors with AIStore
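Of these inspection methods, `head()` is the quickest way to look at a bucket's properties. It returns a header-style mapping; the exact keys depend on the cluster and provider:

```python
props = bucket.head()  # header mapping with the bucket's properties
for key, value in props.items():
    print(key, "=", value)
```
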
<a id="bucket.Bucket.copy"></a>

### copy

```python
def copy(to_bck: Bucket,
         prefix_filter: str = "",
         prepend: str = "",
         dry_run: bool = False,
         force: bool = False,
         latest: bool = False,
         sync: bool = False) -> str
```

Returns job ID that can be used later to check the status of the asynchronous operation.

**Arguments**:

- `to_bck` _Bucket_ - Destination bucket
- `prefix_filter` _str, optional_ - Only copy objects with names starting with this prefix
- `prepend` _str, optional_ - Value to prepend to the name of copied objects
- `dry_run` _bool, optional_ - Determines if the copy should actually happen or not
- `force` _bool, optional_ - Override existing destination bucket
- `latest` _bool, optional_ - GET the latest object version from the associated remote bucket
- `sync` _bool, optional_ - synchronize destination bucket with its remote (e.g., Cloud or remote AIS) source


**Returns**:

Job ID (as str) that can be used to check the status of the operation


**Raises**:

- `aistore.sdk.errors.AISError` - All other types of errors with AIStore
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.exceptions.HTTPError` - Service unavailable
- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout` - Timed out receiving a response from AIStore
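Since `copy` is asynchronous, pair it with the job API. A sketch with illustrative names:

```python
dest = client.bucket("my-bck-copy")
dest.create(exist_ok=True)

job_id = bucket.copy(dest, prefix_filter="train/", prepend="v1/")
client.job(job_id).wait()  # wait for the copy job to finish
```
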
<a id="bucket.Bucket.list_objects"></a>

### list\_objects

```python
def list_objects(prefix: str = "",
                 props: str = "",
                 page_size: int = 0,
                 uuid: str = "",
                 continuation_token: str = "",
                 flags: List[ListObjectFlag] = None,
                 target: str = "") -> BucketList
```

Returns a structure that contains a page of objects, job ID, and continuation token (to read the next page, if
available).

**Arguments**:

- `prefix` _str, optional_ - Return only objects that start with the prefix
- `props` _str, optional_ - Comma-separated list of object properties to return. Default value is "name,size".
- `Properties` - "name", "size", "atime", "version", "checksum", "cached", "target_url", "status", "copies",
  "ec", "custom", "node".
- `page_size` _int, optional_ - Return at most "page_size" objects.
  The maximum number of objects in response depends on the bucket backend. E.g., an AWS bucket cannot return
  more than 5,000 objects in a single page.
- `NOTE` - If "page_size" is greater than a backend maximum, the backend maximum objects are returned.
  Defaults to "0" - return maximum number of objects.
- `uuid` _str, optional_ - Job ID, required to get the next page of objects
- `continuation_token` _str, optional_ - Marks the object to start reading the next page
- `flags` _List[ListObjectFlag], optional_ - Optional list of ListObjectFlag enums to include as flags in the
  request
- `target` _str, optional_ - Only list objects on this specific target node


**Returns**:

- `BucketList` - the page of objects in the bucket and the continuation token to get the next page.
  An empty continuation token marks the final page of the object list.


**Raises**:

- `aistore.sdk.errors.AISError` - All other types of errors with AIStore
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.exceptions.HTTPError` - Service unavailable
- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout` - Timed out receiving a response from AIStore

<a id="bucket.Bucket.list_objects_iter"></a>

### list\_objects\_iter

```python
def list_objects_iter(prefix: str = "",
                      props: str = "",
                      page_size: int = 0,
                      flags: List[ListObjectFlag] = None,
                      target: str = "") -> ObjectIterator
```

Returns an iterator for all objects in bucket

**Arguments**:

- `prefix` _str, optional_ - Return only objects that start with the prefix
- `props` _str, optional_ - Comma-separated list of object properties to return. Default value is "name,size".
- `Properties` - "name", "size", "atime", "version", "checksum", "cached", "target_url", "status", "copies",
  "ec", "custom", "node".
- `page_size` _int, optional_ - Return at most "page_size" objects.
  The maximum number of objects in response depends on the bucket backend. E.g., an AWS bucket cannot return
  more than 5,000 objects in a single page.
- `NOTE` - If "page_size" is greater than a backend maximum, the backend maximum objects are returned.
  Defaults to "0" - return maximum number of objects.
- `flags` _List[ListObjectFlag], optional_ - Optional list of ListObjectFlag enums to include as flags in the
  request
- `target` _str, optional_ - Only list objects on this specific target node


**Returns**:

- `ObjectIterator` - object iterator


**Raises**:

- `aistore.sdk.errors.AISError` - All other types of errors with AIStore
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.exceptions.HTTPError` - Service unavailable
- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout` - Timed out receiving a response from AIStore
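In most cases the iterator is the simplest way to walk a bucket: it fetches pages lazily and yields entries until the listing is exhausted. A sketch; the `name` and `size` fields follow the default `props`, and the field names on the yielded entries are inferred from `BucketEntry`:

```python
for entry in bucket.list_objects_iter(prefix="train/", props="name,size", page_size=1000):
    print(entry.name, entry.size)  # field names -- an assumption
```
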
896 Defaults to "0" - return maximum number objects 897 - `flags` _List[ListObjectFlag], optional_ - Optional list of ListObjectFlag enums to include as flags in the 898 request 899 target(str, optional): Only list objects on this specific target node 900 901 902 **Returns**: 903 904 - `List[BucketEntry]` - list of objects in bucket 905 906 907 **Raises**: 908 909 - `aistore.sdk.errors.AISError` - All other types of errors with AIStore 910 - `requests.ConnectionError` - Connection error 911 - `requests.ConnectionTimeout` - Timed out connecting to AIStore 912 - `requests.exceptions.HTTPError` - Service unavailable 913 - `requests.RequestException` - "There was an ambiguous exception that occurred while handling..." 914 - `requests.ReadTimeout` - Timed out receiving response from AIStore 915 916 <a id="bucket.Bucket.transform"></a> 917 918 ### transform 919 920 ```python 921 def transform(etl_name: str, 922 to_bck: Bucket, 923 timeout: str = DEFAULT_ETL_TIMEOUT, 924 prefix_filter: str = "", 925 prepend: str = "", 926 ext: Dict[str, str] = None, 927 force: bool = False, 928 dry_run: bool = False, 929 latest: bool = False, 930 sync: bool = False) -> str 931 ``` 932 933 Visits all selected objects in the source bucket and for each object, puts the transformed 934 result to the destination bucket 935 936 **Arguments**: 937 938 - `etl_name` _str_ - name of etl to be used for transformations 939 - `to_bck` _str_ - destination bucket for transformations 940 - `timeout` _str, optional_ - Timeout of the ETL job (e.g. 5m for 5 minutes) 941 - `prefix_filter` _str, optional_ - Only transform objects with names starting with this prefix 942 - `prepend` _str, optional_ - Value to prepend to the name of resulting transformed objects 943 - `ext` _Dict[str, str], optional_ - dict of new extension followed by extension to be replaced 944 (i.e. {"jpg": "txt"}) 945 - `dry_run` _bool, optional_ - determines if the copy should actually happen or not 946 - `force` _bool, optional_ - override existing destination bucket 947 - `latest` _bool, optional_ - GET the latest object version from the associated remote bucket 948 - `sync` _bool, optional_ - synchronize destination bucket with its remote (e.g., Cloud or remote AIS) source 949 950 951 **Returns**: 952 953 Job ID (as str) that can be used to check the status of the operation 954 955 <a id="bucket.Bucket.put_files"></a> 956 957 ### put\_files 958 959 ```python 960 def put_files(path: str, 961 prefix_filter: str = "", 962 pattern: str = "*", 963 basename: bool = False, 964 prepend: str = None, 965 recursive: bool = False, 966 dry_run: bool = False, 967 verbose: bool = True) -> List[str] 968 ``` 969 970 Puts files found in a given filepath as objects to a bucket in AIS storage. 971 972 **Arguments**: 973 974 - `path` _str_ - Local filepath, can be relative or absolute 975 - `prefix_filter` _str, optional_ - Only put files with names starting with this prefix 976 - `pattern` _str, optional_ - Regex pattern to filter files 977 - `basename` _bool, optional_ - Whether to use the file names only as object names and omit the path information 978 - `prepend` _str, optional_ - Optional string to use as a prefix in the object name for all objects uploaded 979 No delimiter ("/", "-", etc.) 
<a id="bucket.Bucket.put_files"></a>

### put\_files

```python
def put_files(path: str,
              prefix_filter: str = "",
              pattern: str = "*",
              basename: bool = False,
              prepend: str = None,
              recursive: bool = False,
              dry_run: bool = False,
              verbose: bool = True) -> List[str]
```

Puts files found in a given filepath as objects to a bucket in AIS storage.

**Arguments**:

- `path` _str_ - Local filepath, can be relative or absolute
- `prefix_filter` _str, optional_ - Only put files with names starting with this prefix
- `pattern` _str, optional_ - Regex pattern to filter files
- `basename` _bool, optional_ - Whether to use the file names only as object names and omit the path information
- `prepend` _str, optional_ - Optional string to use as a prefix in the object name for all objects uploaded.
  No delimiter ("/", "-", etc.) is automatically applied between the prepend value and the object name
- `recursive` _bool, optional_ - Whether to recurse through the provided path directories
- `dry_run` _bool, optional_ - Option to only show expected behavior without an actual put operation
- `verbose` _bool, optional_ - Whether to print upload info to standard output


**Returns**:

List of object names put to a bucket in AIS


**Raises**:

- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.ReadTimeout` - Timed out waiting for a response from AIStore
- `ValueError` - The path provided is not a valid directory

<a id="bucket.Bucket.object"></a>

### object

```python
def object(obj_name: str) -> Object
```

Factory constructor for an object in this bucket.
Does not make any HTTP request, only instantiates an object in a bucket owned by the client.

**Arguments**:

- `obj_name` _str_ - Name of object


**Returns**:

The object created.

<a id="bucket.Bucket.objects"></a>

### objects

```python
def objects(obj_names: list = None,
            obj_range: ObjectRange = None,
            obj_template: str = None) -> ObjectGroup
```

Factory constructor for multiple objects belonging to this bucket.

**Arguments**:

- `obj_names` _list_ - Names of objects to include in the group
- `obj_range` _ObjectRange_ - Range of objects to include in the group
- `obj_template` _str_ - String template defining objects to include in the group


**Returns**:

The ObjectGroup created
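An upload sketch (the local path is illustrative); `put_files` returns the resulting object names, which can be fed straight into `objects(...)`:

```python
names = bucket.put_files(
    "/tmp/dataset",   # hypothetical local directory
    recursive=True,
    prepend="imgs/",  # note: no delimiter is inserted after the prepend value
)
group = bucket.objects(obj_names=names)  # multi-object handle for batch operations
```
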
<a id="bucket.Bucket.make_request"></a>

### make\_request

```python
def make_request(method: str,
                 action: str,
                 value: dict = None,
                 params: dict = None) -> requests.Response
```

Use the bucket's client to make a request to the bucket endpoint on the AIS server

**Arguments**:

- `method` _str_ - HTTP method to use, e.g. POST/GET/DELETE
- `action` _str_ - Action string used to create an ActionMsg to pass to the server
- `value` _dict_ - Additional value parameter to pass in the ActionMsg
- `params` _dict, optional_ - Optional parameters to pass in the request


**Returns**:

Response from the server

<a id="bucket.Bucket.verify_cloud_bucket"></a>

### verify\_cloud\_bucket

```python
def verify_cloud_bucket()
```

Verify the bucket provider is a cloud provider

<a id="bucket.Bucket.get_path"></a>

### get\_path

```python
def get_path() -> str
```

Get the path representation of this bucket

<a id="bucket.Bucket.as_model"></a>

### as\_model

```python
def as_model() -> BucketModel
```

Return a data-model of the bucket

**Returns**:

BucketModel representation

<a id="bucket.Bucket.write_dataset"></a>

### write\_dataset

```python
def write_dataset(config: DatasetConfig, **kwargs)
```

Write a dataset to a bucket in AIS in webdataset format using wds.ShardWriter

**Arguments**:

- `config` _DatasetConfig_ - Configuration dict specifying how to process
  and store each part of the dataset item
- `**kwargs` _optional_ - Optional keyword arguments to pass to the ShardWriter

<a id="object.Object"></a>

## Class: Object

```python
class Object()
```

A class representing an object of a bucket bound to a client.

**Arguments**:

- `bucket` _Bucket_ - Bucket to which this object belongs
- `name` _str_ - name of object

<a id="object.Object.bucket"></a>

### bucket

```python
@property
def bucket()
```

Bucket containing this object

<a id="object.Object.name"></a>

### name

```python
@property
def name()
```

Name of this object
<a id="object.Object.head"></a>

### head

```python
def head() -> Header
```

Requests object properties.

**Returns**:

Response header with the object properties.


**Raises**:

- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.ReadTimeout` - Timed out waiting for a response from AIStore
- `requests.exceptions.HTTPError(404)` - The object does not exist

<a id="object.Object.get"></a>

### get

```python
def get(archpath: str = "",
        chunk_size: int = DEFAULT_CHUNK_SIZE,
        etl_name: str = None,
        writer: BufferedWriter = None,
        latest: bool = False,
        byte_range: str = None,
        blob_chunk_size: str = None,
        blob_num_workers: str = None) -> ObjectReader
```

Reads an object

**Arguments**:

- `archpath` _str, optional_ - If the object is an archive, use `archpath` to extract a single file
  from the archive
- `chunk_size` _int, optional_ - chunk_size to use while reading from stream
- `etl_name` _str, optional_ - Transforms an object based on ETL with etl_name
- `writer` _BufferedWriter, optional_ - User-provided writer for writing content output.
  User is responsible for closing the writer
- `latest` _bool, optional_ - GET the latest object version from the associated remote bucket
- `byte_range` _str, optional_ - Specify a specific data segment of the object for transfer, including
  both the start and end of the range (e.g. "bytes=0-499" to request the first 500 bytes)
- `blob_chunk_size` _str, optional_ - Utilize built-in blob-downloader with the given chunk size in
  IEC or SI units, or "raw" bytes (e.g.: 4mb, 1MiB, 1048576, 128k)
- `blob_num_workers` _str, optional_ - Utilize built-in blob-downloader with the given number of
  concurrent blob-downloading workers (readers)


**Returns**:

The stream of bytes to read an object or a file inside an archive.


**Raises**:

- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.ReadTimeout` - Timed out waiting for a response from AIStore

<a id="object.Object.get_semantic_url"></a>

### get\_semantic\_url

```python
def get_semantic_url()
```

Get the semantic URL to the object

**Returns**:

Semantic URL to get object

<a id="object.Object.get_url"></a>

### get\_url

```python
def get_url(archpath: str = "", etl_name: str = None)
```

Get the full URL to the object, including the base URL and any query parameters

**Arguments**:

- `archpath` _str, optional_ - If the object is an archive, use `archpath` to extract a single file
  from the archive
- `etl_name` _str, optional_ - Transforms an object based on ETL with etl_name


**Returns**:

Full URL to get object
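`get()` returns an `ObjectReader` (documented below), which can be consumed all at once or streamed in chunks. Object and file names here are illustrative:

```python
obj = bucket.object("train/0001.jpg")

data = obj.get().read_all()     # whole object in memory

with open("/tmp/0001.jpg", "wb") as f:
    for chunk in obj.get():     # stream chunk by chunk to bound memory use
        f.write(chunk)
```
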
<a id="object.Object.put_content"></a>

### put\_content

```python
def put_content(content: bytes) -> Header
```

Puts bytes as an object to a bucket in AIS storage.

**Arguments**:

- `content` _bytes_ - Bytes to put as an object.


**Raises**:

- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.ReadTimeout` - Timed out waiting for a response from AIStore

<a id="object.Object.put_file"></a>

### put\_file

```python
def put_file(path: str = None)
```

Puts a local file as an object to a bucket in AIS storage.

**Arguments**:

- `path` _str_ - Path to local file


**Raises**:

- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.ReadTimeout` - Timed out waiting for a response from AIStore
- `ValueError` - The path provided is not a valid file

<a id="object.Object.promote"></a>

### promote

```python
def promote(path: str,
            target_id: str = "",
            recursive: bool = False,
            overwrite_dest: bool = False,
            delete_source: bool = False,
            src_not_file_share: bool = False) -> Header
```

Promotes a file or folder an AIS target can access to a bucket in AIS storage.
These files can be either on the physical disk of an AIS target itself or on a network file system
the cluster can access.
See more info here: https://aiatscale.org/blog/2022/03/17/promote

**Arguments**:

- `path` _str_ - Path to file or folder the AIS cluster can reach
- `target_id` _str, optional_ - Promote files from a specific target node
- `recursive` _bool, optional_ - Recursively promote objects from files in directories inside the path
- `overwrite_dest` _bool, optional_ - Overwrite objects already on AIS
- `delete_source` _bool, optional_ - Delete the source files when done promoting
- `src_not_file_share` _bool, optional_ - Optimize if the source is guaranteed to not be on a file share


**Returns**:

Object properties


**Raises**:

- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.ReadTimeout` - Timed out waiting for a response from AIStore
- `AISError` - Path does not exist on the AIS cluster storage

<a id="object.Object.delete"></a>

### delete

```python
def delete()
```

Delete an object from a bucket.

**Returns**:

None


**Raises**:

- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.ReadTimeout` - Timed out waiting for a response from AIStore
- `requests.exceptions.HTTPError(404)` - The object does not exist
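A small write/cleanup sketch using the methods above (names and paths illustrative):

```python
obj = bucket.object("notes/hello.txt")
obj.put_content(b"hello world")   # write raw bytes as the object
# or, equivalently, upload a local file:
# obj.put_file("/tmp/hello.txt")
obj.delete()
```
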
<a id="object.Object.blob_download"></a>

### blob\_download

```python
def blob_download(chunk_size: int = None,
                  num_workers: int = None,
                  latest: bool = False) -> str
```

A special facility to download very large remote objects a.k.a. BLOBs.
Returns the job ID for the blob download operation.

**Arguments**:

- `chunk_size` _int_ - chunk size in bytes
- `num_workers` _int_ - number of concurrent blob-downloading workers (readers)
- `latest` _bool_ - GET the latest object version from the associated remote bucket


**Returns**:

Job ID (as str) that can be used to check the status of the operation


**Raises**:

- `aistore.sdk.errors.AISError` - All other types of errors with AIStore
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.exceptions.HTTPError` - Service unavailable
- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."

<a id="multiobj.object_group.ObjectGroup"></a>

## Class: ObjectGroup

```python
class ObjectGroup(AISSource)
```

A class representing multiple objects within the same bucket. Only one of obj_names, obj_range, or obj_template
should be provided.

**Arguments**:

- `bck` _Bucket_ - Bucket the objects belong to
- `obj_names` _list[str], optional_ - List of object names to include in this collection
- `obj_range` _ObjectRange, optional_ - Range defining which object names in the bucket should be included
- `obj_template` _str, optional_ - String argument to pass as template value directly to api
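The three ways to define a group, side by side. `ObjectRange` is documented below; the `aistore.sdk.multiobj` import path is an assumption:

```python
from aistore.sdk.multiobj import ObjectRange  # import path -- an assumption

by_names = bucket.objects(obj_names=["a.txt", "b.txt"])
by_range = bucket.objects(
    obj_range=ObjectRange("shard-", 0, 99, pad_width=3, suffix=".tar")
)
by_template = bucket.objects(obj_template="shard-{000..099}.tar")
```
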
<a id="multiobj.object_group.ObjectGroup.list_urls"></a>

### list\_urls

```python
def list_urls(prefix: str = "", etl_name: str = None) -> Iterable[str]
```

Implementation of the abstract method from AISSource that provides an iterator
of full URLs to every object in this bucket matching the specified prefix

**Arguments**:

- `prefix` _str, optional_ - Limit objects selected by a given string prefix
- `etl_name` _str, optional_ - ETL to include in URLs


**Returns**:

Iterator of all object URLs in the group

<a id="multiobj.object_group.ObjectGroup.list_all_objects_iter"></a>

### list\_all\_objects\_iter

```python
def list_all_objects_iter(prefix: str = "") -> Iterable[Object]
```

Implementation of the abstract method from AISSource that provides an iterator
of all the objects in this bucket matching the specified prefix

**Arguments**:

- `prefix` _str, optional_ - Limit objects selected by a given string prefix


**Returns**:

Iterator of all the objects in the group

<a id="multiobj.object_group.ObjectGroup.delete"></a>

### delete

```python
def delete()
```

Deletes a list or range of objects in a bucket

**Raises**:

- `aistore.sdk.errors.AISError` - All other types of errors with AIStore
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.exceptions.HTTPError` - Service unavailable
- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout` - Timed out receiving a response from AIStore


**Returns**:

Job ID (as str) that can be used to check the status of the operation

<a id="multiobj.object_group.ObjectGroup.evict"></a>

### evict

```python
def evict()
```

Evicts a list or range of objects in a bucket so that they are no longer cached in AIS.
NOTE: only Cloud buckets can be evicted.

**Raises**:

- `aistore.sdk.errors.AISError` - All other types of errors with AIStore
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.exceptions.HTTPError` - Service unavailable
- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout` - Timed out receiving a response from AIStore


**Returns**:

Job ID (as str) that can be used to check the status of the operation
<a id="multiobj.object_group.ObjectGroup.prefetch"></a>

### prefetch

```python
def prefetch(blob_threshold: int = None,
             latest: bool = False,
             continue_on_error: bool = False)
```

Prefetches a list or range of objects in a bucket so that they are cached in AIS.
NOTE: only Cloud buckets can be prefetched.

**Arguments**:

- `latest` _bool, optional_ - GET the latest object version from the associated remote bucket
- `continue_on_error` _bool, optional_ - Whether to continue if there is an error prefetching a single object
- `blob_threshold` _int, optional_ - Utilize built-in blob-downloader for remote objects
  greater than the specified (threshold) size in bytes


**Raises**:

- `aistore.sdk.errors.AISError` - All other types of errors with AIStore
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.exceptions.HTTPError` - Service unavailable
- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout` - Timed out receiving a response from AIStore


**Returns**:

Job ID (as str) that can be used to check the status of the operation

<a id="multiobj.object_group.ObjectGroup.copy"></a>

### copy

```python
def copy(to_bck: "Bucket",
         prepend: str = "",
         continue_on_error: bool = False,
         dry_run: bool = False,
         force: bool = False,
         latest: bool = False,
         sync: bool = False)
```

Copies a list or range of objects in a bucket

**Arguments**:

- `to_bck` _Bucket_ - Destination bucket
- `prepend` _str, optional_ - Value to prepend to the name of copied objects
- `continue_on_error` _bool, optional_ - Whether to continue if there is an error copying a single object
- `dry_run` _bool, optional_ - Skip performing the copy and just log the intended actions
- `force` _bool, optional_ - Force this job to run over others in case it conflicts
  (see "limited coexistence" and xact/xreg/xreg.go)
- `latest` _bool, optional_ - GET the latest object version from the associated remote bucket
- `sync` _bool, optional_ - synchronize destination bucket with its remote (e.g., Cloud or remote AIS) source


**Raises**:

- `aistore.sdk.errors.AISError` - All other types of errors with AIStore
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.exceptions.HTTPError` - Service unavailable
- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout` - Timed out receiving a response from AIStore


**Returns**:

Job ID (as str) that can be used to check the status of the operation
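`prefetch` and `copy` are both asynchronous, so a common pattern is to chain them through the job API. A sketch (bucket names illustrative):

```python
group = bucket.objects(obj_template="shard-{000..099}.tar")

client.job(group.prefetch(latest=True)).wait()               # cloud buckets only
client.job(group.copy(client.bucket("backup-bck"))).wait()
```
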
<a id="multiobj.object_group.ObjectGroup.transform"></a>

### transform

```python
def transform(to_bck: "Bucket",
              etl_name: str,
              timeout: str = DEFAULT_ETL_TIMEOUT,
              prepend: str = "",
              continue_on_error: bool = False,
              dry_run: bool = False,
              force: bool = False,
              latest: bool = False,
              sync: bool = False)
```

Performs ETL operation on a list or range of objects in a bucket, placing the results in the destination bucket

**Arguments**:

- `to_bck` _Bucket_ - Destination bucket
- `etl_name` _str_ - Name of existing ETL to apply
- `timeout` _str_ - Timeout of the ETL job (e.g. 5m for 5 minutes)
- `prepend` _str, optional_ - Value to prepend to the name of resulting transformed objects
- `continue_on_error` _bool, optional_ - Whether to continue if there is an error transforming a single object
- `dry_run` _bool, optional_ - Skip performing the transform and just log the intended actions
- `force` _bool, optional_ - Force this job to run over others in case it conflicts
  (see "limited coexistence" and xact/xreg/xreg.go)
- `latest` _bool, optional_ - GET the latest object version from the associated remote bucket
- `sync` _bool, optional_ - synchronize destination bucket with its remote (e.g., Cloud or remote AIS) source


**Raises**:

- `aistore.sdk.errors.AISError` - All other types of errors with AIStore
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.exceptions.HTTPError` - Service unavailable
- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout` - Timed out receiving a response from AIStore


**Returns**:

Job ID (as str) that can be used to check the status of the operation

<a id="multiobj.object_group.ObjectGroup.archive"></a>

### archive

```python
def archive(archive_name: str,
            mime: str = "",
            to_bck: "Bucket" = None,
            include_source_name: bool = False,
            allow_append: bool = False,
            continue_on_err: bool = False)
```

Create or append to an archive

**Arguments**:

- `archive_name` _str_ - Name of archive to create or append
- `mime` _str, optional_ - MIME type of the content
- `to_bck` _Bucket, optional_ - Destination bucket, defaults to current bucket
- `include_source_name` _bool, optional_ - Include the source bucket name in the archived objects' names
- `allow_append` _bool, optional_ - Allow appending to an existing archive
- `continue_on_err` _bool, optional_ - Whether to continue if there is an error archiving a single object


**Returns**:

Job ID (as str) that can be used to check the status of the operation

<a id="multiobj.object_group.ObjectGroup.list_names"></a>

### list\_names

```python
def list_names() -> List[str]
```

List all the object names included in this group of objects

**Returns**:

List of object names
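An archiving sketch: pack the group into a single tarball within the same bucket (the archive name is illustrative):

```python
group = bucket.objects(obj_template="img-{0000..0999}.jpg")

job_id = group.archive("imgs.tar", include_source_name=True)
client.job(job_id).wait()
print(group.list_names())  # the object names that went into the archive
```
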
<a id="multiobj.object_names.ObjectNames"></a>

## Class: ObjectNames

```python
class ObjectNames(ObjectCollection)
```

A collection of object names, provided as a list of strings

**Arguments**:

- `names` _List[str]_ - A list of object names

<a id="multiobj.object_range.ObjectRange"></a>

## Class: ObjectRange

```python
class ObjectRange(ObjectCollection)
```

Class representing a range of object names

**Arguments**:

- `prefix` _str_ - Prefix contained in all names of objects
- `min_index` _int_ - Starting index in the name of objects
- `max_index` _int_ - Last index in the name of all objects
- `pad_width` _int, optional_ - Left-pad indices with zeros up to the width provided, e.g. pad_width = 3 will
  transform 1 to 001
- `step` _int, optional_ - Size of iterator steps between each item
- `suffix` _str, optional_ - Suffix at the end of all object names

<a id="multiobj.object_template.ObjectTemplate"></a>

## Class: ObjectTemplate

```python
class ObjectTemplate(ObjectCollection)
```

A collection of object names specified by a template in the bash brace expansion format

**Arguments**:

- `template` _str_ - A string template that defines the names of objects to include in the collection

<a id="job.Job"></a>

## Class: Job

```python
class Job()
```

A class containing job-related functions.

**Arguments**:

- `client` _RequestClient_ - Client for interfacing with AIS cluster
- `job_id` _str, optional_ - ID of a specific job, empty for all jobs
- `job_kind` _str, optional_ - Specific kind of job, empty for all kinds

<a id="job.Job.job_id"></a>

### job\_id

```python
@property
def job_id()
```

Return job id

<a id="job.Job.job_kind"></a>

### job\_kind

```python
@property
def job_kind()
```

Return job kind

<a id="job.Job.status"></a>

### status

```python
def status() -> JobStatus
```

Return status of a job

**Returns**:

The job status including id, finish time, and error info.


**Raises**:

- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.ReadTimeout` - Timed out waiting for a response from AIStore

<a id="job.Job.wait"></a>

### wait

```python
def wait(timeout: int = DEFAULT_JOB_WAIT_TIMEOUT, verbose: bool = True)
```

Wait for a job to finish

**Arguments**:

- `timeout` _int, optional_ - The maximum time to wait for the job, in seconds. Default timeout is 5 minutes.
- `verbose` _bool, optional_ - Whether to log wait status to standard output


**Returns**:

None


**Raises**:

- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.ReadTimeout` - Timed out waiting for a response from AIStore
- `errors.Timeout` - Timeout while waiting for the job to finish
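Checking on and waiting for a job; the job ID here is illustrative (in practice it comes from one of the methods above that return one):

```python
job = client.job(job_id="some-job-id")  # illustrative ID

print(job.status())                     # JobStatus: id, finish time, error info
job.wait(timeout=600, verbose=False)    # raises errors.Timeout after 10 minutes
```
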
<a id="job.Job.wait_for_idle"></a>

### wait\_for\_idle

```python
def wait_for_idle(timeout: int = DEFAULT_JOB_WAIT_TIMEOUT,
                  verbose: bool = True)
```

Wait for a job to reach an idle state

**Arguments**:

- `timeout` _int, optional_ - The maximum time to wait for the job, in seconds. Default timeout is 5 minutes.
- `verbose` _bool, optional_ - Whether to log wait status to standard output


**Returns**:

None


**Raises**:

- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.ReadTimeout` - Timed out receiving response from AIStore
- `errors.Timeout` - Timeout while waiting for the job to finish
- `errors.JobInfoNotFound` - Raised when information on a job's status could not be found on the AIS cluster

<a id="job.Job.wait_single_node"></a>

### wait\_single\_node

```python
def wait_single_node(timeout: int = DEFAULT_JOB_WAIT_TIMEOUT,
                     verbose: bool = True)
```

Wait for a job running on a single node

**Arguments**:

- `timeout` _int, optional_ - The maximum time to wait for the job, in seconds. Default timeout is 5 minutes.
- `verbose` _bool, optional_ - Whether to log wait status to standard output


**Returns**:

None


**Raises**:

- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.ReadTimeout` - Timed out receiving response from AIStore
- `errors.Timeout` - Timeout while waiting for the job to finish
- `errors.JobInfoNotFound` - Raised when information on a job's status could not be found on the AIS cluster

<a id="job.Job.start"></a>

### start

```python
def start(daemon_id: str = "",
          force: bool = False,
          buckets: List[Bucket] = None) -> str
```

Start a job and return its ID.

**Arguments**:

- `daemon_id` _str, optional_ - For running a job that must run on a specific target node (e.g. resilvering).
- `force` _bool, optional_ - Override existing restrictions for a bucket (e.g., run LRU eviction even if the
  bucket has LRU disabled).
- `buckets` _List[Bucket], optional_ - List of one or more buckets; applicable only for jobs that have bucket
  scope (for details on job types, see `Table` in xact/api.go).


**Returns**:

The running job ID.


**Raises**:

- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.ReadTimeout` - Timed out receiving response from AIStore
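For instance, `start` can kick off a bucket-scoped job and return an ID to wait on. A minimal sketch, assuming `"lru"` is a valid job kind on the cluster; the endpoint and bucket name are hypothetical:

```python
from aistore.sdk import Client

client = Client("http://localhost:8080")  # assumed AIS endpoint
bucket = client.bucket("my-bck")  # hypothetical bucket

# Start an LRU eviction job limited to one bucket, forcing it to run
# even if the bucket has LRU disabled
job_id = client.job(job_kind="lru").start(force=True, buckets=[bucket])
client.job(job_id=job_id).wait()
```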
<a id="job.Job.get_within_timeframe"></a>

### get\_within\_timeframe

```python
def get_within_timeframe(start_time: datetime.time,
                         end_time: datetime.time) -> List[JobSnapshot]
```

Checks for jobs that started and finished within a specified timeframe

**Arguments**:

- `start_time` _datetime.time_ - The start of the timeframe for monitoring jobs
- `end_time` _datetime.time_ - The end of the timeframe for monitoring jobs


**Returns**:

- `list` - A list of jobs that have finished within the specified timeframe


**Raises**:

- `requests.RequestException` - "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError` - Connection error
- `requests.ConnectionTimeout` - Timed out connecting to AIStore
- `requests.ReadTimeout` - Timed out receiving response from AIStore
- `errors.Timeout` - Timeout while waiting for the job to finish
- `errors.JobInfoNotFound` - Raised when information on a job's status could not be found on the AIS cluster

<a id="object_reader.ObjectReader"></a>

## Class: ObjectReader

```python
class ObjectReader()
```

Represents the data returned by the API when getting an object, including access to the content stream and object
attributes

<a id="object_reader.ObjectReader.attributes"></a>

### attributes

```python
@property
def attributes() -> ObjectAttributes
```

Object metadata attributes

**Returns**:

Object attributes parsed from the headers returned by AIS

<a id="object_reader.ObjectReader.read_all"></a>

### read\_all

```python
def read_all() -> bytes
```

Read all byte data from the object content stream.
This uses a bytes cast, which makes it slightly slower and requires the entire object content to fit in memory at once.

**Returns**:

Object content as bytes

<a id="object_reader.ObjectReader.raw"></a>

### raw

```python
def raw() -> bytes
```

**Returns**:

Raw byte stream of object content

<a id="object_reader.ObjectReader.__iter__"></a>

### \_\_iter\_\_

```python
def __iter__() -> Iterator[bytes]
```

Creates a generator to read the stream content in chunks

**Returns**:

An iterator with access to the next chunk of bytes

<a id="object_iterator.ObjectIterator"></a>

## Class: ObjectIterator

```python
class ObjectIterator()
```

Represents an iterable that will fetch all objects from a bucket, querying as needed with the specified function

**Arguments**:

- `list_objects` _Callable_ - Function returning a BucketList from an AIS cluster
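To illustrate, an `ObjectReader` is what `Object.get` returns; iterating over it streams the content in chunks, avoiding the memory cost of `read_all`. A minimal sketch; the endpoint, bucket, and object names are hypothetical:

```python
from aistore.sdk import Client

client = Client("http://localhost:8080")  # assumed AIS endpoint
reader = client.bucket("my-bck").object("large-file.bin").get()

# Object metadata parsed from the response headers
print(reader.attributes.size)

# Stream the content chunk by chunk instead of holding it all in memory
with open("large-file.bin", "wb") as out:
    for chunk in reader:
        out.write(chunk)
```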
<a id="etl.Etl"></a>

## Class: Etl

```python
class Etl()
```

A class containing ETL-related functions.

<a id="etl.Etl.name"></a>

### name

```python
@property
def name() -> str
```

Name of the ETL

<a id="etl.Etl.init_spec"></a>

### init\_spec

```python
def init_spec(template: str,
              communication_type: str = DEFAULT_ETL_COMM,
              timeout: str = DEFAULT_ETL_TIMEOUT,
              arg_type: str = "") -> str
```

Initializes ETL based on Kubernetes pod spec template.

**Arguments**:

- `template` _str_ - Kubernetes pod spec template
  Existing templates can be found at `sdk.etl_templates`
  For more information visit: https://github.com/NVIDIA/ais-etl/tree/master/transformers
- `communication_type` _str_ - Communication type of the ETL (options: hpull, hrev, hpush)
- `timeout` _str_ - Timeout of the ETL job (e.g. 5m for 5 minutes)
- `arg_type` _optional, str_ - The type of argument the runtime will provide the transform function.
  The default value of "" will provide the raw bytes read from the object.
  When used with hpull communication_type, setting this to "url" will provide the URL of the object.

**Returns**:

Job ID string associated with this ETL

<a id="etl.Etl.init_code"></a>

### init\_code

```python
def init_code(transform: Callable,
              dependencies: List[str] = None,
              preimported_modules: List[str] = None,
              runtime: str = _get_default_runtime(),
              communication_type: str = DEFAULT_ETL_COMM,
              timeout: str = DEFAULT_ETL_TIMEOUT,
              chunk_size: int = None,
              arg_type: str = "") -> str
```

Initializes ETL based on the provided source code.

**Arguments**:

- `transform` _Callable_ - Transform function of the ETL
- `dependencies` _list[str]_ - Python dependencies to install
- `preimported_modules` _list[str]_ - Modules to import before running the transform function. This can
  be necessary in cases where the modules used both attempt to import each other circularly
- `runtime` _str_ - [optional, default= V2 implementation of the current python version if supported, else
  python3.8v2] Runtime environment of the ETL [choose from: python3.8v2, python3.10v2, python3.11v2]
  (see ext/etl/runtime/all.go)
- `communication_type` _str_ - [optional, default="hpush"] Communication type of the ETL (options: hpull, hrev,
  hpush, io)
- `timeout` _str_ - [optional, default="5m"] Timeout of the ETL job (e.g. 5m for 5 minutes)
- `chunk_size` _int_ - Chunk size in bytes if the transform function is streaming data
  (whole object is read by default)
- `arg_type` _optional, str_ - The type of argument the runtime will provide the transform function.
  The default value of "" will provide the raw bytes read from the object.
  When used with hpull communication_type, setting this to "url" will provide the URL of the object.

**Returns**:

Job ID string associated with this ETL

<a id="etl.Etl.view"></a>

### view

```python
def view() -> ETLDetails
```

View ETL details

**Returns**:

- `ETLDetails` - details of the ETL

<a id="etl.Etl.start"></a>

### start

```python
def start()
```

Resumes a stopped ETL with the given ETL name.

Note: Deleted ETLs cannot be started.

<a id="etl.Etl.stop"></a>

### stop

```python
def stop()
```

Stops ETL. Stops (but does not delete) all the pods created by Kubernetes for this ETL and
terminates any transforms.

<a id="etl.Etl.delete"></a>

### delete

```python
def delete()
```

Delete ETL. Deletes pods created by Kubernetes for this ETL and specifications for this ETL
in Kubernetes.

Note: Running ETLs cannot be deleted.
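Putting these together: `init_code` deploys a transform from a Python callable, the bucket-level `transform` (documented above) applies it, and `view`/`stop`/`delete` manage the lifecycle. A minimal sketch, assuming a cluster with ETL enabled; the endpoint, ETL name, and bucket names are hypothetical:

```python
from aistore.sdk import Client

client = Client("http://localhost:8080")  # assumed AIS endpoint
etl = client.etl("etl-upper")  # hypothetical ETL name

def transform(data: bytes) -> bytes:
    # Trivial transform: upper-case the object's bytes
    return data.upper()

etl.init_code(transform=transform)  # deploy with the default runtime

# Apply the ETL while copying objects between buckets, then wait on the job
src, dst = client.bucket("src-bck"), client.bucket("dst-bck")
job_id = src.transform(etl_name=etl.name, to_bck=dst)
client.job(job_id=job_id).wait()

print(etl.view())  # inspect ETL details
etl.stop()         # stop pods; a running ETL cannot be deleted
etl.delete()
```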