<!--
    Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership.  The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied.  See the License for the
    specific language governing permissions and limitations
    under the License.
-->

# RunInference Benchmarks

This module contains benchmarks used to test the performance of the RunInference transform
running inference with common models and frameworks. Each benchmark is explained in detail
below. Beam's performance over time can be viewed at
http://s.apache.org/beam-community-metrics/d/ZpS8Uf44z/python-ml-runinference-benchmarks?orgId=1.

## PyTorch RunInference Image Classification 50K

The PyTorch RunInference Image Classification 50K benchmark runs an
[example image classification pipeline](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_image_classification.py)
using several ResNet image classification models (the benchmarks on
[Beam's dashboard](http://s.apache.org/beam-community-metrics/d/ZpS8Uf44z/python-ml-runinference-benchmarks?orgId=1)
display [resnet101](https://pytorch.org/vision/main/models/generated/torchvision.models.resnet101.html) and [resnet152](https://pytorch.org/vision/stable/models/generated/torchvision.models.resnet152.html))
against 50,000 example images from the Open Images dataset. The benchmarks produce
the following metrics:

- Mean Inference Requested Batch Size - the average batch size that RunInference groups the images into for batch prediction
- Mean Inference Batch Latency - the average amount of time it takes to perform inference on a given batch of images
- Mean Load Model Latency - the average amount of time it takes to load a model. This is done once per DoFn instance on worker
  startup, so the cost is amortized across the pipeline.

These metrics are published to InfluxDB and BigQuery.
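For orientation, the following is a minimal sketch of the kind of pipeline this
benchmark exercises; it is not the benchmark pipeline itself (see the linked
example above for that). The `gs://` paths are placeholders, and preprocessing
is reduced to a resize and crop, without the normalization a real ImageNet
model would want.

```python
import apache_beam as beam
import torch
from apache_beam.io.filesystems import FileSystems
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
from PIL import Image
from torchvision import models, transforms

# The handler loads the model once per DoFn instance on worker startup,
# which is what the Mean Load Model Latency metric measures.
model_handler = PytorchModelHandlerTensor(
    state_dict_path='gs://my-bucket/resnet101.pth',  # placeholder path
    model_class=models.resnet101,
    model_params={'num_classes': 1000})

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

def read_image(path: str) -> torch.Tensor:
    # FileSystems resolves gs:// and local paths uniformly.
    with FileSystems.open(path) as f:
        return preprocess(Image.open(f).convert('RGB'))

with beam.Pipeline() as p:
    _ = (
        p
        # One image path per line; placeholder input file.
        | 'ReadPaths' >> beam.io.ReadFromText('gs://my-bucket/image_paths.txt')
        | 'Preprocess' >> beam.Map(read_image)
        # RunInference groups elements into batches and reports the batch
        # size and latency metrics described above.
        | 'RunInference' >> RunInference(model_handler))
```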
### PyTorch Image Classification Tests

* PyTorch Image Classification with ResNet 101.
  * machine_type: n1-standard-2
  * num_workers: 75
  * autoscaling_algorithm: NONE
  * disk_size_gb: 50

* PyTorch Image Classification with ResNet 152.
  * machine_type: n1-standard-2
  * num_workers: 75
  * autoscaling_algorithm: NONE
  * disk_size_gb: 50

* PyTorch Image Classification with ResNet 152 on an NVIDIA Tesla T4 GPU.
  * machine_type:
    * CPU: n1-standard-2
    * GPU: NVIDIA Tesla T4
  * num_workers: 75
  * autoscaling_algorithm: NONE
  * disk_size_gb: 50

Approximate size of the models used in the tests:
* resnet101: 170.5 MB
* resnet152: 230.4 MB

## PyTorch RunInference Language Modeling

The PyTorch RunInference Language Modeling benchmark runs an
[example language modeling pipeline](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py)
using the [BERT large uncased](https://huggingface.co/bert-large-uncased)
and [BERT base uncased](https://huggingface.co/bert-base-uncased) models
and a dataset of 50,000 manually generated sentences. The benchmarks produce
the following metrics:

- Mean Inference Requested Batch Size - the average batch size that RunInference groups the sentences into for batch prediction
- Mean Inference Batch Latency - the average amount of time it takes to perform inference on a given batch of sentences
- Mean Load Model Latency - the average amount of time it takes to load a model. This is done once per DoFn instance on worker
  startup, so the cost is amortized across the pipeline.

These metrics are published to InfluxDB and BigQuery. A minimal pipeline sketch
in the spirit of this benchmark appears at the end of this README.

### PyTorch Language Modeling Tests

* PyTorch Language Modeling using the Hugging Face bert-base-uncased model.
  * machine_type: n1-standard-2
  * num_workers: 250
  * autoscaling_algorithm: NONE
  * disk_size_gb: 50

* PyTorch Language Modeling using the Hugging Face bert-large-uncased model.
  * machine_type: n1-standard-2
  * num_workers: 250
  * autoscaling_algorithm: NONE
  * disk_size_gb: 50

Approximate size of the models used in the tests:
* bert-base-uncased: 417.7 MB
* bert-large-uncased: 1.2 GB

All the performance tests are defined in [job_InferenceBenchmarkTests_Python.groovy](https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_InferenceBenchmarkTests_Python.groovy).
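As referenced above, here is a minimal sketch in the spirit of the language
modeling benchmark, assuming a masked-language-modeling setup. It is not the
benchmark pipeline itself: the `gs://` paths are placeholders, and the real
example additionally masks a word in each sentence and decodes the predictions.

```python
from typing import Dict

import apache_beam as beam
import torch
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.pytorch_inference import (
    PytorchModelHandlerKeyedTensor)
from transformers import BertConfig, BertForMaskedLM, BertTokenizer

# Placeholder state dict path; the config determines the architecture that
# the saved weights are loaded into.
model_handler = PytorchModelHandlerKeyedTensor(
    state_dict_path='gs://my-bucket/bert-base-uncased.pth',
    model_class=BertForMaskedLM,
    model_params={'config': BertConfig.from_pretrained('bert-base-uncased')})

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

def tokenize(sentence: str) -> Dict[str, torch.Tensor]:
    # Pad to a fixed length so RunInference can stack examples into batches.
    tokens = tokenizer(
        sentence, return_tensors='pt', padding='max_length', max_length=128)
    # Drop the batch dimension; RunInference re-batches elements itself.
    return {k: v.squeeze(0) for k, v in tokens.items()}

with beam.Pipeline() as p:
    _ = (
        p
        # One sentence per line; placeholder input file.
        | 'ReadSentences' >> beam.io.ReadFromText('gs://my-bucket/sentences.txt')
        | 'Tokenize' >> beam.Map(tokenize)
        | 'RunInference' >> RunInference(model_handler))
```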