<!--
    Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership.  The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied.  See the License for the
    specific language governing permissions and limitations
    under the License.
-->

# RunInference Benchmarks

This module contains benchmarks used to test the performance of the RunInference transform
when running inference with common models and frameworks. Each benchmark is explained in detail
below. Beam's performance over time can be viewed on the
[RunInference benchmarks dashboard](http://s.apache.org/beam-community-metrics/d/ZpS8Uf44z/python-ml-runinference-benchmarks?orgId=1).

## Pytorch RunInference Image Classification 50K

The Pytorch RunInference Image Classification 50K benchmark runs an
[example image classification pipeline](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_image_classification.py)
using various ResNet image classification models (the benchmarks on
[Beam's dashboard](http://s.apache.org/beam-community-metrics/d/ZpS8Uf44z/python-ml-runinference-benchmarks?orgId=1)
display [resnet101](https://pytorch.org/vision/main/models/generated/torchvision.models.resnet101.html) and [resnet152](https://pytorch.org/vision/stable/models/generated/torchvision.models.resnet152.html))
against 50,000 example images from the OpenImage dataset. The benchmarks produce
the following metrics:

- Mean Inference Requested Batch Size - the average batch size that RunInference groups the images into for batch prediction
- Mean Inference Batch Latency - the average amount of time it takes to perform inference on a given batch of images
- Mean Load Model Latency - the average amount of time it takes to load a model. This is done once per DoFn instance on worker
startup, so the cost is amortized across the pipeline.

These metrics are published to InfluxDB and BigQuery.
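
The core of the benchmarked pipeline pairs a Pytorch model handler with the RunInference transform. The sketch below is a simplified illustration of the linked example rather than the benchmark itself; the state dict path, the input file list, and the preprocessing details are placeholders.

```python
import io

import apache_beam as beam
import torch
from apache_beam.io.filesystems import FileSystems
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
from PIL import Image
from torchvision import models, transforms

# Standard ImageNet-style preprocessing so every image becomes a 3x224x224 tensor.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])


def read_image(image_path: str) -> torch.Tensor:
  """Reads one image file and converts it into a model-ready tensor."""
  with FileSystems.open(image_path) as f:
    image = Image.open(io.BytesIO(f.read())).convert('RGB')
  return preprocess(image)


# Loads the saved ResNet weights once per worker, then serves batched predictions.
model_handler = PytorchModelHandlerTensor(
    state_dict_path='gs://my-bucket/resnet101.pth',  # placeholder path
    model_class=models.resnet101,
    model_params={'num_classes': 1000})

with beam.Pipeline() as pipeline:
  _ = (
      pipeline
      | 'ReadImagePaths' >> beam.io.ReadFromText('gs://my-bucket/images.txt')
      | 'Preprocess' >> beam.Map(read_image)
      | 'RunInference' >> RunInference(model_handler))
```

The linked example adds keying and post-processing of the predictions on top of this skeleton; see the pipeline source for the full details.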

### Pytorch Image Classification Tests

* Pytorch Image Classification with Resnet 101.
  * machine_type: n1-standard-2
  * num_workers: 75
  * autoscaling_algorithm: NONE
  * disk_size_gb: 50

* Pytorch Image Classification with Resnet 152.
  * machine_type: n1-standard-2
  * num_workers: 75
  * autoscaling_algorithm: NONE
  * disk_size_gb: 50

* Pytorch Imagenet Classification with Resnet 152 on a Tesla T4 GPU.
  * machine_type:
    * CPU: n1-standard-2
    * GPU: NVIDIA Tesla T4
  * num_workers: 75
  * autoscaling_algorithm: NONE
  * disk_size_gb: 50
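
These settings map directly onto standard Dataflow pipeline options. As a rough sketch, the Resnet 101 configuration above could be expressed as follows; the project, region, and temp location are placeholder values, not part of the benchmark definition.

```python
from apache_beam.options.pipeline_options import PipelineOptions

# Mirrors the "Resnet 101" test configuration listed above.
# The project, region, and temp_location values are placeholders.
options = PipelineOptions(
    runner='DataflowRunner',
    project='my-gcp-project',
    region='us-central1',
    temp_location='gs://my-bucket/tmp',
    machine_type='n1-standard-2',
    num_workers=75,
    autoscaling_algorithm='NONE',
    disk_size_gb=50)
```

The GPU variant additionally attaches an NVIDIA Tesla T4 accelerator to each worker, which is typically configured through Dataflow's worker accelerator settings rather than the flags shown here.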

Approximate size of the models used in the tests:
* resnet101: 170.5 MB
* resnet152: 230.4 MB

## Pytorch RunInference Language Modeling

The Pytorch RunInference Language Modeling benchmark runs an
[example language modeling pipeline](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py)
using the [Bert large uncased](https://huggingface.co/bert-large-uncased)
and [Bert base uncased](https://huggingface.co/bert-base-uncased) models
and a dataset of 50,000 manually generated sentences. The benchmarks produce
the following metrics:

- Mean Inference Requested Batch Size - the average batch size that RunInference groups the sentences into for batch prediction
- Mean Inference Batch Latency - the average amount of time it takes to perform inference on a given batch of sentences
- Mean Load Model Latency - the average amount of time it takes to load a model. This is done once per DoFn instance on worker
startup, so the cost is amortized across the pipeline.

These metrics are published to InfluxDB and BigQuery.
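
The shape of this pipeline mirrors the image classification benchmark: tokenized sentences flow into RunInference through a Pytorch model handler. The sketch below is a simplified illustration of the linked example; the state dict path and the input sentence are placeholders.

```python
from typing import Dict

import apache_beam as beam
import torch
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
from transformers import BertConfig, BertForMaskedLM, BertTokenizer

model_name = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(model_name)


def tokenize(sentence: str) -> Dict[str, torch.Tensor]:
  """Tokenizes one sentence into the dict of tensors BERT expects."""
  tokens = tokenizer(sentence, return_tensors='pt')
  # Drop the batch dimension so RunInference can re-batch the inputs itself.
  return {key: torch.squeeze(value) for key, value in tokens.items()}


# Loads the saved BERT weights once per worker; the path is a placeholder.
model_handler = PytorchModelHandlerKeyedTensor(
    state_dict_path='gs://my-bucket/bert-base-uncased.pth',
    model_class=BertForMaskedLM,
    model_params={'config': BertConfig.from_pretrained(model_name)})

with beam.Pipeline() as pipeline:
  _ = (
      pipeline
      | 'CreateSentences' >> beam.Create(['Apache Beam is a unified [MASK] model.'])
      | 'Tokenize' >> beam.Map(tokenize)
      | 'RunInference' >> RunInference(model_handler))
```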

### Pytorch Language Modeling Tests

* Pytorch Language Modeling using the Hugging Face bert-base-uncased model.
  * machine_type: n1-standard-2
  * num_workers: 250
  * autoscaling_algorithm: NONE
  * disk_size_gb: 50

* Pytorch Language Modeling using the Hugging Face bert-large-uncased model.
  * machine_type: n1-standard-2
  * num_workers: 250
  * autoscaling_algorithm: NONE
  * disk_size_gb: 50

Approximate size of the models used in the tests:
* bert-base-uncased: 417.7 MB
* bert-large-uncased: 1.2 GB
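
Both benchmarks collect the metrics above from the finished pipeline before publishing them. As a rough, self-contained sketch of how Beam metrics can be queried after a run; the counter below is a stand-in, and the exact names RunInference reports are not defined in this README.

```python
import apache_beam as beam
from apache_beam.metrics.metric import Metrics, MetricsFilter


class CountInferences(beam.DoFn):
  """Stand-in DoFn that increments a counter, the way a real pipeline reports metrics."""
  def __init__(self):
    self.counter = Metrics.counter('benchmark', 'num_inferences')

  def process(self, element):
    self.counter.inc()
    yield element


pipeline = beam.Pipeline()
_ = pipeline | beam.Create([1, 2, 3]) | beam.ParDo(CountInferences())
result = pipeline.run()
result.wait_until_finish()

# Metrics from a completed pipeline can be queried like this before being exported.
query = result.metrics().query(MetricsFilter().with_name('num_inferences'))
for counter in query['counters']:
  print(counter.key.metric.name, counter.committed)
```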

All the performance tests are defined at [job_InferenceBenchmarkTests_Python.groovy](https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_InferenceBenchmarkTests_Python.groovy).