# TFServing

## Basic tutorial for TensorFlow Serving

## **Install Docker**

- **Windows/macOS**: install Docker from [DockerHub](https://hub.docker.com/?overlay=onboarding). (*you need to register an account if you don't have one*)

- **Linux**: install [Docker](https://runnable.com/docker/install-docker-on-linux)

## **Tutorial for starting**

- clone this repo

```bash
$ git clone https://github.com/Alwaysproblem/MLserving-tutorial
$ cd MLserving-tutorial/TFserving/ClientAPI
```

- clone tensorflow from source (optional)

```bash
$ git clone -b <version_you_need> https://github.com/tensorflow/tensorflow
```

- clone serving from source (optional)

```bash
$ git clone -b <version_you_need> https://github.com/tensorflow/serving
```

## **Easy TFServer**

- try the simple example from the TensorFlow Serving documentation.

```bash
# Download the TensorFlow Serving Docker image and repo
$ docker pull tensorflow/serving

$ git clone https://github.com/tensorflow/serving
# Location of demo models
TESTDATA="$(pwd)/serving/tensorflow_serving/servables/tensorflow/testdata"

# Start TensorFlow Serving container and open the REST API port
$ docker run -it --rm -p 8501:8501 \
    -v "$TESTDATA/saved_model_half_plus_two_cpu:/models/half_plus_two" \
    -e MODEL_NAME=half_plus_two \
    tensorflow/serving &

# Query the model using the predict API
# (run this in a new terminal)
$ curl -d '{"instances": [1.0, 2.0, 5.0]}' \
    -X POST http://localhost:8501/v1/models/half_plus_two:predict

# Returns => { "predictions": [2.5, 3.0, 4.5] }
```

- Docker common commands.

```bash
# Kill all running containers.
$ docker kill $(docker ps -q)

# Stop all running containers.
$ docker stop $(docker ps -q)

# Remove all stopped containers.
$ docker rm $(docker ps -aq)

# List all containers, including stopped ones.
$ docker ps -a

# List all running containers.
$ docker ps

# Run a serving image as a daemon with a readable name.
$ docker run -d --name serving_base tensorflow/serving

# Execute a command inside the container; substitute your own container name for ${container_name}.
$ docker exec -it ${container_name} sh -c "cd /tmp"

# Enter the container's bash shell.
$ docker exec -it ${container_name} bash -l
```

## **Run Server with your own saved pretrained models**

- make sure your model directory looks like this:

```text
---save
   |
   ---Model Name
      |
      ---1
         |
         ---assets
         |
         ---variables
         |
         ---saved_model.pb
```

- replace **user_define_model_name** with your own model name and **path_to_your_own_models** with the directory path of your own model

```bash
# Run the server.
$ docker run -it --rm -p 8501:8501 -v "$(pwd)/${path_to_your_own_models}/1:/models/${user_define_model_name}" -e MODEL_NAME=${user_define_model_name} tensorflow/serving &

# Run the client.
$ curl -d '{"instances": [[1.0, 2.0]]}' -X POST http://localhost:8501/v1/models/${user_define_model_name}:predict
```

- you can also use the tensorflow_model_server command after entering the container's bash

```bash
$ docker exec -it ${container_name} bash -l

$ tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME}
```

- example
  - Save the model after running LinearKeras.py

```bash
$ docker run -it --rm -p 8501:8501 -v "$(pwd)/save/Toy:/models/Toy" -e MODEL_NAME=Toy tensorflow/serving &

$ curl -d '{"instances": [[1.0, 2.0]]}' -X POST http://localhost:8501/v1/models/Toy:predict

# {
#   "predictions": [[0.999035]
#   ]
# }
```

- bind your own model to the server
  - bind a host path to the model.

```bash
$ docker run -p 8501:8501 --mount type=bind,source=/path/to/my_model/,target=/models/my_model -e MODEL_NAME=my_model -it tensorflow/serving
```

- example

```bash
$ docker run -p 8501:8501 --mount type=bind,source=$(pwd)/save/Toy,target=/models/Toy -e MODEL_NAME=Toy -it tensorflow/serving

$ curl -d '{"instances": [[1.0, 2.0]]}' -X POST http://localhost:8501/v1/models/Toy:predict

# {
#   "predictions": [[0.999035]
#   ]
# }
```

## RESTful API

- the data looks like this:

| a | b | c | d | e | f |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 390 | 25 | 1 | 1 | 1 | 2 |
| 345 | 34 | 45 | 2 | 34 | 3456 |

- `instances` supplies the data row by row

```json
{"instances": [
    {
      "a": [390],
      "b": [25],
      "c": [1],
      "d": [1],
      "e": [1],
      "f": [2]
    },
    {
      "a": [345],
      "b": [34],
      "c": [45],
      "d": [2],
      "e": [34],
      "f": [3456]
    }
  ]
}
```

- `inputs` supplies the data column by column

```json
{"inputs":
  {
    "a": [[390], [345]],
    "b": [[25], [34]],
    "c": [[1], [45]],
    "d": [[1], [2]],
    "e": [[1], [34]],
    "f": [[2], [3456]]
  }
}
```

- [REST API](https://www.tensorflow.org/tfx/serving/api_rest)

## **Run multiple models in TFServer**

- set up a configuration file named Toy.config

```protobuf
model_config_list: {
  config: {
    name: "Toy",
    base_path: "/models/save/Toy/",
    model_platform: "tensorflow"
  },
  config: {
    name: "Toy_double",
    base_path: "/models/save/Toy_double/",
    model_platform: "tensorflow"
  }
}
```

- replace **Config Path** with the path of your own configuration file.

```bash
docker run -it --rm -p 8501:8501 -v "$(pwd):/models/" tensorflow/serving --model_config_file=/models/${Config Path} --model_config_file_poll_wait_seconds=60
```

- example

```bash
$ docker run -it --rm -p 8501:8501 -v "$(pwd):/models/" tensorflow/serving --model_config_file=/models/config/Toy.config

$ curl -d '{"instances": [[1.0, 2.0]]}' -X POST http://localhost:8501/v1/models/Toy_double:predict
# {
#   "predictions": [[6.80301666]
#   ]
# }

$ curl -d '{"instances": [[1.0, 2.0]]}' -X POST http://localhost:8501/v1/models/Toy:predict
# {
#   "predictions": [[0.999035]
#   ]
# }
```

- bind your own paths to TFServer. The models' mount targets must be consistent with the `base_path` entries in the configuration file.
```bash
$ docker run --rm -p 8500:8500 -p 8501:8501 \
    --mount type=bind,source=${/path/to/my_model/},target=/models/${my_model} \
    --mount type=bind,source=${/path/to/my/models.config},target=/models/${models.config} -it tensorflow/serving --model_config_file=/models/${models.config}
```

- example

```bash
$ docker run --rm -p 8500:8500 -p 8501:8501 --mount type=bind,source=$(pwd)/save/,target=/models/save --mount type=bind,source=$(pwd)/config/Toy.config,target=/models/Toy.config -it tensorflow/serving --model_config_file=/models/Toy.config

$ curl -d '{"instances": [[1.0, 2.0]]}' -X POST http://localhost:8501/v1/models/Toy_double:predict
# {
#   "predictions": [[6.80301666]
#   ]
# }

$ curl -d '{"instances": [[1.0, 2.0]]}' -X POST http://localhost:8501/v1/models/Toy:predict
# {
#   "predictions": [[0.999035]
#   ]
# }
```

## **Version control for TFServer**

- set up a single-version configuration file.

```protobuf
model_config_list: {
  config: {
    name: "Toy",
    base_path: "/models/save/Toy/",
    model_platform: "tensorflow",
    model_version_policy: {
      specific {
        versions: 1
      }
    }
  },
  config: {
    name: "Toy_double",
    base_path: "/models/save/Toy_double/",
    model_platform: "tensorflow"
  }
}
```

- set up a multiple-version configuration file.
```protobuf
model_config_list: {
  config: {
    name: "Toy",
    base_path: "/models/save/Toy/",
    model_platform: "tensorflow",
    model_version_policy: {
      specific {
        versions: 1,
        versions: 2
      }
    }
  },
  config: {
    name: "Toy_double",
    base_path: "/models/save/Toy_double/",
    model_platform: "tensorflow"
  }
}
```

- example

```bash
$ docker run --rm -p 8500:8500 -p 8501:8501 --mount type=bind,source=$(pwd)/save/,target=/models/save --mount type=bind,source=$(pwd)/config/versionctrl.config,target=/models/versionctrl.config -it tensorflow/serving --model_config_file=/models/versionctrl.config --model_config_file_poll_wait_seconds=60
```

- for POST

```bash
$ curl -d '{"instances": [[1.0, 2.0]]}' -X POST http://localhost:8501/v1/models/Toy/versions/1:predict
# {
#   "predictions": [[10.8054295]
#   ]
# }

$ curl -d '{"instances": [[1.0, 2.0]]}' -X POST http://localhost:8501/v1/models/Toy/versions/2:predict
# {
#   "predictions": [[0.999035]
#   ]
# }
```

- for gRPC

```bash
$ python3 ClientAPI/python/grpc_request.py -m Toy -v 1
# outputs {
#   key: "output_1"
#   value {
#     dtype: DT_FLOAT
#     tensor_shape {
#       dim {
#         size: 2
#       }
#       dim {
#         size: 1
#       }
#     }
#     float_val: 10.805429458618164
#     float_val: 14.010123252868652
#   }
# }
# model_spec {
#   name: "Toy"
#   version {
#     value: 1
#   }
#   signature_name: "serving_default"
# }
$ python3 ClientAPI/python/grpc_request.py -m Toy -v 2
# outputs {
#   key: "output_1"
#   value {
#     dtype: DT_FLOAT
#     tensor_shape {
#       dim {
#         size: 2
#       }
#       dim {
#         size: 1
#       }
#     }
#     float_val: 0.9990350008010864
#     float_val: 0.9997349381446838
#   }
# }
# model_spec {
#   name: "Toy"
#   version {
#     value: 2
#   }
#   signature_name:
#     "serving_default"
# }
```

- set an alias label for each version. This is only available via gRPC.

```protobuf
model_config_list: {
  config: {
    name: "Toy",
    base_path: "/models/save/Toy/",
    model_platform: "tensorflow",
    model_version_policy: {
      specific {
        versions: 1,
        versions: 2
      }
    },
    version_labels {
      key: 'stable',
      value: 1
    },
    version_labels {
      key: 'canary',
      value: 2
    }
  },
  config: {
    name: "Toy_double",
    base_path: "/models/save/Toy_double/",
    model_platform: "tensorflow"
  }
}
```

- refer to [https://www.tensorflow.org/tfx/serving/serving_config](https://www.tensorflow.org/tfx/serving/serving_config)

  Please **note that** labels can only be assigned to model versions that are already loaded and available for serving. Once a model version is available, one may reload the model config on the fly to assign a label to it (this can be achieved using the HandleReloadConfigRequest RPC endpoint).

  One workaround is to remove the label section first, start TensorFlow Serving, and then add the label section back to the config file on the fly.

- setting the `--allow_version_labels_for_unavailable_models` flag makes it possible to assign version labels on the first run.
```bash
$ docker run --rm -p 8500:8500 -p 8501:8501 --mount type=bind,source=$(pwd)/save/,target=/models/save --mount type=bind,source=$(pwd)/config/versionlabels.config,target=/models/versionctrl.config -it tensorflow/serving --model_config_file=/models/versionctrl.config --model_config_file_poll_wait_seconds=60 --allow_version_labels_for_unavailable_models
```

```bash
$ python3 ClientAPI/python/grpc_request.py -m Toy -l stable
# outputs {
#   key: "output_1"
#   value {
#     dtype: DT_FLOAT
#     tensor_shape {
#       dim {
#         size: 2
#       }
#       dim {
#         size: 1
#       }
#     }
#     float_val: 10.805429458618164
#     float_val: 14.010123252868652
#   }
# }
# model_spec {
#   name: "Toy"
#   version {
#     value: 1
#   }
#   signature_name: "serving_default"
# }
$ python3 ClientAPI/python/grpc_request.py -m Toy -l canary
# outputs {
#   key: "output_1"
#   value {
#     dtype: DT_FLOAT
#     tensor_shape {
#       dim {
#         size: 2
#       }
#       dim {
#         size: 1
#       }
#     }
#     float_val: 0.9990350008010864
#     float_val: 0.9997349381446838
#   }
# }
# model_spec {
#   name: "Toy"
#   version {
#     value: 2
#   }
#   signature_name: "serving_default"
# }
```

## **Other Configuration parameters**

- [Configuration](https://github.com/tensorflow/serving/tree/master/tensorflow_serving/config)

- Batch Configuration: set `--enable_batching=true` and pass the config file to `--batching_parameters_file`, [more](https://github.com/tensorflow/serving/blob/master/tensorflow_serving/batching/README.md#batch-scheduling-parameters-and-tuning)

  - CPU-only: One Approach

    If your system is CPU-only (no GPU), then consider starting with the following values: `num_batch_threads` equal to the number of CPU cores; `max_batch_size` to infinity; `batch_timeout_micros` to 0.
    Then experiment with `batch_timeout_micros` values in the 1-10 millisecond (1000-10000 microsecond) range, while keeping in mind that 0 may be the optimal value.

  - GPU: One Approach

    If your model uses a GPU device for part or all of its inference work, consider the following approach:

    Set `num_batch_threads` to the number of CPU cores.

    Temporarily set `batch_timeout_micros` to infinity while you tune `max_batch_size` to achieve the desired balance between throughput and average latency. Consider values in the hundreds or thousands.

    For online serving, tune `batch_timeout_micros` to rein in tail latency. The idea is that batches normally get filled to `max_batch_size`, but occasionally, when there is a lapse in incoming requests, it makes sense to process whatever is in the queue even if it represents an underfull batch, to avoid introducing a latency spike. The best value for `batch_timeout_micros` is typically a few milliseconds, and depends on your context and goals. Zero is a value to consider; it works well for some workloads. (For bulk processing jobs, choose a large value, perhaps a few seconds, to ensure good throughput but not wait too long for the final (and likely underfull) batch.)
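The tuning loop described above amounts to repeatedly timing the same predict request under different batching settings. Below is a minimal latency probe using only the Python standard library; it assumes the Toy model from the earlier examples is being served on localhost:8501, and the helper names (`build_predict_payload`, `time_predict`) are illustrative, not part of this repo:

```python
import json
import time
import urllib.request


def build_predict_payload(instances):
    """Serialize rows into the TensorFlow Serving REST `instances` format."""
    return json.dumps({"instances": instances}).encode("utf-8")


def time_predict(url, payload, n=20):
    """Send the same predict request n times; return per-request latencies in ms."""
    latencies = []
    for _ in range(n):
        req = urllib.request.Request(
            url, data=payload, headers={"Content-Type": "application/json"})
        start = time.perf_counter()
        with urllib.request.urlopen(req) as resp:
            resp.read()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


if __name__ == "__main__":
    # Assumes the Toy model is already being served as in the examples above.
    url = "http://localhost:8501/v1/models/Toy:predict"
    payload = build_predict_payload([[1.0, 2.0]])
    lat = sorted(time_predict(url, payload))
    print(f"p50={lat[len(lat) // 2]:.2f} ms  max={lat[-1]:.2f} ms")
```

Rerun the probe after each change to `batch_timeout_micros` or `max_batch_size` and compare the median and tail numbers.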
`batch.config`

```protobuf
max_batch_size { value: 1 }
batch_timeout_micros { value: 0 }
max_enqueued_batches { value: 1000000 }
num_batch_threads { value: 8 }
```

- example
  - server

```bash
docker run --rm -p 8500:8500 -p 8501:8501 --mount type=bind,source=$(pwd),target=/models --mount type=bind,source=$(pwd)/config/versionctrl.config,target=/models/versionctrl.config -it tensorflow/serving --model_config_file=/models/versionctrl.config --model_config_file_poll_wait_seconds=60 --enable_batching=true --batching_parameters_file=/models/batch/batchpara.config
```

  - client
    - returns the error `"Task size 2 is larger than maximum batch size 1"`

```bash
$ python3 ClientAPI/python/grpc_request.py -m Toy -v 1
# Traceback (most recent call last):
#   File "grpcRequest.py", line 58, in <module>
#     resp = stub.Predict(request, timeout_req)
#   File "/Users/yongxiyang/opt/anaconda3/envs/tf2cpu/lib/python3.7/site-packages/grpc/_channel.py", line 824, in __call__
#     return _end_unary_response_blocking(state, call, False, None)
#   File "/Users/yongxiyang/opt/anaconda3/envs/tf2cpu/lib/python3.7/site-packages/grpc/_channel.py", line 726, in _end_unary_response_blocking
#     raise _InactiveRpcError(state)
# grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
#   status = StatusCode.INVALID_ARGUMENT
#   details = "Task size 2 is larger than maximum batch size 1"
#   debug_error_string = "{"created":"@1591246233.042335000","description":"Error received from peer ipv4:0.0.0.0:8500","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Task size 2 is larger than maximum batch size 1","grpc_status":3}"
```

- monitor: pass a file path to `--monitoring_config_file`

  `monitor.config`

```protobuf
prometheus_config {
  enable: true,
  path: "/models/metrics"
}
```

- request through the
RESTful API
  - example
    - server

```bash
$ docker run --rm -p 8500:8500 -p 8501:8501 -v "$(pwd):/models" -it tensorflow/serving --model_config_file=/models/config/versionlabels.config --model_config_file_poll_wait_seconds=60 --allow_version_labels_for_unavailable_models --monitoring_config_file=/models/monitor/monitor.config
```

    - client

```bash
$ curl -X GET http://localhost:8501/monitoring/prometheus/metrics
# # TYPE :tensorflow:api:op:using_fake_quantization gauge
# # TYPE :tensorflow:cc:saved_model:load_attempt_count counter
# :tensorflow:cc:saved_model:load_attempt_count{model_path="/models/save/Toy/1",status="success"} 1
# :tensorflow:cc:saved_model:load_attempt_count{model_path="/models/save/Toy/2",status="success"} 1
# ...
# # TYPE :tensorflow:cc:saved_model:load_latency counter
# :tensorflow:cc:saved_model:load_latency{model_path="/models/save/Toy/1"} 54436
# :tensorflow:cc:saved_model:load_latency{model_path="/models/save/Toy/2"} 45230
# ...
# # TYPE :tensorflow:mlir:import_failure_count counter
# # TYPE :tensorflow:serving:model_warmup_latency histogram
# # TYPE :tensorflow:serving:request_example_count_total counter
# # TYPE :tensorflow:serving:request_example_counts histogram
# # TYPE :tensorflow:serving:request_log_count counter
```

- show monitor data in the prometheus docker
  - modify your own prometheus configuration file

```yaml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  - job_name: 'tensorflow'
    scrape_interval: 5s
    metrics_path: '/monitoring/prometheus/metrics'
    static_configs:
      - targets: ['docker.for.mac.localhost:8501'] # for `Mac users`
      # - targets: ['127.0.0.1:8501']
```

- start the prometheus docker server

```bash
$ docker run --rm -ti --name prometheus -p 127.0.0.1:9090:9090 -v "$(pwd)/monitor:/tmp" prom/prometheus --config.file=/tmp/prometheus/prome.yaml
```

- access prometheus on the webUI
  - check target and status
    ![target](images/target.png)
    ![status](images/status.png)
  - webUI on [localhost:9090](http://localhost:9090/)
    ![graph](images/prom_graph.png)

<!-- - request with gRPC TODO: -->

## **Obtain the information**

- get the model metadata (the information data structure).
```bash
curl -X GET http://localhost:8501/v1/models/Toy/metadata
```

- get the model metadata with gRPC

```bash
$ python ClientAPI/python/grpc_metadata.py -m Toy -v 2
# model_spec {
#   name: "Toy"
#   version {
#     value: 2
#   }
# }
# metadata {
#   key: "signature_def"
#   value {
#     type_url: "type.googleapis.com/tensorflow.serving.SignatureDefMap"
#     value: "\n\253\001\n\017serving_default\022\227\001\n;\n\007input_1\0220\n\031serving_default_input_1:0\020\001\032\021\022\013\010\377\377\377\377\377\377\377\377\377\001\022\002\010\002\022<\n\010output_1\0220\n\031StatefulPartitionedCall:0\020\001\032\021\022\013\010\377\377\377\377\377\377\377\377\377\001\022\002\010\001\032\032tensorflow/serving/predict\n>\n\025__saved_model_init_op\022%\022#\n\025__saved_model_init_op\022\n\n\004NoOp\032\002\030\001"
#   }
# }
```

## **Acceleration by GPU**

- pull the TensorFlow Serving GPU image from DockerHub.

```bash
docker pull tensorflow/serving:latest-gpu
```

- clone the serving repo if you haven't already.
```bash
git clone https://github.com/tensorflow/serving
```

- set `--runtime=nvidia` and use the `tensorflow/serving:latest-gpu` image

```bash
docker run --runtime=nvidia -p 8501:8501 -v "$(pwd)/${path_to_your_own_models}/1:/models/${user_define_model_name}" -e MODEL_NAME=${user_define_model_name} tensorflow/serving:latest-gpu &
```

- example

```bash
docker run --runtime=nvidia -p 8501:8501 -v "$(pwd)/save/Toy:/models/Toy" -e MODEL_NAME=Toy tensorflow/serving:latest-gpu &
# or
nvidia-docker run -p 8501:8501 -v "$(pwd)/save/Toy:/models/Toy" -e MODEL_NAME=Toy tensorflow/serving:latest-gpu &
# or
docker run --gpus ${all_or_1} -p 8501:8501 -v "$(pwd)/save/Toy:/models/Toy" -e MODEL_NAME=Toy tensorflow/serving:latest-gpu &
```

## Setup client API

- [GO](ClientAPI/go/README.md)
- [Python](ClientAPI/python/README.md)
- [Cpp-cmake](ClientAPI/cpp/cmake/README.md)
- [Cpp-cmake-static-lib](ClientAPI/cpp/cmake-static-lib/README.md)
- [Cpp-make](ClientAPI/cpp/make/README.md)
- [Cpp-make-static-lib](ClientAPI/cpp/make-static-lib/README.md)

## Feature Column and vocabulary file for serving

<!-- TODO: -->

## For production

- [SavedModel Warmup](https://www.tensorflow.org/tfx/serving/saved_model_warmup)
  - please see ClientAPI/wramup/warmup.py
  - `--enable_model_warmup`: enables model warmup using user-provided PredictionLogs in the assets.extra/ directory

```bash
$ python ClientAPI/wramup/warmup.py # generates the tf_serving_warmup_requests tfrecord file (TF 2.9.2)
$ cp tf_serving_warmup_requests <model_dir>/<version>/assets.extra/tf_serving_warmup_requests
```

The server log:

```log
2022-11-04 06:34:39.419417: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:213] Running initialization op on SavedModel bundle at path: /models/save/Toy/2
2022-11-04 06:34:39.426058: I
external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:305] SavedModel load for tags { serve }; Status: success: OK. Took 32252 microseconds.
2022-11-04 06:34:39.426708: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:73] Starting to read warmup data for model at /models/save/Toy/2/assets.extra/tf_serving_warmup_requests with model-warmup-options
2022-11-04 06:34:39.441661: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:210] Finished reading warmup data for model at /models/save/Toy/2/assets.extra/tf_serving_warmup_requests. Number of warmup records read: 1. Elapsed time (microseconds): 15304.
```

## Advanced Tutorial

### Custom Operation

For custom TensorFlow operations, please refer to [Create an Op](https://www.tensorflow.org/guide/create_op) and [Serving TensorFlow models with custom ops](https://www.tensorflow.org/tfx/serving/custom_op). However, we only provide an example of a custom op and the way to modify the TensorFlow server. See the following link for details:

[Custom operation examples](CustomOp/README.md)