PyData Miami 2022

Leveraging open source inference servers for standardizing model deployments on CPUs and GPUs
09-22, 17:15–17:45 (US/Eastern), Main Room

In your current production environment or data science practice, do you have applications with low-latency constraints? Do you have multiple team members working in different frameworks and deploying across several different inference servers? Are you still using Flask to deploy your models? If your answer to any of these is yes, you can leverage open source inference servers to standardize model deployment on both CPUs and GPUs across different frameworks. Post-training model optimization is a key factor in deploying more complex ensemble models while still maintaining low latency. We will also look at how to do model ensembling across multiple framework backends (PyTorch, TensorFlow, Python), along with running multiple model copies on a single inference server instance.


This talk will cover:
- Best practices for CPU and GPU model deployments
- How to run multiple models on a single GPU
- Leveraging more powerful GPUs for very high throughput model inference applications
- How to do post-training model optimization to reduce model inference times and increase overall throughput


Prior Knowledge Expected

Previous knowledge expected

Dr. Mark Moyou is a Senior Data Scientist at NVIDIA on the Retail team, focused on enabling scalable machine learning for the nation's top retailers. Before NVIDIA, he was a Data Science Manager in the Professional Services division at Lucidworks, an enterprise search and recommendations company. Prior to Lucidworks, he was a Data Scientist at Alstom Transportation, where he applied data science to the railroad industry. Mark holds a PhD and MSc in Systems Engineering and a BSc in Chemical Engineering. His machine learning research in grad school focused on 2D and 3D geometric shape matching and retrieval, object detection in video streams, anomaly detection in IP stream data, and concrete crack detection in bridge structures. On the side, Mark runs the Caribbean Data Science Podcast and the Southern Data Science Conference.
