October 12, 2021
Los Angeles, California + Virtual

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2021 - Los Angeles, CA + Virtual and add this Co-Located event to your registration to participate in these sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is displayed in Pacific Daylight Time (PDT), UTC-7. The schedule is subject to change.

IMPORTANT NOTE: Session times and room locations are subject to change through Monday, September 13, as speakers finalize whether they will speak in person or virtually.
Tuesday, October 12 • 2:05pm - 2:35pm
Serving Machine Learning Models at Scale Using KServe - Yuzhui Liu, Bloomberg

KServe (previously known as KFServing) is a serverless, open source solution for serving machine learning models. As machine learning becomes more widely adopted, organizations are deploying larger numbers of models, and there is a growing need to serve models on GPUs. Because GPUs are expensive, engineers are looking for ways to serve multiple models with one GPU. The KServe community designed a Multi-Model Serving solution to scale the number of models that can be served in a Kubernetes cluster. By sharing a serving container that can host multiple models, Multi-Model Serving addresses four limitations of the current ‘one model, one service’ paradigm: 1) compute resources (including the cost of public cloud), 2) maximum number of pods, 3) maximum number of IP addresses, and 4) maximum number of services. This talk will present the design of Multi-Model Serving, describe how to use it to serve models for different frameworks, and share benchmark stats that demonstrate its scalability.
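
As a rough illustration of why sharing a serving container eases the pod and IP address limits, the sketch below compares how many pods a set of models would need under ‘one model, one service’ versus a shared, multi-model layout. Since each Kubernetes pod receives its own IP address, fewer pods also means fewer IP addresses. The function names, the figure of 1,000 models, and the models-per-pod capacity are all hypothetical, chosen only for the arithmetic; they are not benchmark results from the talk and not part of the KServe API.

```python
import math


def pods_one_model_per_service(num_models: int, replicas_per_model: int = 1) -> int:
    """Each model gets its own service, so the pod count grows linearly with models."""
    return num_models * replicas_per_model


def pods_multi_model(num_models: int, models_per_pod: int, replicas: int = 1) -> int:
    """Models are packed into shared serving pods.

    models_per_pod is a hypothetical capacity; in practice it would depend on
    model size and the memory (or GPU memory) available to each pod.
    """
    if models_per_pod < 1:
        raise ValueError("models_per_pod must be at least 1")
    return math.ceil(num_models / models_per_pod) * replicas


if __name__ == "__main__":
    num_models = 1000     # hypothetical number of models to serve
    models_per_pod = 50   # hypothetical packing capacity of one shared pod

    single = pods_one_model_per_service(num_models)
    shared = pods_multi_model(num_models, models_per_pod)
    print(f"one model, one service: {single} pods")
    print(f"multi-model serving:    {shared} pods")
```

Under these assumed numbers the shared layout needs far fewer pods; the real ratio depends entirely on how many models fit in one serving container.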

Yuzhui Liu

Team Lead, Bloomberg
Yuzhui Liu leads the Data Science Runtime team at Bloomberg. Her team manages an on-prem Kubernetes-based machine learning infrastructure that is used to address Bloomberg’s evolving data science needs. She is actively involved in the Kubernetes open source ecosystem as both a contributor...

Tuesday October 12, 2021 2:05pm - 2:35pm PDT
Room 502 AB + Online