Hrittik Roy
Hrittik is currently a Platform Advocate at Loft Labs and a CNCF Ambassador, and has previously worked at various startups helping them scale their content efforts. He loves diving deep into distributed systems and writing articles about them, and has spoken at conferences such as Azure Cloud Summit, UbuCon Asia, and Kubernetes Community Days - Lagos and Chennai, among others! His best days are when he finds ways to create impact in the communities he's a part of, whether through code, content, or mentorship!
Talk
As LLMs and generative models become more and more complex, one can't simply train them on a CPU or a single GPU; they require multiple GPUs, and managing those can be complicated. GPU partitioning in the cloud is often perceived as a complicated, resource-consuming process reserved for narrowly specialized teams or large enterprises. This talk explores why GPU partitioning is necessary for running Python AI workloads and how it can be done efficiently using open source tooling.
The talk will also address some common myths: that GPU partitioning requires advanced hardware configurations or comes with prohibitive costs on systems like Kubernetes.
In this talk, we will illustrate how modern frameworks like NVIDIA MIG, combined with vCluster, enable seamless sharing of GPUs across different teams, leading to more efficient resource utilization, higher throughput, and broader accessibility for workloads like LLM fine-tuning and inference. The talk aims to help developers and engineers understand the key techniques for efficient GPU scheduling and resource sharing across multiple GPU clusters with open source platform tooling like vCluster.
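To make the sharing model concrete, here is a minimal sketch of how a workload might request a MIG slice on Kubernetes, assuming an A100 node with MIG enabled and the NVIDIA device plugin installed (the pod name and container image are illustrative placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: finetune-job            # placeholder name
spec:
  containers:
  - name: trainer
    image: nvcr.io/nvidia/pytorch:24.01-py3   # example image
    resources:
      limits:
        nvidia.com/mig-1g.5gb: 1   # request one 1g.5gb MIG slice instead of a whole GPU
```

Instead of claiming an entire device with `nvidia.com/gpu: 1`, the pod asks for a single MIG slice, so several teams' workloads can be scheduled onto one physical GPU.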