Mr. Ramesh Mohana Murugan

GPU Training Efficiency

Abstract: 

Efficient GPU utilization is critical in machine learning (ML) training, where large-scale models and datasets demand significant computational resources. GPUs accelerate the training of complex models, but inefficiencies in how they are used introduce substantial overhead and waste: GPUs sit idle or underutilized, training runs take longer, and costs rise, with knock-on effects on environmental sustainability. This talk dives deep into the overheads and sources of waste in the various workflows involved in ML training.
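As a back-of-the-envelope illustration of the cost impact described above, the sketch below models GPU utilization as the fraction of wall-clock time spent in compute versus idle overhead (data loading, host-side preprocessing, logging). All timings, job sizes, and prices here are hypothetical assumptions, not measurements from any real workload.

```python
# Back-of-the-envelope model of GPU waste in a training loop.
# All numbers below are illustrative assumptions, not measurements.

def gpu_utilization(compute_s: float, idle_s: float) -> float:
    """Fraction of wall-clock time the GPU spends computing."""
    return compute_s / (compute_s + idle_s)

def wasted_cost(total_hours: float, utilization: float,
                rate_per_hour: float) -> float:
    """Dollars paid for GPU-hours spent idle."""
    return total_hours * (1.0 - utilization) * rate_per_hour

# Assumed per-step timings: 80 ms of GPU compute, 20 ms of
# GPU-idle overhead (data loading, preprocessing, logging).
util = gpu_utilization(compute_s=0.080, idle_s=0.020)

# Assumed job: 1000 GPU-hours billed at $2/hour.
waste = wasted_cost(total_hours=1000, utilization=util,
                    rate_per_hour=2.0)

print(f"utilization: {util:.0%}, wasted spend: ${waste:.0f}")
```

Even a modest 20% idle fraction per step translates directly into hundreds of dollars of wasted spend on a moderately sized job, which is the kind of overhead this talk examines.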