Mr. Abhinav Balasubramanian
Measuring What Matters: Evaluation Metrics and Methodologies for Generative AI
Abstract of talk:
As generative AI systems continue to advance across modalities such as text, images, and code, evaluating their outputs has become increasingly complex and context-dependent. This talk offers a practical overview of the key metrics and methodologies used to assess generative models. We’ll cover established metrics such as BLEU and ROUGE for text, FID and Inception Score for images, and emerging approaches such as human-preference scoring, toxicity filters, and task-based evaluations.
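To give a concrete flavor of how automated text metrics of this kind work, here is a minimal BLEU-style sketch in pure Python. It is a simplification of full corpus-level BLEU (Papineni et al., 2002): single reference, unigrams and bigrams only, and the example sentences and `max_n=2` setting are illustrative choices, not part of the talk.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also appear in the reference,
    with each reference n-gram usable at most as often as it occurs."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

def bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU: geometric mean of n-gram
    precisions, scaled by a brevity penalty for short candidates."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    brevity_penalty = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return brevity_penalty * math.exp(log_mean)

candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
score = bleu(candidate, reference)  # ~0.71 for this pair
```

Even this toy version shows the core limitation the talk discusses: the score rewards surface n-gram overlap, so a fluent paraphrase with different wording can score poorly.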
The session will also explore the limitations of purely automated metrics and discuss when human evaluation becomes essential. All methods presented are drawn from publicly available research and tools. This talk is aimed at practitioners and researchers who want to better understand how to quantify quality, diversity, and alignment in generative outputs.