Free

LLM-as-a-Judge: Meta Evaluation

by Arize AI

About this event

Back by popular demand! This session builds on LLM-as-a-Judge fundamentals and focuses on the essential practice of meta-evaluation: evaluating your evaluator to ensure your metrics are meaningful and trustworthy.

You’ll learn how to validate whether your LLM judge is measuring the right thing, how closely it aligns with human judgment, and how to identify where it fails. We’ll walk through comparing LLM and human annotations on a curated golden dataset, calculating precision/recall/F1, and inspecting disagreement cases to understand why your evaluator struggles.

We’ll also cover advanced techniques such as treating humans and LLMs as annotators to estimate inter-annotator agreement, and using high-temperature stress tests to detect prompt ambiguity or unstable reasoning.

Finally, you’ll learn how to use these insights to iteratively refine your evaluation, adjusting prompts, criteria, or data coverage one change at a time, until you’re confident your eval reflects human expectations. This session will leave you with practical tools for building evals you can trust.
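
As a taste of the hands-on portion, the core comparison can be sketched in a few lines of Python. The label lists, function names, and the binary pass/fail framing below are illustrative assumptions, not the session's actual materials: the sketch scores a judge against human golden labels with precision/recall/F1, estimates Cohen's kappa by treating the human and the LLM as two annotators, and prints the disagreement cases worth inspecting.

```python
# Minimal meta-evaluation sketch for a binary LLM judge.
# Labels are hypothetical: 1 = "pass", 0 = "fail", aligned per example.

def precision_recall_f1(human, judge):
    """Score the LLM judge, treating human labels as ground truth."""
    tp = sum(1 for h, j in zip(human, judge) if h == 1 and j == 1)
    fp = sum(1 for h, j in zip(human, judge) if h == 0 and j == 1)
    fn = sum(1 for h, j in zip(human, judge) if h == 1 and j == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def cohens_kappa(a, b):
    """Inter-annotator agreement, treating human and LLM as two annotators."""
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    # Chance agreement, from each annotator's rate of labeling "pass".
    p_a1, p_b1 = sum(a) / n, sum(b) / n
    expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return (observed - expected) / (1 - expected) if expected < 1 else 1.0

human = [1, 1, 0, 1, 0, 0, 1, 0]  # hypothetical golden-dataset labels
judge = [1, 0, 0, 1, 1, 0, 1, 0]  # hypothetical LLM judge labels

p, r, f1 = precision_recall_f1(human, judge)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"kappa={cohens_kappa(human, judge):.2f}")

# Inspect disagreement cases to understand where the judge struggles.
for i, (h, j) in enumerate(zip(human, judge)):
    if h != j:
        print(f"example {i}: human={h} judge={j}")
```

In practice the disagreement loop is where refinement starts: each mismatched example either reveals an ambiguous prompt, a missing criterion, or a gap in data coverage, which you then fix one change at a time before re-scoring.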

Topics & Tags

AI
Date & time
Wednesday, March 18, 2026 · 5:00 PM – 6:00 PM (America/Los_Angeles)
Location
TBA
Organised by
Arize AI
Type
independent
Source
Luma
Updated
Mar 11, 2026
