
Measuring What Works: Agent Evals, Context Quality, and Optimization

by AI Native Dev

About this event

If you can’t measure it, you can’t improve it, and that is especially true for agents. Most teams rely on vibes, anecdotes, or raw model benchmarks to judge agent performance. That breaks down fast in real developer workflows. This session goes deep on evaluation and optimization. We’ll show how to define meaningful grading criteria and measure what actually improves agent outcomes in production. You’ll learn how to evaluate agent performance, quantify the impact of different context packages, and turn failures into a continuous improvement loop. Expect a practical view of what “agent performance” really means.

What you’ll learn

- How to design realistic, repeatable agent evaluation tasks
- Grading criteria that reflect real developer success
- Ways to measure the impact of docs, rules, and examples on outcomes
- Turning production failures into feedback that improves context over time

Speakers

Dru Knox
Head of Product, Tessl
Dru leads Product at Tessl. He brings deep experience in platform and ecosystem development, having helped build two of the largest developer platforms in the world: Android and the web. His work sits at the intersection of product design, developer experience, and systems thinking. Outside of work, he’s drawn to design, game theory, and a bit of armchair philosophy.

Maria Gorinova
Member of Technical Staff, Tessl
Maria is an AI Research Engineer at Tessl. Her experience spans machine learning and computer science, including probabilistic programming, variational inference, graph neural networks, geometric deep learning, programming language design, and program analysis, with applications across science, healthcare, and social media.

Who this is for

Engineers, researchers, platform teams, and technical leaders who want evidence-based answers to what actually makes agents better.

Topics & Tags

AI
Date & time
Thursday, March 19, 2026 · 4:00 PM – 4:30 PM (Europe/London)
Location
TBA
Organised by
AI Native Dev
Type
independent
Source
Luma
Updated
Mar 12, 2026