

Join us for an exciting talk by Mary Gibbs, Senior Applied Scientist at Relativity.
Agenda:
6:00 - 6:30 PM - Welcome and mingle
6:30 - 6:45 PM - Introductions
6:45 - 7:30 PM - Talk
7:30 - 8:00 PM - Wrap up
Description:
If you have ever shipped a model, watched your metrics improve, and later learned from your users that something was wrong, the metrics were always wrong. You just didn’t know it yet. An evaluation consists of three components: a benchmark, a scorer, and a claim about what a score represents. Each component has its own weaknesses. Benchmarks can suffer from narrow coverage, contamination, or saturation. Scorers are often chosen for ease of automation or computation rather than for their alignment with user outcomes. And the claim connecting a score to reality is rarely made explicit. These gaps compound across the model development lifecycle. When metrics improve, teams treat that as a signal and optimize directly against it, which is how a measurement problem becomes a model problem. This talk maps where evaluations can go wrong, considers counterarguments, and ends with practical advice for building better ones.
Speaker Bio:
Mary is a Senior Applied Scientist at Relativity, tackling data science challenges in the e-discovery and legal tech space. She is also an organizer for Women and Gender eXpansive Coders DC (formerly Women Who Code DC), fostering a community dedicated to empowering women and nonbinary individuals to excel in their careers. Mary's experience spans various domains. She has developed data science solutions related to job search and career progression at Teal, cybersecurity challenges at LiveAction Software, and commercial and government consulting at Mosaic Data Science. Before venturing into the field of data science, Mary conducted and published research pertaining to the cellular and molecular mechanisms underlying neurodevelopment at the National Institutes of Health. In other words, she has dissected and imaged a lot of fruit fly brains. She holds an M.S. in Data Science from The George Washington University and a B.A. in Biological Sciences from Cornell University.
Your Evals Are Bad And You Should Feel Bad: Evaluation and the Model Development Lifecycle is a free, independent event taking place on Tuesday, May 26, 2026 at 2112 Pennsylvania Ave NW, Washington, DC 20037, USA. Register to secure your spot. Currently 23 people have registered out of 23 spots. The event runs for approximately 2 hours.
Topics: AI, DC Data & AI Events