
Eliciting Harmful Capabilities by Fine-Tuning on Safeguarded Outputs

by Trajectory Labs

About this event

Jackson Kaunismaa presents his new paper “Eliciting Harmful Capabilities by Fine-Tuning on Safeguarded Outputs”. He will discuss why output-level safeguards on frontier models don’t actually make the ecosystem safe, and how anyone with an open-source model can fine-tune it on adjacent-domain outputs from safeguarded models to recover a large fraction of the capability gap between open-source and frontier models on harmful tasks.

Event Schedule
6:00 to 6:30 - Food and introductions
6:30 to 7:30 - Presentation and Q&A
7:30 to 9:00 - Open Discussions

If you can't attend in person, join our live stream starting at 6:30 pm via this link.

This is part of our weekly AI Safety Thursdays series. Join us in examining questions like:

How do we ensure AI systems are aligned with human interests?
How do we measure and mitigate potential risks from advanced AI systems?
What does safer AI development look like?

Topics & Tags

AI
Date & time
Apr 2 – Apr 3, 2026
America/Toronto
Location
30 Adelaide St E, Toronto, ON M5C 3G8, Canada
Price
CAD 5
Organised by
Trajectory Labs
Type
independent
Source
Luma
Updated
Mar 28, 2026
