
About this event
Jackson Kaunismaa presents his new paper “Eliciting Harmful Capabilities by Fine-Tuning on Safeguarded Outputs”. He will discuss why output-level safeguards on frontier models don’t actually make the ecosystem safe, and how anyone with an open-source model can fine-tune it on adjacent-domain outputs from safeguarded models to recover a large fraction of the capability gap between open-source and frontier models on harmful tasks.
Event Schedule
6:00 to 6:30 - Food and introductions
6:30 to 7:30 - Presentation and Q&A
7:30 to 9:00 - Open Discussions
If you can't attend in person, join our live stream starting at 6:30 pm via this link.
This is part of our weekly AI Safety Thursdays series. Join us in examining questions like:
How do we ensure AI systems are aligned with human interests?
How do we measure and mitigate potential risks from advanced AI systems?
What does safer AI development look like?
Topics & Tags
AI
Date & time
Apr 2 – Apr 3, 2026
America/Toronto
Location
30 Adelaide St E, Toronto, ON M5C 3G8, Canada
Price
CAD 5
Organised by
Trajectory Labs