
About this event
Deep technical sessions. Live demos. Real conversations.
If you're deploying or scaling LLM inference, this is the room to be in.
Join Red Hat AI, IBM, NVIDIA, The Open Accelerator, MIT, and the vLLM community in Boston for an evening of technical depth:
Hear directly from vLLM maintainers and committers
See live demos of real inference workflows
Learn how to put what you learn into practice with the Open Accelerator
Connect with the engineers and platform teams pushing the state of the art
Program
Optional Pre-Event Workshop
3:30 PM — Doors Open for Workshop Attendees
4:00–5:00 PM — Distributed Inference with llm-d: Your Production-Ready Path to Scalable LLM Inference
About the workshop: llm-d is a distributed inference orchestration layer that reduces tail latency (P95/P99) through intelligent cache-aware routing. In this hands-on workshop, participants deploy Llama 3.1 8B with vLLM, benchmark single-GPU performance, scale to multiple GPUs with naive load balancing, and then use llm-d to demonstrate how cache-aware routing significantly reduces tail latency.
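Since the workshop centers on reducing P95/P99 tail latency, here is a minimal illustrative sketch (not from the workshop materials) of how those tail-latency percentiles are computed from per-request latencies; the sample numbers are hypothetical:

```python
# Illustrative sketch: computing P95/P99 tail latency from per-request
# latencies. Cache-aware routing targets exactly these upper percentiles,
# which a handful of slow (e.g. cache-miss) requests can dominate.

def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples."""
    ordered = sorted(samples)
    # nearest-rank: ceil(p/100 * n), converted to a 0-based index
    k = max(0, -(-len(ordered) * p // 100) - 1)
    return ordered[k]

# Hypothetical per-request latencies in milliseconds: most requests are
# fast, but two slow outliers stretch the tail.
latencies_ms = [40, 42, 45, 43, 41, 44, 46, 40, 300, 900]

p50 = percentile(latencies_ms, 50)  # median is unaffected by the tail
p95 = percentile(latencies_ms, 95)  # tail percentiles expose the outliers
p99 = percentile(latencies_ms, 99)
```

Note how the median stays in the low 40s while P95/P99 jump to the outlier values; this is why tail-latency benchmarks tell a different story than averages.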
Meetup Agenda
5:00–5:30 PM — Doors Open, Check-In
5:30–5:40 PM — Welcome and Opening Remarks
Saša Zelenović, Sr. Technical Marketing Manager, Red Hat AI
5:40–6:00 PM — Intro to vLLM and Project Update
Michael Goin, vLLM Maintainer and Principal Engineer, Red Hat AI
6:00–6:20 PM — Getting Started with Model Compression for Fast and Efficient Inference
Charles Hernandez, ML Engineer, Red Hat AI
6:20–6:40 PM — Accelerating LLM Inference with Speculative Decoding
Helen Zhao, ML Engineer, Red Hat AI
Fynn Schmitt-Ulms, ML Engineer, Red Hat AI
6:40–7:00 PM — Dynamic Agent & Inference Optimization with NeMo Agent Toolkit
Dhruv Nandakumar, Agent and Inference Engineering, NVIDIA
7:00–7:30 PM — Tackling Distributed Inference at Scale with llm-d and Kubernetes
Carlos Costa, Distinguished Engineer, IBM
7:30–7:40 PM — From Meetup to Hackathon: Building Together in the Open AI Accelerator
Stefanie Chiras, SVP, The Open Accelerator, Red Hat
7:40–8:00 PM — Community Discussion and Q&A
8:00–9:00 PM — Networking, Food and Drinks
Who Should Come
vLLM users and contributors
ML and infra engineers working on inference and serving
Platform teams running GenAI in production
Anyone curious about efficient inference across local, cloud, and Kubernetes
Before You Arrive
Registration closes 24 hours before the event
Unregistered attendees cannot be admitted
Bring a photo ID for check-in
See you in Boston! The inference conversation starts here.
Topics & Tags
AI
Date & time
Mar 31 – Apr 1, 2026
America/New_York
Location
314 Main St, Cambridge, MA 02142, USA
Attendance
89 going · 111 spots remaining
Organised by
vLLM Meetups and Events