Updates, announcements, and milestones from my research journey.
Best Paper Award at AIES 2025
"When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails" received the Best Paper Award at AIES 2025.
Announcement →
Granite Guardian 3.3 released
Granite Guardian 3.3 is here, featuring a hybrid thinking mode and a 3rd-place finish on the LLM-AggreFact benchmark.
Model →
Presenting CAST at ICLR 2025
Presenting CAST on conditional activation steering at ICLR in Singapore.
Poster →
Granite Guardian tops GuardBench leaderboard
Granite Guardian tops GuardBench, a leaderboard for guardrail models.
Blog →
Granite Guardian accepted at NAACL 2025 (Oral)
Granite Guardian has been accepted to the NAACL '25 Industry Track as an Oral presentation. The latest Granite Guardian 3.1 model now also covers agentic risks.
arXiv →
CAST accepted as spotlight at ICLR 2025
CAST, our work on conditional activation steering, has been accepted as a spotlight at ICLR '25.
Paper →
Presenting at NeurIPS 2024 Pluralistic Alignment workshop
Attending NeurIPS '24 to present our work on value alignment at the Pluralistic Alignment workshop.
Workshop →
Granite Guardian preprint on arXiv
Our paper 'Granite Guardian' is now available as a preprint on arXiv.
arXiv →
Granite Guardian 3.0 released
Granite Guardian 3.0 is out! It helps detect input and response risks, including various harms and RAG hallucinations.
GitHub →
CAST: Conditional activation steering
CAST, work on conditional activation steering by summer intern Bruce Lee, is now on arXiv.
arXiv →
Alignment Studio accepted to IEEE Internet Computing
Alignment Studio has been accepted to IEEE Internet Computing. We introduce an architecture that facilitates alignment of LMs to specific values, norms, and regulations within a context.
Paper →