News

Updates, announcements, and milestones from my research journey.

2025

award

Best Paper Award at AIES 2025

"When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails" received the Best Paper Award at AIES 2025.

Announcement →
release

Granite Guardian 3.3 released

Granite Guardian 3.3 is here, featuring hybrid thinking mode and a 3rd-place finish on the LLM-AggreFact benchmark.

Model →
talk

Presenting CAST at ICLR 2025

Presenting CAST, our work on conditional activation steering, at ICLR 2025 in Singapore.

Poster →
award

Granite Guardian tops GuardBench leaderboard

Granite Guardian tops GuardBench, a leaderboard for guardrail models.

Blog →
paper

Granite Guardian accepted at NAACL 2025 (Oral)

Granite Guardian has been accepted to the NAACL '25 Industry Track as an Oral presentation. The latest Granite Guardian 3.1 model now also covers agentic risks.

arXiv →
paper

CAST accepted as spotlight at ICLR 2025

CAST, our work on conditional activation steering, has been accepted as a spotlight at ICLR '25.

Paper →

2024

talk

Presenting at NeurIPS 2024 Pluralistic Alignment workshop

Attending NeurIPS '24 to present our work on value alignment at the Pluralistic Alignment workshop.

Workshop →
paper

Granite Guardian preprint on arXiv

Our paper "Granite Guardian" is now available as a preprint on arXiv.

arXiv →
talk

Presenting at EMNLP 2024

Presenting "Value Alignment From Unstructured Text" at EMNLP 2024.

Paper →
release

Granite Guardian 3.0 released

Granite Guardian 3.0 is out! It helps detect risks in inputs and responses, including various harms and RAG hallucinations.

GitHub →
paper

CAST: Conditional activation steering

CAST, work on conditional activation steering by summer intern Bruce Lee, is now on arXiv.

arXiv →
paper

Alignment Studio accepted to IEEE Internet Computing

Alignment Studio has been accepted to IEEE Internet Computing. We introduce an architecture that facilitates aligning LMs to the specific values, norms, and regulations of a given context.

Paper →