News
Feb, 2025 | Granite Guardian has been accepted to the NAACL ’25 Industry Track (Oral)! The latest Granite Guardian 3.1 model now also covers agentic risks. |
---|---|
Jan, 2025 | CAST, our work on conditional activation steering, has been accepted as a spotlight at ICLR ’25. [Code] |
Dec, 2024 | Attending NeurIPS ’24 to present our work at the Pluralistic Alignment workshop. |
Dec, 2024 | Our paper “Granite Guardian” is now available as a preprint on arXiv. |
Nov, 2024 | I’ll be presenting our work “Value Alignment From Unstructured Text” at EMNLP 2024. |
Oct, 2024 | Granite Guardian 3.0 is out! It helps detect input and response risks, including various harms and RAG hallucinations. |
Sep, 2024 | CAST: Check out the work of my exceptional summer intern, Bruce Lee, on conditional activation steering. |
Aug, 2024 | Alignment Studio is accepted to IEEE Internet Computing! We introduce an architecture that facilitates alignment of LMs to specific values, norms and regulations within a context. |