Inkit Padhi
ML & NLP Researcher @ IBM Research, New York

Hi! My name is Inkit. I’m a researcher working in the fields of Machine Learning (ML) and Natural Language Processing (NLP). My current work centers on improving the safety, reliability, and trustworthiness of Large Language Models (LLMs).
My primary interests lie in safety and alignment: leveraging synthetic data, developing effective steering strategies, and unlocking advanced reasoning abilities in LLMs.
My past research has spanned various areas of deep learning, including, but not limited to, learning representations for diverse modalities, counterfactual generation, interpretability, text style transfer, unsupervised learning, and influence-based attribution. I began my research journey at USC/ISI under the guidance of Kevin Knight; our work on probing in sequence models was a foundational contribution to the field of interpretability.
Email: $first_name.$last_name@gmail.com
Updates
| Feb 2025 | Granite Guardian has been accepted to the NAACL ‘25 Industry Track (Oral)! The latest Granite Guardian 3.1 model now also covers agentic risks. |
|---|---|
| Jan 2025 | CAST, our work on conditional activation steering, was accepted as a spotlight at ICLR ‘25. [Code] |
| Dec 2024 | Attending NeurIPS ‘24 to present our work at the Pluralistic Alignment workshop. |
| Dec 2024 | Our paper “Granite Guardian” is now available as a preprint on arXiv. |
| Nov 2024 | I’ll be presenting our work “Value Alignment From Unstructured Text” at EMNLP 2024. |
| Oct 2024 | Granite Guardian 3.0 is out! It helps detect input and response risks, including various harms and RAG hallucinations. |
| Sep 2024 | CAST: check out my exceptional summer intern Bruce Lee’s work on conditional activation steering. |
| Aug 2024 | Alignment Studio has been accepted to IEEE Internet Computing! We introduce an architecture that facilitates aligning LMs to specific values, norms, and regulations within a context. |