Kei Nishimura-Gasparian
Machine Alignment, Transparency & Security (MATS) Scholar
Early Signs of Steganographic Capabilities in Frontier LLMs