Seminars

Kei Nishimura-Gasparian
Early Signs of Steganographic Capabilities in Frontier LLMs
Paul Bogdan & Uzay Macar
Thought Anchors: Which LLM Reasoning Steps Matter?
Adrià Garriga-Alonso
Reverse-engineering a neural network that plans: a mesa-optimizer model organism
Oscar Balcells Obeso
Real-Time Detection of Hallucinated Entities in Long-Form Generation
Mikhail Terekhov
Control Tax: The Price of Keeping AI in Check
Joar Skalse
The Theoretical Foundations of Reward Learning
Fernando Rosas
AI in a vat: Fundamental limits of efficient world modelling
Nora Ammann
Safeguarded AI: a scalable workflow for safety-by-construction
Matthew Farrugia-Roberts
Mitigating Goal Misgeneralization via Minimax Regret