Seminars
AISAR 2025
- Kei Nishimura-Gasparian
Early Signs of Steganographic Capabilities in Frontier LLMs
- Paul Bogdan & Uzay Macar
Thought Anchors: Which LLM Reasoning Steps Matter?
- Adrià Garriga-Alonso
Reverse-engineering a neural network that plans: a mesa-optimizer model organism
- Oscar Balcells Obeso
Real-Time Detection of Hallucinated Entities in Long-Form Generation
- Mikhail Terekhov
Control Tax: The Price of Keeping AI in Check
- Joar Skalse
The Theoretical Foundations of Reward Learning
- Fernando Rosas
AI in a vat: Fundamental limits of efficient world modelling
- Nora Ammann
Safeguarded AI: a scalable workflow for safety-by-construction
- Matthew Farrugia-Roberts
Mitigating Goal Misgeneralization via Minimax Regret