Seminars

Conversations with leading researchers.

Talks on the research agendas shaping AI safety.

Machine Alignment, Transparency & Security (MATS) Scholar

Early Signs of Steganographic Capabilities in Frontier LLMs

Anthropic Research Fellows

Thought Anchors: Which LLM Reasoning Steps Matter?

Independent

Reverse-engineering a neural network that plans: a mesa-optimizer model organism

Anthropic

Real-Time Detection of Hallucinated Entities in Long-Form Generation

EPFL · Anthropic Research Fellow

Control Tax: The Price of Keeping AI in Check

Deducto

The Theoretical Foundations of Reward Learning

University of Sussex

AI in a vat: Fundamental limits of efficient world modelling

ARIA, Safeguarded AI

Safeguarded AI: a scalable workflow for safety-by-construction

University of Oxford

Mitigating Goal Misgeneralization via Minimax Regret