General Agents Contain World Models

A new theorem from Google DeepMind settles a decades-old debate — and reshapes how we should build AI systems.

For decades, AI researchers have been split on a fundamental question: does an intelligent agent need a world model — a learned representation of how the environment transitions from state to state — or is model-free reinforcement learning sufficient?

Rodney Brooks famously argued in “Intelligence Without Representation” that the world is its own best model. Model-free approaches since then have produced remarkable results: Atari-playing DQNs, instruction-following LLMs, robotic control systems — all generalizing without any explicit environmental model.

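A new paper from Google DeepMind, accepted at ICML 2025, addresses this debate formally. Richens, Abel, Bellot, and Everitt prove that any agent capable of generalizing to multi-step goal-directed tasks must have learned a predictive model of its environment, and that this model can be recovered from the agent's policy alone.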

The Setup

The paper formalizes the question precisely. Consider any environment modeled as a controlled Markov process — the standard formalism covering most sequential decision problems. An agent is a goal-conditioned policy: given the current state and a goal, it outputs an action. Goals are expressed as sequences of sub-goals the agent must achieve in order, over multi-step horizons.
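To make the setup concrete, here is a minimal Python sketch of the objects involved. It is a simplification under stated assumptions: states and actions are plain strings, each sub-goal is just a target state (the paper's goal language is richer), and the names Transition, Goal, toy_env, and greedy_policy are hypothetical, with made-up probabilities.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

State = str
Action = str
# transition[(state, action)] maps each next state to its probability.
Transition = Dict[Tuple[State, Action], Dict[State, float]]


@dataclass(frozen=True)
class Goal:
    """An ordered sequence of sub-goals (assumption: each is a target state)."""
    subgoals: Tuple[State, ...]

    @property
    def depth(self) -> int:
        # Number of sub-goals the agent must achieve in order.
        return len(self.subgoals)


# A goal-conditioned policy maps (current state, goal) to an action.
Policy = Callable[[State, Goal], Action]


def is_valid_transition(transition: Transition, tol: float = 1e-9) -> bool:
    """Check that every (state, action) row is a probability distribution."""
    return all(abs(sum(row.values()) - 1.0) <= tol for row in transition.values())


# Toy two-state controlled Markov process (illustrative numbers only).
toy_env: Transition = {
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "go"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"): {"s0": 0.5, "s1": 0.5},
}


def greedy_policy(state: State, goal: Goal) -> Action:
    """Pick the action most likely to reach the first sub-goal in one step."""
    target = goal.subgoals[0]
    actions = sorted(a for (s, a) in toy_env if s == state)
    return max(actions, key=lambda a: toy_env[(state, a)].get(target, 0.0))
```

Note that greedy_policy reads toy_env directly, which is exactly the kind of access a deployed agent does not expose. The theorem concerns what can be inferred when only the policy's input-output behavior is available.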

Theorem 1 states:

Any agent that satisfies a regret bound for a sufficiently diverse set of simple goal-directed tasks must have learned an accurate predictive model of its environment. The error in the extracted world model decreases as the agent’s performance improves or the horizon of goals it can achieve increases.

In plain terms: if your agent reliably achieves multi-step goals across diverse tasks, its policy encodes the environment’s transition function. You can algorithmically extract a world model directly from the policy — no access to training data, no architectural assumptions required.
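A toy sketch can show why a policy's choices leak transition probabilities. This is an illustration in the spirit of the result, not the paper's actual extraction algorithm. The assumptions: a single state-action pair whose outcome occurs with hidden probability p_true, a perfectly optimal agent (failure rate zero), and an either-or goal over n trials, "the outcome occurs at most r times" versus "more than r times". All function names are hypothetical.

```python
from math import comb


def binom_cdf(r: int, n: int, p: float) -> float:
    """P(X <= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r + 1))


def make_optimal_agent(p_true: float, n: int):
    """Hypothetical agent: uses p_true internally, exposes only its choices."""
    def choose(r: int) -> str:
        # The agent commits to whichever of the two goals it is
        # more likely to achieve over n trials.
        return "at_most" if binom_cdf(r, n, p_true) >= 0.5 else "more_than"
    return choose


def extract_probability(choose, n: int) -> float:
    """Estimate the hidden probability from the agent's choices alone.

    The smallest threshold r at which the agent switches to "at_most"
    is the median of Binomial(n, p_true), which lies within 1 of n * p_true.
    """
    for r in range(n + 1):
        if choose(r) == "at_most":
            return r / n
    return 1.0


if __name__ == "__main__":
    n = 50
    agent = make_optimal_agent(p_true=0.3, n=n)
    estimate = extract_probability(agent, n)
    print(f"estimated transition probability: {estimate:.3f}")
```

The extractor never sees p_true, only the agent's answers. Because a binomial median lies within one count of n times p_true, the estimate in this toy is off by at most 1/n, so longer goal horizons give a sharper read on the environment.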

Critically, the theorem is architecture-agnostic and training-agnostic. It doesn’t matter if the agent was trained with PPO, RLHF, or distillation. If it generalizes to long-horizon tasks, it has a world model — encoded, implicitly, in its weights.

The Knife Edge

The paper states two complementary theorems that together define the boundary with unusual precision.

Theorem 1 — Multi-step agents: Any agent generalizing to depth-n goal sequences must contain a world model. World model error scales as O(δ/√n) + O(1/n), where δ is the agent’s failure rate and n is goal depth. Better performance and longer planning horizons both demand higher-fidelity world models.
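To get a feel for that scaling, the short Python sketch below evaluates δ/√n + 1/n for a few values. It assumes the constants hidden by the O(·) notation are 1, so the numbers show the shape of the trend, not the paper's actual bound. The function name error_scale is hypothetical.

```python
import math


def error_scale(delta: float, n: int) -> float:
    """Shape of the world-model error trend: delta / sqrt(n) + 1 / n.

    Assumption: the constants hidden by O(.) are set to 1, so the result
    is only meaningful for comparing settings, not as an absolute error.
    """
    if n < 1:
        raise ValueError("goal depth n must be at least 1")
    if not 0.0 <= delta <= 1.0:
        raise ValueError("failure rate delta must lie in [0, 1]")
    return delta / math.sqrt(n) + 1.0 / n


if __name__ == "__main__":
    for delta in (0.2, 0.1, 0.05):
        for n in (10, 100, 1000):
            print(f"delta={delta:<5} n={n:<5} scale={error_scale(delta, n):.4f}")
```

Lowering the failure rate shrinks only the first term, while increasing goal depth shrinks both, which is why depth matters so much in this trend.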

Theorem 2 — Myopic agents: Agents that only optimize for immediate outcomes do not need a world model. No procedure can extract environment transition probabilities from a myopic agent’s policy — the bound is provably trivial.
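A toy example gives some intuition for the myopic case, though it is not the paper's proof. Assume a myopic agent that, asked to reach a target state on the next step, simply picks the action most likely to get there. The two environments below, env_a and env_b, are made up and have clearly different transition probabilities, yet they induce the same policy. All names are hypothetical.

```python
def myopic_policy(transition, state, target):
    """Pick the action that maximizes the one-step chance of reaching target."""
    actions = sorted(a for (s, a) in transition if s == state)
    return max(actions, key=lambda a: transition[(state, a)].get(target, 0.0))


def policy_table(transition, state, targets):
    """The agent's full one-step behavior from `state`, one entry per target."""
    return {t: myopic_policy(transition, state, t) for t in targets}


# Two made-up environments with different probabilities but the same ordering.
env_a = {
    ("s0", "left"): {"s1": 0.9, "s2": 0.1},
    ("s0", "right"): {"s1": 0.2, "s2": 0.8},
}
env_b = {
    ("s0", "left"): {"s1": 0.6, "s2": 0.4},
    ("s0", "right"): {"s1": 0.45, "s2": 0.55},
}

if __name__ == "__main__":
    targets = ["s1", "s2"]
    print(policy_table(env_a, "s0", targets))
    print(policy_table(env_b, "s0", targets))
```

Both calls yield the same table, so an observer who sees only the one-step policy cannot tell the two environments apart. The policy reveals which action is better for each target, not by how much.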

World models are necessary if and only if the agent pursues goals across multiple time steps. Single-step agents can remain genuinely model-free. Agents that generalize across multi-step goals cannot.

Why This Matters

1. No model-free shortcut to general AI.

The paper removes the theoretical motivation for purely model-free general intelligence. Any agent generalizing across diverse, long-horizon tasks has learned a world model, whether or not we designed it that way. This strengthens the case for explicitly model-based architectures like DreamerV3, MuZero, and JEPA-style systems.

2. A mechanism for emergent capabilities.

The theorem offers a clean account for why foundation models exhibit surprising generalization. Training on diverse goal-directed tasks implicitly forces world model acquisition — which then transfers to tasks the model was never explicitly trained on. Emergence is not mysterious; it is a consequence of competence.

3. A new handle on safety.

If capable agents necessarily contain extractable world models, safety researchers can probe those models — testing what an agent “believes” about the environment without needing architectural transparency. This is a practical path toward auditing agent behavior in safety-critical deployments.

4. Hard limits on capability.

An agent’s generalization is fundamentally bounded by the fidelity of its world model, which is bounded by the learnability of the environment itself. In high-dimensional, partially observed, non-Markovian real-world domains, this is a meaningful constraint — not just a theoretical footnote.

What Changes for Practitioners

If you are building production AI systems — agents that call tools across multiple steps, coordinate multi-stage workflows, or reason over long clinical or financial horizons — this paper reframes how you should diagnose failure.

Read through the lens of the theorem, an agent that fails at long-horizon tasks is failing because its implicit world model is inaccurate, not because it lacks “reasoning ability” in some vague sense. The path to better agents is better world models: richer training distributions, longer planning horizons, and architectures that surface world modelling explicitly rather than leaving it buried.

At Kalman AI, this shapes how we approach agent architecture across our work — in precision oncology pipelines where multi-step inference over genomic and imaging data requires a faithful model of biological state transitions, in mutual fund analytics where agents reason across temporal market regimes, and in doctor-patient intelligence where clinical reasoning unfolds over long, branching decision horizons. In each domain, building systems bespoke to the specific dynamics of the environment is not just good engineering. The theorem tells us it is a formal necessity.

The Bigger Picture

The paper sits at the intersection of inverse reinforcement learning, mechanistic interpretability, and planning theory. It completes a triangle:

• Given (world model + goal) → planning produces a policy

• Given (world model + policy) → IRL recovers the goal

• Given (policy + goal) → this result recovers the world model

The question is no longer “does my agent have a world model?” If it generalizes, it does. The question is now: how accurate is that model, and how do we extract and use it?

That is a tractable engineering problem — and the kind of work that separates production-grade AI from capable demos.

Richens, J., Abel, D., Bellot, A., & Everitt, T. (2025). General agents contain world models. Proceedings of ICML 2025. arXiv:2506.01622
