What Is an AI Agent? A Technical Primer

A precise definition

An AI agent is a software system that takes a goal, plans a sequence of steps to achieve it, executes those steps using tools and data, observes the results, and iterates — without a human in the loop on every step. A chatbot answers; an agent decides and acts.

That phrase, "decides and acts", does most of the work. Earlier LLM products responded to prompts. An agent owns a piece of operational control flow: it chooses what to do next, calls APIs and queries data, and stops only when the goal is achieved or it has decided it cannot get there.

Three properties tend to separate real agents from systems that merely look agentic:

Goal-driven. The system is given an outcome, not a prompt. The control flow inside the system decides how to reach the outcome.
Tool-using. The system can invoke external tools — APIs, search, code execution, database queries — rather than relying only on what's in its weights.
Looping. The system observes the result of each action, updates its plan, and decides what to do next, until it terminates.

A system with all three is an agent in the technical sense. A system with only one or two is something simpler — a retrieval-augmented assistant, a tool-using copilot — and there is nothing wrong with that, but it is worth being precise.

The components that recur

Operationally, almost every production agent is built from a small set of repeating components.

Policy. Decides what the agent does next given the current state.
Planner. Decomposes a goal into subgoals.
Toolset. The menu of actions the agent can take.
Memory. Holds state across steps.
Retrieval layer. Grounds decisions in real documents and structured data.
Evaluator. Checks each output before it is used.
Handover surface. The explicit point at which the agent stops and asks a human to confirm, override, or take over.
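To make the component list concrete, here is a minimal sketch of the seven parts as typed slots on one object. Everything in it is an assumption made for illustration: the name AgentComponents, the dict-based State and Action types, and the stand-in lambdas are hypothetical, not an established API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical representations; a real system would use richer types.
State = dict[str, Any]
Action = dict[str, Any]


@dataclass
class AgentComponents:
    policy: Callable[[State, list[str]], Action]   # picks the next action
    planner: Callable[[State], list[str]]          # goal -> ordered subgoals
    toolset: dict[str, Callable[..., Any]]         # action name -> callable tool
    retrieve: Callable[[str], list[str]]           # query -> grounding passages
    evaluate: Callable[[Any], bool]                # True if an output is usable
    handover: Callable[[State], str]               # escalate to a human
    memory: State = field(default_factory=dict)    # state kept across steps


# Toy wiring with stand-in lambdas, for illustration only.
components = AgentComponents(
    policy=lambda state, plan: {"tool": "echo", "input": plan[0]},
    planner=lambda state: [f"look up {state['goal']}"],
    toolset={"echo": lambda text: text.upper()},
    retrieve=lambda query: [f"document about {query}"],
    evaluate=lambda output: bool(output),
    handover=lambda state: f"needs review: {state['goal']}",
)

state = {"goal": "invoice 42"}
plan = components.planner(state)                   # planner proposes subgoals
action = components.policy(state, plan)            # policy picks one action
output = components.toolset[action["tool"]](action["input"])
```

Keeping each component behind a plain callable is one possible design choice: any slot can then be a deterministic function, a model call, or a test stub without changing the code that wires them together.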

A working agent is a small graph of these components, each tuned to a narrow job. The architectural decisions — which step is deterministic, which is a model call, where the human sits in the loop, what happens when a tool fails — are where most of the engineering effort actually goes.

Three generations of LLM systems

Generation 1: prompt-and-respond. A model takes a prompt and returns text. No memory, no tools, no actions. Useful for drafting; useless for anything operational.

Generation 2: retrieval and tool use. The model is wrapped with retrieval and tool calls. This is what most "AI assistant" products actually are. It feels agentic, but the control flow is still a fixed request-response pipeline written by the developer: the model fills in a step and does not decide what happens next.

Generation 3: goal-driven agents. The system is given an outcome, not a prompt. Control flow is a loop: observe → plan → act → evaluate → repeat. The model is one component inside a larger control system, not the system itself.
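One way to read the difference between the second and third generations is to ask who owns the control flow. The sketch below contrasts the two under stated assumptions: the function names, the dict-shaped decisions, and the stub model and tools are all hypothetical stand-ins, not a real model or vendor API.

```python
def answer_once(question, retrieve, model):
    """Generation 2: a fixed pipeline. Retrieval always runs, then one model call."""
    context = retrieve(question)
    return model(question, context)


def run_until_done(goal, decide, tools, max_steps=5):
    """Generation 3: on each iteration the model's decision picks the next step."""
    observations = []
    for _ in range(max_steps):
        decision = decide(goal, observations)
        if decision["kind"] == "finish":
            return decision["answer"]
        tool = tools[decision["tool"]]
        observations.append(tool(decision["input"]))
    return None  # step budget exhausted without finishing


# Stub stand-ins for a real model and real tools, for illustration only.
def stub_retrieve(question):
    return ["refund window is 30 days"]


def stub_model(question, context):
    return f"Based on {len(context)} document(s): {context[0]}"


def stub_decide(goal, observations):
    if not observations:
        return {"kind": "tool", "tool": "lookup_order", "input": "A17"}
    if len(observations) == 1:
        return {"kind": "tool", "tool": "issue_refund", "input": "A17"}
    return {"kind": "finish", "answer": f"done after {len(observations)} tool calls"}


stub_tools = {
    "lookup_order": lambda order_id: {"order": order_id, "age_days": 12},
    "issue_refund": lambda order_id: {"order": order_id, "refunded": True},
}

single_pass = answer_once("What is the refund window?", stub_retrieve, stub_model)
looped = run_until_done("refund order A17", stub_decide, stub_tools)
```

In answer_once the sequence of steps is fixed in code. In run_until_done the sequence depends on what decide returns after seeing each observation, which is the property the third generation adds.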

The third generation is where AI reaches the operational core of a business, because it can run a workflow end-to-end instead of just helping a human run it.

The questions that come before the model

A common failure mode in agent projects is to start from the model. The right starting point is the workflow.

What is the decision the agent is replacing or augmenting? What does the input look like, and how reliable is it? What is the cost of being wrong, and how is wrong detected? Where does the agent stop and the human start? What does "good" look like, in numbers?

Only after these questions have answers does the model-selection conversation begin, and by that point the choice of model is usually obvious.

A minimal control loop

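# Pseudocode: the helpers (initial_state, done, planner, ...) stand for the
# components described earlier and are placeholders, not a real API.
def run(goal):
    state = initial_state(goal)
    while not done(state):
        plan   = planner(state)                 # decompose the goal into next steps
        action = policy(state, plan)            # choose one concrete action
        result = tool_executor(action)          # execute it against a tool or API
        state  = update(state, action, result)  # record the observation in memory
        if evaluator(state).should_handover():  # a check failed: stop and escalate
            return handover(state)
    return finalise(state)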

Real systems are messier — caches, retries, parallelism, multi-agent orchestration, prompt-injection guards, cost ceilings, time budgets — but every one of those features attaches to this loop.
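As an illustration of how such features attach to the loop, here is a minimal runnable sketch that adds three of them: bounded retries, a step ceiling, and a time budget. It is a toy under stated assumptions, not a reference implementation. The counter-style goal, the State class, ToolError, make_flaky_adder, and run_bounded are all hypothetical names introduced for this example, and the planner and policy are collapsed into a single line.

```python
import time
from dataclasses import dataclass, field


@dataclass
class State:
    goal: int                      # toy goal: reach this counter value
    value: int = 0
    steps: int = 0
    history: list = field(default_factory=list)


class ToolError(Exception):
    """Raised by a tool when a call fails transiently."""


def make_flaky_adder(fail_first_n):
    """Build a toy tool that fails its first N calls, then adds at most 2 per call."""
    calls = {"n": 0}

    def adder(requested):
        calls["n"] += 1
        if calls["n"] <= fail_first_n:
            raise ToolError("transient failure")
        return min(requested, 2)

    return adder


def call_with_retries(tool, arg, max_retries=2):
    """Retry a failing tool call a bounded number of times, then re-raise."""
    for attempt in range(max_retries + 1):
        try:
            return tool(arg)
        except ToolError:
            if attempt == max_retries:
                raise


def run_bounded(state, tool, max_steps=10, time_budget_s=1.0):
    """Bounded control loop. Returns ("done" or "handover", state)."""
    deadline = time.monotonic() + time_budget_s
    while state.value < state.goal:                    # plays the role of done(state)
        if state.steps >= max_steps or time.monotonic() > deadline:
            return "handover", state                   # budget exhausted
        action = state.goal - state.value              # trivial planner and policy
        try:
            result = call_with_retries(tool, action)
        except ToolError:
            return "handover", state                   # tool kept failing
        state.value += result                          # plays the role of update(...)
        state.history.append((action, result))
        state.steps += 1
    return "done", state


status, final = run_bounded(State(goal=5), make_flaky_adder(fail_first_n=1))
```

Each guard sits at a single point in the loop: the budgets are checked before an action is chosen, and the retry wrapper surrounds the tool call. In this sketch every failure path ends in a handover rather than an unhandled exception.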

What agents are not

Agents are not a model, not a chatbot, and not a replacement for software engineering. They are software, and they drift, regress, and fail in new ways. The same testing, monitoring, audit, and operational rigour that applies to any production system applies here, with extra care because the active component is probabilistic.

The honest one-line description of a production agent is: a goal-driven control loop, with an LLM somewhere inside it, surrounded by the engineering it takes to make a probabilistic system behave like a reliable one.

Where agents earn their keep

Agents earn their keep when the task is bounded, the work is informational, a human is willing and available to review the output, and the cost of iteration is low enough that the agent can try, fail, and try again without doing damage.

When those conditions hold, an agent can take a meaningful slice of work off a human and free that human for the part of the job that genuinely needs judgement.