AI · Automation · Engineering

What Is an AI Agent? A Working Engineer's Definition for 2026

By Lazar MilicevicJuly 5, 202610 min read

Software engineer's workstation with multiple monitors showing code, representing AI agent development in 2026

An AI agent is an LLM equipped with tools, memory, and a control loop that lets it decide what to do next without a human typing every instruction. It is not a chatbot, not a prompt template, and definitely not a single API call. The defining characteristic is autonomy: the agent observes its environment, reasons about the next step, takes an action, checks the result, and repeats until the task is done or it gets stuck.

I have spent the last 10 years building automation systems. For the last two of those years, I have been building autonomous AI agents in production, running systems that research, write, optimize, and publish content across multiple sites on their own through my project BizFlowAI. This is my working definition of what an agent actually is, grounded in real architecture decisions rather than the vague conceptual descriptions you find in most articles.

The Perception-Action Loop Is the Core

The core distinction between an agent and a simple LLM call is the control loop. A standard LLM interaction is a one-shot function: you send a prompt, the model responds, you move on. An agent operates differently. It runs inside a loop where the output of one step becomes the input to the next, and the model itself decides whether the task is complete or requires another iteration.

Here is the simplified version of the loop I run in production:

def agent_loop(task: str, max_iterations: int = 10):
    messages = build_initial_context(task)
    
    for i in range(max_iterations):
        response = llm.create(
            model="claude-sonnet-4-20250514",
            messages=messages,
            tools=get_available_tools()
        )
        
        if response.stop_reason == "end_turn":
            return response.content  # Agent decided it's done
        
        if response.stop_reason == "tool_use":
            tool_results = execute_tools(response.tool_calls)
            messages.append(response)
            messages.append(tool_results)
            continue
    
    return hit_iteration_limit(messages)

This is the actual pattern. The LLM gets a system prompt, context, and a list of tools. It responds. If it calls a tool, you execute that tool, append the result to the conversation, and loop back. If it says "I'm done," you stop. That is the entire foundation. The complexity lives in what tools you provide, how you manage context, and how you handle failures.

The max_iterations parameter matters more than people think. I set it to 10 by default. If an agent cannot complete a task in 10 steps, it is almost always stuck in a loop, calling the same tool with slightly different arguments and burning tokens. Better to halt, log the state, and investigate.

Where an Agent Ends and a Prompt Chain Begins

A prompt chain runs a fixed sequence of LLM calls: step one generates an outline, step two writes the body, step three edits. Each step is predetermined by you, the developer. An agent, by contrast, decides its own path through the work. It might call the search tool three times before writing anything, or it might skip search entirely if the context is sufficient.

The line is simple. If the system decides which step comes next based on intermediate results, it is an agent. If you hardcoded the sequence, it is a pipeline.

Both are legitimate. In my content system, I use a hybrid. The overall workflow is a scheduled pipeline: research phase, then generation phase, then publishing phase. But inside the research phase, an agent decides which sources to query, how many queries to run, and when it has enough material. The pipeline handles orchestration between phases. The agent handles decision-making within a phase.

This split is what most production systems look like in 2026. Pure agents that figure out everything from scratch are expensive and slow. Pure pipelines are rigid and break when reality does not match your assumptions. The sweet spot is an agent embedded at the decision points where flexibility actually matters.

Tool Orchestration: The Actual Interface Between LLM and Reality

Tools are how agents interact with the world, and they are the single most important architectural decision you will make. The LLM does not execute code. It emits a structured request describing what tool it wants to call and with what arguments. Your infrastructure executes that tool and returns the result.

The tool schema is your real API contract. A poorly described tool gets called at the wrong time, with wrong arguments, or not at all. Here is a real example from my system:

search_tool = {
    "name": "web_search",
    "description": "Search the web for current information. "
                    "Use when you need facts, data, or recent events "
                    "that are not in your context. Do NOT use for "
                    "topics you already have sufficient information on.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Specific search query, 3-8 words."
            },
            "depth": {
                "type": "string",
                "enum": ["quick", "thorough"],
                "description": "quick = top 3 results, thorough = top 10"
            }
        },
        "required": ["query"]
    }
}

Three things matter in that schema. First, the description tells the model when to use it and when not to. Second, the depth parameter lets the agent control cost. Third, the query description constrains the format. Without that constraint, models will pass entire paragraphs as a query and get garbage back.

The biggest mistake I see in agent development is giving the LLM too many tools. I tested this directly. With 4 tools, the model picks the right one roughly 95% of the time in my tasks. With 12 tools, accuracy drops below 70%. The model starts confusing similar tools, calling search when it should call a database lookup, or calling a formatting tool when it should be reasoning.

The fix is tool routing. Instead of presenting all tools at once, you use a lightweight first pass to determine which category of tools is relevant, then expose only those. I wrote about this in detail in my post on cutting agent token use, and the result was a 99% reduction in token consumption during tool selection. The principle is simple: fewer choices, better decisions.

Memory: What the Agent Remembers Between Steps

Memory in agent systems has nothing to do with the model's context window being large enough to hold everything. Memory is about what you intentionally persist across steps, across conversations, and across runs.

There are three layers I implement:

Layer	Purpose	Storage	Lifetime
Working memory	Current task context, intermediate results	In-memory message list	Single agent run
Session memory	User preferences, recent history	PostgreSQL or Redis	Days to weeks
Long-term memory	Patterns, corrections, learned facts	pgvector with metadata	Permanent

Working memory is just the message array in the loop above. It grows as the agent works, and you truncate or summarize it when it gets too long. Session memory persists between runs so the agent does not ask the same questions repeatedly. Long-term memory is where the system actually learns.

In my self-learning content loop, long-term memory is the difference between a system that repeats itself and one that improves. Every published piece gets scored on real search performance after a few weeks. Those scores feed back into the memory store as structured observations: "listicles with numbered headlines outperformed question headlines by 40% on this topic." The agent retrieves those observations via semantic search before generating new content, and the output gets better over time because it is working from real data.

The retrieval mechanism matters. I use hybrid search, combining BM25 keyword matching with pgvector semantic similarity, fused using Reciprocal Rank Fusion (RRF). Pure semantic search misses exact keyword matches. Pure keyword search misses conceptual overlap. Hybrid gets you both, and in production RAG systems it consistently produces the most relevant context.

Multi-Agent Coordination: When One Agent Is Not Enough

A single agent with 15 tools and a massive context window will eventually hit a wall. It loses track of what it is doing, calls the wrong tools, and degrades in reasoning quality. Multi-agent architecture solves this by splitting the work into focused agents with narrow scopes.

The pattern I use is a coordinator-worker model. One agent acts as the coordinator. It receives the task, breaks it into subtasks, and delegates each subtask to a specialist agent. The specialist does its work, returns a result, and the coordinator integrates everything.

In my content system, this looks like:

Research agent: Gets a topic, runs web searches, reads sources, returns a structured research brief. Tools: web search, URL fetch, content extraction.
Strategy agent: Takes the research brief and existing performance data, decides on angle, target queries, and structure. Tools: analytics query, keyword database.
Writer agent: Takes the strategy and research, produces a draft. Tools: template retrieval, style guide lookup.
Optimization agent: Takes the draft, runs SEO and AEO checks, returns specific improvements. Tools: heading analyzer, schema generator, internal link finder.

Each agent has 3 to 5 tools, a focused system prompt under 500 tokens, and a clear input-output contract. The coordinator manages the handoffs and handles failures.

The critical implementation detail is isolation. Each agent gets its own conversation history. I do not dump the research agent's full output into the writer agent's context. I pass a structured summary. This keeps each agent's context lean and focused on its specific task.

The Autonomy Dial: How Much Freedom to Give

Full autonomy is a spectrum, and more is not always better. The Morgan Stanley lesson I have written about before is relevant here: their AI assistant succeeded precisely because it limited autonomy. It did not try to do everything. It handled specific, well-defined tasks and deferred to the human for everything else.

I configure autonomy based on the cost of being wrong.

Low stakes (content drafting, internal tooling): Full autonomy. The agent runs, produces output, and a human reviews before anything goes live. If the agent makes a mistake, the human catches it. Cost of failure is low.

Medium stakes (customer-facing content, automated reports): Guided autonomy. The agent runs but stops at predefined checkpoints for approval. It might draft and optimize a piece, but a human reviews before publishing. This is where most business applications live.

High stakes (financial decisions, actions that affect customers directly): Human-in-the-loop for every non-trivial action. The agent can prepare and recommend, but a human presses the button.

The temptation in agent development is to push for maximum autonomy because it is technically impressive. In production, the right question is: what is the blast radius if this agent makes a bad decision? Match the autonomy level to the answer.

What I Would Do

If you are building an agent system in 2026, start with one agent, one task, and three to five tools. Get the perception-action loop working reliably. Add memory only when you have a concrete use case for it, not because it sounds sophisticated. Move to multi-agent only when a single agent's context and tool count are genuinely causing performance problems.

Invest heavily in your tool descriptions. They are the API contract between natural language and deterministic code. A vague description produces unpredictable behavior. A precise description with examples and constraints produces reliable tool selection.

Build the failure paths first. What happens when a tool times out? What happens when the LLM returns malformed JSON? What happens when the agent loops five times without making progress? These are not edge cases. They are the default state of production agent systems, and the quality of your error handling determines whether your system runs unattended or needs constant babysitting.

The systems I have built through BizFlowAI run without human intervention for days at a time. They do that not because the LLM is brilliant, but because the infrastructure around the LLM is designed for the reality that LLMs are unpredictable, and that unpredictability is the normal operating condition.

An AI agent is a control loop around an LLM that gives it tools, memory, and the autonomy to decide its next step. Everything else is implementation detail, and the implementation details are where the real engineering happens. If you are working on an agent system and want to talk architecture, or if you need someone to own a production AI build end to end, reach out at lazar-milicevic.com/#contact or check out the rest of the blog for more posts on what this work actually looks like.

Frequently asked questions

What is an AI agent in simple terms?

An AI agent is a large language model equipped with tools, memory, and a control loop that allows it to autonomously decide what to do next without a human typing every instruction. Unlike a chatbot or a single API call, an agent observes its environment, reasons about the next step, takes an action, checks the result, and repeats until the task is complete. The defining characteristic is autonomy: the system itself decides which step comes next based on intermediate results. I have spent the last two years building exactly this kind of system in production through my project BizFlowAI.

What is the difference between an AI agent and a prompt chain?

A prompt chain runs a fixed, predetermined sequence of LLM calls, for example, step one generates an outline, step two writes the body, step three edits. An agent, by contrast, decides its own path through the work based on intermediate results, choosing which tools to call and when. The simple line I draw: if the system decides which step comes next, it is an agent; if you hardcoded the sequence, it is a pipeline. In practice, most production systems in 2026 use a hybrid, pipelines for orchestration between phases, and agents for decision-making within phases.

How does an AI agent control loop work?

The control loop is the core mechanism that makes an agent function: the LLM receives a system prompt, context, and a list of tools, then it responds. If it calls a tool, your infrastructure executes that tool, appends the result to the conversation, and loops back to the LLM. If the LLM signals it is done, the loop stops. I cap iterations at 10 by default in my production systems because if an agent cannot complete a task in 10 steps, it is almost always stuck in a loop burning tokens, better to halt and investigate.

How many tools should you give an AI agent?

Based on my direct testing in production, the number of tools you provide an agent has a dramatic impact on accuracy. With 4 tools, the model picks the right one roughly 95% of the time. With 12 tools, accuracy drops below 70% as the model starts confusing similar tools and calling the wrong ones. The tool schema itself is your real API contract, the description must tell the model when to use the tool and when not to, parameters should let the agent control cost, and input constraints prevent the model from passing entire paragraphs as a search query. Giving an LLM too many tools is the single biggest mistake I see in agent development.

Do AI agents actually execute code themselves?

No, the LLM does not execute code directly. An AI agent emits a structured request describing which tool it wants to call and with what arguments, it is your infrastructure that actually executes the tool and returns the result to the conversation. Tools are the interface between the LLM and reality, making them the single most important architectural decision in agent design. A well-described tool schema acts as the real API contract, telling the model when to use the tool, what format arguments should take, and what parameters are available for controlling behavior.

Lazar Milićević

Senior Technical Engineer. I build AI automation, GenAI/LLM systems and cloud architecture — autonomous systems that run while you sleep. Founder of BizFlowAI.

Work with me →

Building something hard with AI or automation? I am open to talk.

Get in touch

← All posts