llm CLI vs Aider vs Cursor CLI vs Claude Code: 2026

By Lazar Milicevic · June 26, 2026 · 10 min read

I run all four of these tools on different machines for different reasons. One sits inside a cron job that summarizes overnight logs. Another writes most of the boilerplate for my Next.js side projects. The fourth is open in a tmux pane right now, refactoring a Lambda handler while I write this. They look similar from the outside — text in, code or text out, terminal-native — but they solve genuinely different problems, and picking the wrong one wastes weeks.

This is the neutral comparison I wish existed when I was deciding what to standardize on for BizFlowAI's internal tooling.

TL;DR: which terminal LLM tool should you use?

If you want the fastest answer: Claude Code for autonomous, multi-file engineering work where you trust the agent to plan and execute. Aider when you want a disciplined pair-programmer that respects git and your review process. Simon Willison's llm CLI for scripting, pipelines, logging, and anything you want to call from bash. Cursor CLI if your team already lives in Cursor and you want the same model routing in the terminal. OpenAI Codex CLI and Gemini CLI are credible alternatives — Codex is excellent for sandboxed execution, Gemini for very long context on cheap tokens.

The honest take: most serious teams end up with two of these, not one. llm for scripts and a coding agent (Claude Code or Aider) for editing.

The six tools, at a glance

I'm comparing six tools that all live in the terminal and all talk to LLMs, but with very different intent:

  1. llm by Simon Willison — a Unix-philosophy CLI for talking to any model.
  2. Aider — a git-aware pair-programmer.
  3. Cursor CLI — Cursor's editor capabilities, headless.
  4. Claude Code — Anthropic's agentic coding CLI (what I'm writing this in).
  5. OpenAI Codex CLI — OpenAI's open-source local coding agent.
  6. Gemini CLI — Google's open-source terminal agent.

14-criteria comparison

| Criterion | llm | Aider | Cursor CLI | Claude Code | Codex CLI | Gemini CLI |
|---|---|---|---|---|---|---|
| Primary use case | Scripting / pipelines | Pair-programming | Editor parity in shell | Autonomous coding agent | Sandboxed coding agent | Long-context agent |
| License | Apache 2.0 | Apache 2.0 | Proprietary | Proprietary | Apache 2.0 | Apache 2.0 |
| Model support | Any (plugins) | Most major + local | Anthropic/OpenAI/etc. | Anthropic models | OpenAI + others | Gemini family |
| Local LLMs (Ollama) | Yes (plugin) | Yes (native) | Limited | No | Limited | No |
| Token cost posture* | BYO key, pay provider | BYO key, pay provider | Bundled subscription | Bundled or BYO | BYO key | Generous free tier |
| Tool use / function calling | Via plugins | Limited | Yes | Yes (native) | Yes | Yes |
| MCP support | Plugin | Yes | Yes | Yes (first-class) | Yes | Yes |
| Multi-file edits | No | Yes (git diff) | Yes | Yes | Yes | Yes |
| Git integration | Manual | First-class commits | Yes | Yes (commits + PRs) | Yes | Yes |
| Shell execution | Pipe in/out | Manual confirm | Yes | Yes (with permission) | Sandboxed | Yes |
| Plugin ecosystem | Large, mature | Modest | Closed | Skills + MCP | Growing | Growing |
| Conversation logging | SQLite, built-in | Markdown chat log | Cloud | Local sessions | Local | Local |
| Headless / CI use | Excellent | Good | Possible | Good (-p flag) | Good | Good |
| Learning curve | Low | Low | Low if you know Cursor | Medium | Medium | Low |

*Cost posture: don't read this as "X is cheaper than Y." All of them ultimately route to the same model APIs whose per-million-token prices change monthly. Check the model provider's pricing page; what matters more is which model the tool defaults to and how aggressively it re-reads context.

Where each tool actually shines

llm — the Unix tool I reach for in scripts

Simon Willison's llm is the one I use most often, and probably the one most people overlook because it's not a coding agent. It's a CLI that does one thing well: send a prompt to a model, get a response, log it to SQLite. That's it. And because of that, it composes beautifully with everything else in a terminal.

A real example from one of my automation pipelines:

```shell
cat overnight-errors.log \
  | llm -m claude-4.5-sonnet -s "Group these errors by root cause. Return JSON." \
  | jq '.groups[]' \
  | llm -m gpt-4o-mini -s "Write a one-line Slack message for each."
```

I have crons doing variations of this on a small VPS. The plugin ecosystem is genuinely large: there are community plugins for Ollama, embeddings, fragments, templates, and most providers worth using. The SQLite logging is the underrated feature: I can run llm logs six months later and see exactly what prompt produced what output, which is invaluable when an automated pipeline starts producing nonsense.

What llm is not: it will not edit your codebase across files, it will not run tools autonomously, it will not commit to git. It's a primitive, and that's the point.
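A minimal sketch of wrapping that pattern in reusable functions, assuming llm is installed with a default model and key configured. The function names are hypothetical; the only llm features used are the -s system prompt, reading the prompt from stdin, and llm logs -n. The empty-file guard is there because a cron job that sends an empty log to a model still costs a request and logs a meaningless entry.

```shell
# summarize_errors FILE: send a log file to the default model with a fixed
# system prompt. Refuses to run on a missing or empty file.
summarize_errors() {
  local file="$1"
  if [ ! -s "$file" ]; then
    echo "summarize_errors: $file is missing or empty" >&2
    return 1
  fi
  llm -s "Group these errors by root cause. Return JSON." < "$file"
}

# last_runs [N]: show the N most recent logged prompt/response pairs
# (defaults to 3), read from llm's SQLite log.
last_runs() {
  llm logs -n "${1:-3}"
}
```

Called from cron as `summarize_errors overnight-errors.log`, with `last_runs 10` as the first thing to check when the output starts looking wrong.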

Aider — the disciplined pair-programmer

Aider is what I recommend to engineers who are skeptical of "agentic" coding. It does multi-file edits, but every edit goes through a git diff you review, and every accepted change becomes a commit with a useful message. The model proposes, you dispose.

The thing Aider gets right that newer agents sometimes miss: it asks before adding files to its context window. You build the working set explicitly. This matters because the failure mode of agentic coding tools is silently re-reading half your repo on every turn and burning tokens.

Aider's local-LLM support is genuinely usable. I've run it against a quantized Qwen on a Mac Studio for offline work on a flight and gotten real productivity out of it, which I cannot say for any of the other five.
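For reference, a hedged sketch of how an offline session like that might be launched. It assumes Ollama is already serving a model on its default port and follows the convention in Aider's Ollama docs (an OLLAMA_API_BASE variable plus an ollama_chat/ model prefix). The function name is hypothetical, and the model name is whatever you have pulled locally.

```shell
# aider_local MODEL [FILES...]: start Aider against a local Ollama model.
# MODEL is the name of a model already pulled into Ollama.
aider_local() {
  local model="$1"
  shift
  # 11434 is Ollama's default port; an existing OLLAMA_API_BASE wins.
  OLLAMA_API_BASE="${OLLAMA_API_BASE:-http://127.0.0.1:11434}" \
    aider --model "ollama_chat/${model}" "$@"
}
```

Passing the files on the command line keeps the working set explicit from the first turn, which matters even more with a small local model and its shorter context window.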

Cursor CLI — Cursor without the Electron app

If your team is already on Cursor, the CLI is the obvious extension. It gives you the same model routing and policies in a headless context, which matters for CI pipelines and remote dev boxes where you don't want to run an Electron editor. If your team is not on Cursor, the CLI alone is not a strong enough reason to adopt the ecosystem.

Claude Code — the autonomous engineer

Claude Code is what I use when I have a well-scoped task and I want it done without holding the model's hand. It will read files, plan, edit, run tests, commit, and open a PR if you let it. The MCP integration is first-class — I have it wired to a Postgres MCP server for one project so it can query the dev database while debugging, with read-only credentials.

The trade-off is real: Claude Code is aggressive about reading context and running tools. On a large repo, a single non-trivial task can consume a substantial number of tokens. The fix is the same as with any agent: narrow the scope, point it at specific files, and write a clear plan in the prompt. The -p (print) mode also makes it scriptable for headless runs, which is how I use it inside some of my own automation.

The other trade-off: it's Anthropic-only. If you need to compare outputs across providers in the same workflow, you'll combine it with llm.
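The scoping advice above can be sketched as a small wrapper around print mode. This is an illustration under assumptions, not a recommended configuration: the function name and prompt wording are hypothetical, and the --allowedTools value reflects the flag as documented at the time of writing, so check claude --help for your version.

```shell
# scoped_task TASK FILE...: run Claude Code non-interactively against an
# explicit list of files instead of letting it explore the repo.
scoped_task() {
  local task="$1"
  shift
  if [ "$#" -eq 0 ]; then
    echo "scoped_task: name at least one file" >&2
    return 1
  fi
  # Name the files in the prompt and allow only file reads and edits,
  # so the run cannot execute shell commands.
  claude -p "Only read and edit these files: $*. Task: ${task}" \
    --allowedTools "Read,Edit"
}
```

A call such as `scoped_task "rename the handler and update its callers" src/handler.ts src/router.ts` (hypothetical paths) keeps both the token bill and the blast radius predictable.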

OpenAI Codex CLI — the sandboxed option

Codex CLI runs the agent inside a sandbox by default, which is the right posture for code execution. If you're letting an LLM run arbitrary commands on your machine, sandboxing should be table stakes. Codex makes it explicit. The downside is the same as Claude Code's, inverted: you're committing to the OpenAI model family.

Gemini CLI — long context, generous free tier

Gemini CLI is the one I underestimated. Gemini's long-context capability is real, and for tasks like "read this 200-file repo and tell me where the auth flow is broken," it competes well. The free tier (subject to change — check Google's current quotas) makes it a reasonable choice for personal projects and learning.

The dimensions that actually matter in production

After running these in real workflows, the spec-sheet comparisons matter less than four practical questions:

1. How does it handle context?

The single biggest cost lever is what the tool re-reads on every turn. llm reads only what you pipe in — predictable. Aider reads the files in its explicit working set — predictable. Claude Code, Codex, and Cursor agents make context decisions for you, which is faster but harder to reason about cost-wise. On a recent refactor I watched Claude Code re-read the same 4,000-line file three times in one session because each tool call reset its working memory of it. Worth it for the speed, but you have to know it happens.
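To put a rough number on repeated reads, here is a back-of-the-envelope sketch. It assumes about 4 bytes per token, a common rule of thumb for English text and code rather than any provider's real tokenizer, and both function names are hypothetical.

```shell
# estimate_tokens FILE: rough token count, assuming ~4 bytes per token
# (rounded up). A heuristic, not a real tokenizer.
estimate_tokens() {
  local bytes
  bytes=$(wc -c < "$1")
  echo $(( (bytes + 3) / 4 ))
}

# reread_cost FILE READS: input tokens spent if an agent reads FILE
# READS times in one session.
reread_cost() {
  echo $(( $(estimate_tokens "$1") * $2 ))
}
```

Under that assumption, a 4,000-line file averaging 40 bytes per line is roughly 40,000 tokens per read, so three reads cost about 120,000 input tokens before the model has written a line.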

2. How does it handle git?

Aider is the gold standard here — every accepted change is a small, well-described commit. Claude Code and Codex will commit if you ask but tend to make larger commits unless you instruct otherwise. llm has no git awareness at all; that's your job. For any production codebase, "small, reviewable commits" is the constraint that matters most six months later.
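One way to enforce "small, reviewable commits" regardless of which agent wrote them is a size check in CI. The sketch below is an assumption-laden example: the function names and the 200-line default are hypothetical, and it relies only on git show --numstat, which prints added and deleted line counts per file.

```shell
# commit_size REV: total lines added plus deleted in one commit.
# Binary files show "-" in numstat output and are skipped.
commit_size() {
  git show --numstat --format= "$1" |
    awk '$1 ~ /^[0-9]+$/ { total += $1 + $2 } END { print total + 0 }'
}

# check_commit REV [LIMIT]: fail when the commit changes more than LIMIT
# lines (default 200, an arbitrary threshold to tune per team).
check_commit() {
  local size limit="${2:-200}"
  size=$(commit_size "$1")
  if [ "$size" -gt "$limit" ]; then
    echo "commit $1 changes $size lines (limit $limit)" >&2
    return 1
  fi
}
```

Running `check_commit HEAD` after an agent session gives an early signal that the agent should be told to split its work.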

3. How does it handle tools (MCP)?

MCP is the standard worth betting on. Claude Code, Cursor, Codex, and Gemini all support it. llm has it via plugin. Aider's support is narrower. If you're building serious internal tooling — connecting an agent to your database, your ticketing system, your deployment pipeline — MCP support is the criterion that will matter most in two years.
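As a concrete illustration of the wiring, the sketch below writes a project-scoped .mcp.json of the kind Claude Code reads. Everything specific here is an assumption: the function name, the server name dev-db, the DEV_DB_READONLY_URL variable, and the choice of the @modelcontextprotocol/server-postgres package are placeholders, and the ${VAR} expansion inside the file should be verified against your Claude Code version.

```shell
# write_mcp_config [DIR]: create DIR/.mcp.json with one database server.
# The quoted 'EOF' keeps ${DEV_DB_READONLY_URL} literal in the file, so
# the credential stays in the environment and out of version control.
write_mcp_config() {
  local dir="${1:-.}"
  cat > "$dir/.mcp.json" <<'EOF'
{
  "mcpServers": {
    "dev-db": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "${DEV_DB_READONLY_URL}"]
    }
  }
}
EOF
}
```

Whatever server you choose, point it at a read-only database role so the permission boundary lives in the database, not in the prompt.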

4. Can you run it headless?

llm is born-headless; this is its superpower. Claude Code's -p flag and Aider's --message flag make them scriptable for batch runs. Cursor CLI is improving here. If you want an agent to triage GitHub issues at 3am, this matters.

What I'd actually do

For a solo engineer or small team starting fresh in 2026, this is my honest recommendation:

  1. Install llm today. Use it for any one-shot prompting, summarization, scripted pipeline, or "I need to ask an LLM something from bash" task. It's the lowest-commitment tool on the list and it pays for itself in a week.
  2. Pick one coding agent based on your model preference and your tolerance for autonomy. Aider if you want to review every diff. Claude Code if you want to delegate larger tasks. Codex if you want the sandboxing posture. Don't try to run two coding agents in parallel — the muscle memory conflicts and you end up using neither well.
  3. Wire MCP early. Whichever agent you pick, invest a day connecting it to your real systems through MCP servers. The productivity delta between an agent that can read your dev database and one that can't is enormous, especially for debugging.
  4. Log everything. llm's SQLite log is free observability. For your coding agent, save session transcripts. When the agent makes a confidently wrong change in three months, you'll want the receipt.
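The "save session transcripts" step can be as simple as a wrapper that captures whatever a headless agent prints. A minimal sketch, with a hypothetical function name and a hypothetical TRANSCRIPT_DIR variable; it works with any command, not just a particular agent.

```shell
# run_logged NAME CMD...: run a command, save its combined stdout and
# stderr to a timestamped file, print that file's path, and return the
# command's own exit status.
run_logged() {
  local name="$1"
  shift
  local dir="${TRANSCRIPT_DIR:-$HOME/agent-transcripts}"
  local out status=0
  mkdir -p "$dir" || return 1
  out="$dir/$(date -u +%Y%m%dT%H%M%SZ)-${name}.log"
  "$@" > "$out" 2>&1 || status=$?
  echo "$out"
  return "$status"
}
```

For example, `run_logged nightly-triage claude -p "Summarize open issues"` leaves a dated file you can grep months later, in the same spirit as llm's SQLite log.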

The tools will keep moving — features ship monthly, pricing shifts quarterly, new entrants appear. The decision framework above is more durable than any specific table cell.

If you're picking a stack for a serious build, or trying to figure out where agentic coding fits inside a real production workflow, I write about this kind of thing regularly on the blog — and if you want a second pair of eyes on your specific setup, drop me a note.

Frequently asked questions

What is the difference between llm CLI, Aider, Cursor CLI, and Claude Code?

These four tools all run in the terminal and talk to LLMs, but they solve different problems. Claude Code is an autonomous, multi-file coding agent that plans and executes engineering work. Aider is a disciplined git-aware pair-programmer that proposes diffs you review before each commit. Simon Willison's `llm` CLI is a Unix-philosophy primitive for scripting, pipelines, and logging prompts to SQLite. Cursor CLI brings Cursor's editor model routing into a headless shell environment, mainly useful if your team already uses Cursor.

Which terminal LLM tool should I choose for my workflow?

Pick Claude Code when you want an autonomous agent to handle multi-file engineering tasks end-to-end. Choose Aider if you want a careful pair-programmer that respects git and your review process. Use Simon Willison's `llm` CLI for bash scripting, cron jobs, and pipelines. Cursor CLI fits teams already standardized on Cursor, while OpenAI Codex CLI is strong for sandboxed execution and Gemini CLI for long-context work on cheap tokens. In practice most serious teams adopt two tools: `llm` for scripts plus a coding agent like Claude Code or Aider for edits.

Can I use local LLMs like Ollama with these coding CLIs?

Local LLM support varies significantly across these tools. Aider has the strongest native Ollama integration — I've run it against a quantized Qwen on a Mac Studio for offline work and gotten real productivity out of it. The `llm` CLI supports local models through a plugin. Cursor CLI and Codex CLI have only limited local support, and Claude Code and Gemini CLI do not support local LLMs at all since they're tied to their providers' hosted models.

What is Simon Willison's llm CLI actually good for?

The `llm` CLI is a Unix-style primitive that sends a prompt to any model, returns the response, and logs everything to SQLite — and that's deliberately all it does. Because of that minimalism it composes beautifully in bash pipelines with tools like `jq`, `cat`, and cron, which makes it ideal for scripted automation like summarizing logs or routing model outputs between providers. Its SQLite logging is the underrated feature: you can run `llm logs` months later to see exactly what prompt produced what output. It will not edit your codebase across files, run tools autonomously, or commit to git — for that you need an agent like Aider or Claude Code.

Why does Aider work well for engineers skeptical of AI coding agents?

Aider takes a disciplined approach that addresses the main concerns about agentic coding. Every multi-file edit is presented as a git diff you review before it's applied, and every accepted change becomes a commit with a meaningful message, so nothing changes behind your back. Critically, Aider asks before adding files to its context window, forcing you to build the working set explicitly rather than silently re-reading half your repo on every turn. That explicit-context discipline avoids the main failure mode of newer agents — quietly burning tokens and drifting on context — which is why I recommend it to engineers who don't yet trust autonomous coding tools.

Lazar Milicevic

Senior Technical Engineer. I build AI automation, GenAI/LLM systems and cloud architecture — autonomous systems that run while you sleep. Founder of BizFlowAI.
