The Fractional AI Forward Deployed Engineer

By Lazar Milicevic · June 26, 2026 · 9 min read

[Image: developer workstation with code on screen, representing a fractional AI forward deployed engineer at work]

Last month I shipped a RAG pipeline for one company on Tuesday, debugged a flaky agent loop for another on Wednesday, and spent Thursday in a roadmap session with a third about whether they even need agents (they didn't; they needed a better cron job). All three pay me a fraction of a full-time salary. All three are getting senior AI engineering they could not otherwise afford or attract.

This is what a fractional forward deployed engineer actually does. The role is borrowed from OpenAI and Palantir's playbook — an engineer embedded with the customer, writing code against their real problems — but compressed into 1-2 days per week, across multiple SMBs that are AI-curious but not AI-staffed. I want to walk through what the week actually looks like, when it beats a full-time hire, and how to scope it so both sides win.

What "forward deployed" actually means in an SMB context

A forward deployed engineer (FDE) ships code inside the customer's systems, not slide decks about them. The Palantir version pairs an FDE with a deployment to make the platform work against a real workflow. The OpenAI version sends engineers into accounts to build the first agent or RAG app on top of their APIs. The fractional SMB version is the same shape — embed, write code, ship something that runs in production — minus the enterprise overhead and minus the full-time headcount.

In practice, week-to-week, I'm doing four things for each company I work with:

Building the actual system — pipelines, agents, evals, infra — in their repos, their cloud account, their auth.
Reviewing what their existing dev team or contractors ship, because most teams have never deployed an LLM app and the failure modes are non-obvious.
Making architecture calls they don't have the experience to make: vector DB vs hybrid search, agent vs workflow, Claude vs GPT vs open weights, when to fine-tune vs prompt vs retrieve.
Saying no to bad ideas. This is underrated. A senior engineer who can kill a six-week project in a 30-minute conversation pays for themselves the first month.

The unit of work is a shipped system, not hours. I track hours for billing sanity, but nobody is paying me to be online from 9 to 5.

A real week, hour by hour

To make this concrete, here is a representative week from a recent month. Three active clients, roughly 22 billable hours.

Day	Client	Work	Hours
Mon	A (B2B SaaS)	Built ingestion job for new doc source; added pgvector index; reran eval suite	4
Mon	C (services firm)	Code review on their agent tool calling; flagged a retry loop that would burn tokens	1.5
Tue	A	Pairing session with their backend dev on chunking strategy; updated prompt	2
Wed	B (e-commerce)	Architecture call: replaced a planned multi-agent system with a 3-step workflow + LLM-as-judge	2
Wed	B	Wrote the workflow, deployed to Lambda, set up EventBridge schedule	3
Thu	C	Built eval harness for their support classifier; baseline accuracy 71%, target 90%	4
Fri	A	Reviewed week's traces in Langfuse; tuned reranker thresholds; wrote handoff doc	3
Fri	All	Async Slack, Loom updates, next-week planning	2.5

Notice what's not there: meetings about meetings, standups, status reports nobody reads, "syncs" with marketing. The fractional model forces both sides to protect the hours. When you have eight hours a week with someone, you do not waste two of them on a recurring "AI strategy" call.
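The retry loop flagged in Monday's code review is a common failure mode in agent tool calling: a failing tool gets retried with no cap, and every attempt costs tokens. A minimal sketch of the kind of guard I would ask for follows. The names ToolResult, BudgetExceeded, and call_with_budget, and the default caps, are illustrative assumptions, not the client's code.

```python
from dataclasses import dataclass
from typing import Callable


class BudgetExceeded(RuntimeError):
    """Raised when retries stop because a cap was hit, not because the call succeeded."""


@dataclass
class ToolResult:
    ok: bool
    tokens_used: int
    output: str = ""


def call_with_budget(
    tool: Callable[[], ToolResult],
    max_attempts: int = 3,      # hypothetical default
    max_tokens: int = 20_000,   # hypothetical default
) -> ToolResult:
    """Retry a tool call, stopping on either an attempt cap or a token cap."""
    spent = 0
    for attempt in range(1, max_attempts + 1):
        result = tool()
        spent += result.tokens_used
        if result.ok:
            return result
        if spent >= max_tokens:
            raise BudgetExceeded(f"spent {spent} tokens after {attempt} attempts")
    raise BudgetExceeded(f"gave up after {max_attempts} attempts, {spent} tokens")
```

Two independent caps matter here: an attempt cap alone does not protect against a single very expensive call, and a token cap alone does not protect against many cheap ones.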

When fractional beats a full-time hire

A full-time senior AI engineer in the US costs $220-380k all-in, and the good ones are not on the market — they are at Anthropic, OpenAI, or building their own thing. For an SMB with one AI initiative and a 12-month horizon, the math rarely works.

Here is the rough decision I walk founders through:

Go fractional when:

You have 1-3 AI workstreams, not 10.
You don't yet know if your AI bet will work — you need a senior person to de-risk it before you commit headcount.
Your existing engineering team is strong but has no LLM production experience.
You need someone who has shipped this category of system before, not someone who will learn on your dime.
Your timeline to first value is weeks, not quarters.

Hire full-time when:

AI is the product, not a feature.
You have continuous, high-volume work — daily model evaluation, ongoing RLHF-style loops, in-house model training.
You need someone on-call for a real-time production agent serving thousands of users.
The work is dense enough that a senior person will be billable 40+ hours a week for 12+ months.

The honest version: most SMBs I talk to think they're in the second bucket, and after one scoping conversation we both realize they are in the first. They want a custom-trained model; they need a prompt, a retriever, and an eval. They want a multi-agent swarm; they need three Lambdas and a queue. A fractional senior engineer's first deliverable is often telling you the cheaper, smaller, faster version of what you thought you wanted.
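To make the "workflow, not agent" distinction concrete, here is a rough sketch of a three-step workflow with an LLM-as-judge at the end. Everything in it is an assumption for illustration: the LLM type is a stand-in for whatever model client you use, and the classify, draft, judge steps are a hypothetical support-ticket example, not a client system.

```python
from typing import Callable

# Hypothetical interface: prompt in, text out. Swap in a real client call.
LLM = Callable[[str], str]


def run_workflow(ticket: str, llm: LLM, judge: LLM) -> dict:
    """Three fixed steps. No planner, no loop, no tool selection."""
    # Step 1: classify the input.
    category = llm(f"Classify this ticket in one word:\n{ticket}").strip().lower()
    # Step 2: generate a draft conditioned on the category.
    draft = llm(f"Write a reply to this {category} ticket:\n{ticket}")
    # Step 3: a second model call judges the draft.
    verdict = judge(
        "Answer PASS or FAIL. Does the reply address the ticket?\n"
        f"Ticket: {ticket}\nReply: {draft}"
    ).strip().upper()
    return {
        "category": category,
        "draft": draft,
        "approved": verdict.startswith("PASS"),
    }
```

The control flow is ordinary code, so the cost per run is bounded at three model calls and every step can be tested with stubbed models.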

How to scope the engagement (the part most people get wrong)

The thing that kills fractional engagements is fuzzy scope. "Help us with AI" is not a scope. Here is the structure that has worked for me across multiple companies:

1. Two-week paid discovery, fixed fee

Not a free call. A real, paid engagement, usually 20-30 hours over two weeks, where I get access to the codebase, the data, the existing prompts, the failed prototypes. I produce a written deliverable: what to build, in what order, with which tools, with cost estimates and kill criteria for each phase. About 30% of these end with me recommending the company not hire me, because the project is either too small (do it yourselves with a Claude API call and a cron job) or too large (you need a full-time team).

The discovery output looks roughly like this:

PHASE 1 (weeks 1-3): Retrieval baseline
  - Ingest: 4 sources, ~12k docs
  - Stack: pgvector + BM25 hybrid, RRF fusion
  - Eval: 50 hand-labeled queries, target nDCG@5 > 0.75
  - Kill criteria: if nDCG@5 < 0.6 after 3 iterations, switch to managed retrieval

PHASE 2 (weeks 4-6): Generation + judge
  - Claude Sonnet for generation, LLM-as-judge for eval
  - Target: 85% judge-pass on golden set
  - Kill criteria: if cost per query > $0.15, drop to Haiku + reranker

PHASE 3 (weeks 7-8): Deploy + observability
  - Lambda + API Gateway, Langfuse for tracing
  - Handoff doc + runbook for in-house team
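The Phase 1 items in the plan above can be sketched in a few lines. This is a minimal illustration, assuming each retriever returns a ranked list of document ids and that the 50 labeled queries carry graded relevance labels. The function names are mine, k=60 is the conventional RRF constant rather than a tuned value, and phase1_decision simply encodes the thresholds from the plan.

```python
import math
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def ndcg_at_k(ranked: list[str], relevance: dict[str, int], k: int = 5) -> float:
    """nDCG@k with linear gain; relevance maps doc id -> graded label."""
    dcg = sum(
        relevance.get(d, 0) / math.log2(i + 2) for i, d in enumerate(ranked[:k])
    )
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def phase1_decision(mean_ndcg: float, iterations: int) -> str:
    """Apply the Phase 1 target and kill criteria from the plan."""
    if mean_ndcg > 0.75:
        return "proceed"
    if mean_ndcg < 0.6 and iterations >= 3:
        return "switch to managed retrieval"
    return "iterate"
```

In use, rrf_fuse takes the pgvector ranking and the BM25 ranking for a query, ndcg_at_k scores the fused list against that query's labels, and the mean over the labeled set feeds phase1_decision.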

2. Monthly retainer with a defined deliverable per month

Not "8 hours a week of my time." A monthly outcome — phase X completed, metric Y hit, system Z deployed — with the hours as a sanity bound, not the contract. This aligns incentives. If I can hit the deliverable in 20 hours instead of 32, both sides win and we move to the next thing.

Typical retainers I see work well: $8-15k/month for 1-2 days per week, with a 3-month minimum and 30-day rolling thereafter. Below $8k it's not worth the context-switching cost. Above $15k you should probably hire someone.
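The retainer band lines up with the full-time figure quoted earlier once you annualize it. A quick back-of-envelope check, using only the numbers from this post (the annualize helper is just for illustration):

```python
def annualize(monthly_usd: int) -> int:
    """Twelve months of a flat monthly retainer."""
    return monthly_usd * 12


retainer_low, retainer_high = annualize(8_000), annualize(15_000)
full_time_low, full_time_high = 220_000, 380_000  # all-in range quoted earlier

print(f"Retainer:  ${retainer_low:,} to ${retainer_high:,} per year")
print(f"Full-time: ${full_time_low:,} to ${full_time_high:,} per year")
print(f"Gap at the boundary: ${full_time_low - retainer_high:,}")
```

The top of the retainer band annualizes to $180,000, which sits $40,000 under the bottom of the full-time range. Push the retainer much past $15k/month and that gap closes, which is why the upper bound is where a hire starts to make sense.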

3. Explicit handoff from day one

The exit is part of the design. Every PR has a written rationale. Every architectural choice is documented in the repo, not in my head. Every system has a runbook with: how to monitor it, what the alerts mean, how to roll back, who to call when it breaks (eventually that's you, not me). If a fractional engineer is irreplaceable after six months, they have failed at the job.

The trade-offs nobody tells you about

Fractional is not a free lunch. Real friction I've hit:

Context switching has a real cost. Three clients means three codebases, three Slack workspaces, three sets of conventions. I keep a per-client CONTEXT.md in each repo with the current state, open questions, and what I was last thinking. Without it, the first 30 minutes of every session are wasted reloading context. With it, I'm productive in five.

You are not on call. If their production agent breaks at 2am, that is not the fractional engineer's problem unless you've explicitly scoped it (and priced it). Most SMBs are fine with this because their AI workloads are internal or batch, not customer-facing real-time. If yours is real-time and revenue-critical, you need full-time ownership.

The team has to be able to ship without you four days a week. A fractional engineer multiplies a competent team. They do not replace a missing one. If you have no engineers and need someone to build and maintain the whole stack, you don't need fractional — you need an agency or a contractor on retainer.

Confidentiality and overlap matter. I will not take two clients in the same niche competing for the same customers. That has to be a hard rule, written in the contract. I also won't reuse client-specific code or data; the patterns I carry across engagements are architectural, not proprietary.

What I'd do if I were the founder

If you are a founder or head of engineering considering this:

Run a paid 2-week discovery before any long engagement. With anyone. Including me. If they push back on this, that's the signal.
Define success in shipped systems, not hours. "Deploy a working RAG system over our docs with >80% answer accuracy on a labeled eval set" is a scope. "Help us with AI" is not.
Ask for the kill criteria. A senior engineer should be able to tell you, before starting, what would make them recommend pulling the plug. If they can't, they haven't thought hard enough about it.
Insist on the runbook. Day one. Not at the end. The runbook is the artifact that proves the system is real and not a demo.
Don't hire fractional to avoid the AI decision. If you actually need a full team, fractional is a delay, not a solution. Be honest about which bucket you're in.
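A scope like ">80% answer accuracy on a labeled eval set" only means something if it is checked by code. A minimal sketch of that gate is below. It assumes exact-match scoring, which suits a classifier; open-ended RAG answers usually need a judge or a rubric instead. The names accuracy and passes_gate are illustrative.

```python
from typing import Callable, Sequence


def accuracy(
    predict: Callable[[str], str],
    labeled: Sequence[tuple[str, str]],
) -> float:
    """Fraction of (input, expected) pairs where predict(input) == expected."""
    if not labeled:
        raise ValueError("labeled set is empty")
    correct = sum(1 for text, expected in labeled if predict(text) == expected)
    return correct / len(labeled)


def passes_gate(score: float, threshold: float = 0.80) -> bool:
    """Strictly greater than the threshold, matching a '>80%' scope."""
    return score > threshold
```

Run it in CI against the labeled set so the number in the scope and the number on the dashboard are the same number.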

The companies that get the most out of a fractional FDE are the ones who treat it as a senior engineer rental for a specific, scoped bet — not as a way to dabble in AI without committing. The dabbling never produces anything.

If this maps to where you are — AI-curious, decent engineering team, one or two bets you want to de-risk with someone who has shipped this category of system before — I'm at lazar-milicevic.com/#contact. And if you'd rather read more on how I think about these systems before reaching out, the blog has more in this vein.

Frequently asked questions

What does a fractional AI forward deployed engineer actually do?

A fractional AI forward deployed engineer (FDE) embeds with a small or mid-sized company 1-2 days per week to ship production AI systems inside their codebase, cloud, and auth — not slide decks. The role borrows from Palantir and OpenAI's FDE playbook: build pipelines, agents, evals, and infra directly against real customer problems. In practice I split my time across building systems, reviewing the existing team's LLM code, making architecture calls they lack the experience to make, and killing bad ideas before they consume six weeks of engineering. The unit of work is a shipped system, not hours logged.

When should an SMB hire a fractional AI engineer instead of a full-time one?

Go fractional when you have 1-3 AI workstreams, you're still de-risking whether AI will work for your business, your engineering team is strong but has no LLM production experience, and your timeline to first value is weeks rather than quarters. Hire full-time when AI is the product itself, when you have continuous high-volume work like ongoing model training or evaluation, or when you need on-call coverage for a real-time agent serving thousands of users. A US senior AI engineer costs $220-380k all-in and the best ones aren't on the market, so for most SMBs with a single AI initiative the math favors fractional. Most companies I talk to think they need a full-time hire but actually need a prompt, a retriever, and an eval.

How should you scope a fractional AI engineering engagement?

Start with a paid two-week discovery, typically 20-30 hours at a fixed fee, where the engineer gets real access to your codebase, data, existing prompts, and failed prototypes. The deliverable is a written plan: what to build, in what order, with which tools, with cost estimates and explicit kill criteria for each phase. "Help us with AI" is not a scope and will kill the engagement; you need concrete shipped systems as milestones. About 30% of my discoveries end with me recommending the company not hire me, because the project is either small enough to do in-house or large enough to require a full-time team.

What does a typical week look like for a fractional AI engineer working with multiple clients?

A representative week for me is roughly 22 billable hours across three clients, doing concrete shipping work: building ingestion jobs and pgvector indexes, code-reviewing agent tool calling for token-burning retry loops, replacing planned multi-agent systems with simpler 3-step workflows, building eval harnesses, and tuning reranker thresholds based on Langfuse traces. What's deliberately absent: standups, status reports, recurring 'AI strategy' syncs, and meetings about meetings. When you only have eight hours a week with someone, both sides protect those hours fiercely. Async Slack updates and Loom videos replace most meetings.

Why do most SMBs not actually need agents or custom-trained models?

In scoping conversations, most SMBs describe wanting a multi-agent swarm or a custom fine-tuned model, but what they actually need is dramatically simpler — usually a prompt, a retriever with good chunking, an eval harness, and maybe three Lambdas behind a queue. Agents introduce non-obvious failure modes like runaway tool-calling loops, and fine-tuning is rarely the right first move when prompting and retrieval haven't been exhausted. A senior engineer's most valuable early contribution is often telling you the cheaper, smaller, faster version of what you thought you wanted — sometimes that's just a better cron job instead of an agent. Killing a six-week project in a 30-minute conversation pays for the engagement in the first month.

Lazar Milicevic

Senior Technical Engineer. I build AI automation, GenAI/LLM systems and cloud architecture — autonomous systems that run while you sleep. Founder of BizFlowAI.
