In April 2026, Citi unveiled “Citi Sky,” an AI-powered member of its wealth team that Citigold clients can talk to instead of navigating apps and call queues. The build wasn’t Citi’s alone. Google Cloud’s forward deployed AI engineers worked alongside Citi Wealth and Google DeepMind’s applied team to ship it. A year earlier, A&O Shearman announced it would co-develop agentic legal tools with Harvey and share in the resulting software revenue. And the model itself traces back to Palantir, which spent its first decade embedding engineers inside governments, defense agencies, and large banks, the origin we covered in the first piece in this series.

These are the showcase examples of the forward-deployed model working. Notice what they have in common. Citi is one of the largest banks on earth. A&O Shearman is one of the largest law firms in the world. Palantir’s early customers were national governments. The embedding stories that get written up are, almost without exception, stories about whales.

That points to a question the case studies rarely take on. What happens to everyone below the enterprise tier (the $10M professional-services firm, the regional brokerage, the eighty-person agency) that has the same work-demand problem and a fraction of the budget? The answer is that the model that actually delivers results was not built for them. The reason is arithmetic, and it determines what a mid-market buyer is actually being offered when a vendor shows up promising to “embed.”

The wins all share an address

The pattern in the marquee deployments isn’t a coincidence of which press releases got written. The supply of forward deployed engineers is concentrating at the top of the market.

One vendor analysis, from the AI studio Utsubo, estimated that the FDEs dispatched by the major AI labs cluster heavily in regulated enterprise sectors (financial services, government and defense, healthcare, insurance) even while most open job postings for the role sit at companies with fewer than two hundred employees. The methodology behind that split isn’t disclosed, and the source is a vendor with a service to sell, so treat the exact percentages as directional rather than measured. But the mismatch it describes lines up with everything the named deployments show: the demand for embedded help is broad, and the actual embedding flows to the accounts that can pay for it.

That’s the shape of the problem. The work-demand that the forward-deployed model is best at absorbing — figuring out how an AI system fits a specific business’s real process — exists at every tier. The delivery of it does not.

The reason is arithmetic, not preference

The clearest accounting of why comes from Jason Lemkin, the SaaStr founder, who ran the numbers in August 2025 on whether forward deployed engineers can work for smaller companies at all.

Start with the salary. A capable FDE commands $135K–$200K and up. Spread across the handful of customers one engineer can realistically embed with, Lemkin puts the raw engineering cost at $40K–$67K per customer per year. Add travel, overhead, and any margin, and the all-in figure clears $75K per deployment annually. That’s why even basic embedded engagements tend to start at $25K up front, with annual deal sizes of $60K or more. Those numbers presuppose an enterprise budget. A firm doing $10M in revenue, whose entire technology spend might be a fraction of a single FDE’s loaded cost, cannot carry one without distorting everything else it does.
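The per-customer range can be reproduced with a few lines of arithmetic. The sketch below is illustrative rather than Lemkin’s own calculation: the count of three to five customers per engineer is an assumption, chosen because it reproduces the quoted $40K–$67K range when applied to the $200K salary figure, and the function name is hypothetical.

```python
def per_customer_cost(salary: float, customers: int) -> float:
    """Split one engineer's salary evenly across the customers they serve."""
    if customers <= 0:
        raise ValueError("customers must be a positive integer")
    return salary / customers


# $200K is the upper salary figure quoted in the text.
SALARY = 200_000
# Assumption (not stated in the text): one FDE embeds with 3 to 5 customers.
ASSUMED_CUSTOMER_COUNTS = (3, 5)

for count in ASSUMED_CUSTOMER_COUNTS:
    cost = per_customer_cost(SALARY, count)
    print(f"{count} customers: ${cost:,.0f} per customer per year")
```

Under that assumption the split lands at roughly $67K and $40K per customer, before the travel, overhead, and margin that push the all-in figure past $75K.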

Lemkin’s verdict is the line worth sitting with: “The very businesses that need AI transformation most may be priced out of the implementation model that actually delivers results.” The companies with the least slack — the ones drowning in manual coordination, the ones for whom a well-built workflow would matter most — are the ones the model reaches last, if at all.

And the margin can’t absorb the discount

There’s an obvious objection. If the moat is worth it, why wouldn’t a vendor serve smaller customers at a loss up front to build it? Some do try. The trouble is that the margin structure of AI itself fights the move.

AI companies do not run software margins. ICONIQ’s State of AI survey of roughly 300 software executives put average AI product gross margins at 41% in 2024, 45% in 2025, and a projected 52% in 2026 — against the 80–90% that mature SaaS businesses take for granted. The gap is driven by inference costs, which the same survey found rising from about 20% to 23% of spend as companies scale, not falling. Bessemer’s pricing playbook lands in the same territory, placing AI gross margins at 50–60%. (Both are investor assessments rather than audited figures, but they converge.)

A business already operating at half of SaaS margins has very little room to cross-subsidize unprofitable small deployments out of fat elsewhere, because there is no fat elsewhere. This is why both Lemkin in 2025 and Utsubo in 2026 reach for the same word independently: the mid-market gap is structural, not a temporary capacity issue that scale will eventually fix. The cost floor is load-bearing.
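To see why thinner margins leave less room to subsidize, it helps to ask how much revenue a vendor must earn elsewhere to fund one underpriced deployment. The sketch below is illustrative only: the $50K shortfall is a hypothetical figure, the 45% margin is the 2025 ICONIQ average cited above, 85% is simply the midpoint of the 80–90% SaaS range, and the function name is made up for this example. It also ignores operating costs below the gross-margin line.

```python
def revenue_to_fund(shortfall: float, gross_margin: float) -> float:
    """Revenue needed elsewhere so that its gross profit covers a shortfall."""
    if not 0 < gross_margin <= 1:
        raise ValueError("gross_margin must be in (0, 1]")
    return shortfall / gross_margin


SHORTFALL = 50_000   # hypothetical loss on one subsidized deployment
AI_MARGIN = 0.45     # 2025 average AI gross margin cited in the text
SAAS_MARGIN = 0.85   # midpoint of the 80-90% SaaS range (assumption)

ai_revenue = revenue_to_fund(SHORTFALL, AI_MARGIN)
saas_revenue = revenue_to_fund(SHORTFALL, SAAS_MARGIN)

print(f"AI vendor needs   ${ai_revenue:,.0f} of other revenue")
print(f"SaaS vendor needs ${saas_revenue:,.0f} of other revenue")
print(f"Ratio: {ai_revenue / saas_revenue:.2f}x")
```

Whatever shortfall is plugged in, the ratio depends only on the two margins. At these figures the AI vendor needs a little under twice the revenue a SaaS vendor would to cover the same subsidy.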

What the mid-market gets instead

The absence of embedded engineers does not mean smaller firms get no AI. It means they get a different delivery model: AI built into the software they already run, rather than people sent to build software around them.

In legal, Clio reported in March 2026 that 86% of mid-sized law firms were using AI — though the sample blends an independent panel with Clio’s own customers, who skew technology-forward, so read that number as a ceiling rather than a census. Either way, the AI those firms use is largely embedded inside the practice-management platform itself, not delivered by an engineer at the firm. In accounting and finance, the same pattern: Intuit and Anthropic partnered in February 2026 to let mid-market businesses build agents on the Intuit platform, with the AI living inside the tools the firm already uses to keep its books.

This model has real advantages worth naming. It’s cheaper. It deploys in days instead of quarters. There’s no travel budget and no headcount risk. For a lot of standard work, that is genuinely the right answer.

The trade is workflow fit. Platform-embedded AI is shaped to the platform’s model of the work, not to how a particular firm actually operates, which is why Clio’s own survey found 30% of mid-sized firms naming difficulty integrating new technology into existing workflows as a key barrier. And the dependency doesn’t disappear; it relocates. Instead of depending on a specific embedded engineer, the firm depends on a specific platform’s roadmap and pricing. That’s lock-in by another name, and it’s the quiet cost of the cheaper path.

The middle that’s been proposed, and not yet proven

A few people have tried to design something between the $75K-floor embedded engineer and the off-the-shelf platform feature.

Lemkin’s own suggestions for firms priced out of FDEs are pre-built solutions for standard workflows, self-implementation using industry templates, and community-driven support, all of which trade bespoke fit for affordability. Utsubo proposes a “forward deployed studio”: a small pod of an engineer, a designer, and a strategist embedded for a few months rather than a single full-time engineer forever, with the per-engagement cost spread thin enough to reach the mid-market. That’s a vendor describing its own service, so it’s a proposal, not an industry norm.

None of these middle models has a documented track record at the mid-market tier yet. The structural gap is well described by two independent observers, Lemkin and Utsubo. What reliably fills it is not yet documented. This is the open frontier of the category, and any vendor claiming to have solved it should be asked for evidence, not architecture diagrams.

What a mid-market buyer should actually inspect

If you run a smaller firm and a vendor offers to embed with you, the useful diligence is about separating the real thing from a cheaper substitute wearing its language. Four questions:

Ask whether this is embedding or a platform feature with a salesperson attached. A true forward-deployed engagement builds software around your process. A configurable platform feature bends your process to fit the software. Both can be worth buying, but they cost different amounts and create different dependencies, and a vendor blurring the two is selling the cheaper one at the more expensive one’s price.

Ask for the all-in annual cost, and what work it takes off your team. The realistic floor for real embedding is north of $75K per year. If a vendor quotes embedded-engineer outcomes at a platform-feature price, something is being substituted, usually the depth of the fit. Make them name what work actually leaves your team’s plate, in hours.

Ask whether the system fits your workflow or reshapes it. For a lot of standard work, reshaping to fit a good platform is fine. For the work that makes your firm distinctive, it usually isn’t. Know which kind you’re buying for before you sign.

Ask where the review layer sits and what you keep if it ends. Whether you depend on an engineer or a platform, you’ve taken on a dependency. The questions that contain it are the same ones at any tier: where a human checks the system’s output before it reaches a client, and what you retain (documentation, data, a way to operate) if the relationship ends. (On why that review layer has to be a designed operating function rather than a policy line, see our AI governance framework.)
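The cost question in the list above reduces to one number a buyer can compute from a vendor’s answers: the all-in annual cost divided by the hours of work that actually leave the team. A minimal sketch, in which the hours figure and the function name are hypothetical and only the $75K floor comes from the text:

```python
def cost_per_hour_removed(annual_cost: float, hours_removed: float) -> float:
    """All-in annual cost divided by the staff hours the engagement removes."""
    if hours_removed <= 0:
        raise ValueError("hours_removed must be positive")
    return annual_cost / hours_removed


ANNUAL_COST = 75_000    # the embedding floor cited in the text
HOURS_REMOVED = 1_500   # hypothetical: hours per year the vendor commits to

rate = cost_per_hour_removed(ANNUAL_COST, HOURS_REMOVED)
print(f"${rate:,.2f} per hour of work removed")
```

Comparing that rate with what the same hours cost in staff time today gives a like-for-like basis for weighing an embedded quote against a platform quote.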

The forward-deployed model points in the right direction. The burden of making AI fit real work should sit with the people who build it, not the buyer. The catch is that the version of the model that delivers on that promise was priced for the enterprise, and the version that reaches the mid-market mostly hands back a tool and asks the firm to do the fitting itself. That gap is structural, and for most companies it’s still unfilled.

Gridex is built for that gap. We run the forward-deployed posture as operated capacity rather than billed engineering hours: we learn the work, build the workflow, and operate it with human review designed into the process, without the enterprise travel budget, the headcount risk, or the platform lock-in. The result is capacity a mid-market team can actually afford and audit.