The New Software: CLI, Skills & Vertical Models
In the era of agent experience, performance will become the new competitive advantage for great SaaS companies
In January 2025, we first started debating the “death” of software. Anthropic had just open-sourced the Model Context Protocol and it was taking off. Satya Nadella predicted all point and click software (“crud databases”) would get replaced by agents that “own the business logic”. We also assumed that value would accrue to applied AI companies that built great agent harnesses on top of existing frontier models.
It’s now April 2026 and the verdict on this prediction is - well, close but not quite.
The era of Agent Experience: human users are disintermediating themselves
By December 2025, it became clear that all software needs to be rebuilt for agentic users. Machine identities now outnumber human users by 45 to 1 in the average enterprise, with some organizations seeing ratios as high as 100 to 1. Neon reported 80% of their databases were being created by AI agents, not humans. GitHub sees over 5% of all commits completely authored by Claude Code and perhaps as high as 40% being AI-assisted in some way. The MCP registry crossed 2,000 verified servers with 97 million monthly SDK downloads.
You now have a new product problem. And the companies solving it correctly are not the ones who bolted a chatbot onto their dashboard and are still shipping agents for human users.
Agents operate programmatically, through APIs, scripts, and structured commands, bypassing interfaces entirely. They do not navigate dashboards. They do not click buttons. A well-configured agent reads structured inputs, calls tools, produces structured outputs. The human is not in every loop. In many loops, the human is not present at all.
Welcome to the era of Agent Experience.
This month is crystallizing the past year for us
Anthropic just published their Managed Agents architecture.
The headline is technical: decoupling the “brain” (Claude and its harness) from the “hands” (sandboxes and tools) from the “session” (the durable event log). The implication for SaaS: over time you will delegate your agent architecture to the frontier lab, whose message is, in effect: expose stable interfaces that can accommodate our models. Needless to say, unless you have truly invested in a great harness that delivers strong outcomes to customers - there goes what you thought was your moat.
Intercom and Zapier are building for agents.
Developer-focused companies have been doing this for close to a year, but now it’s everyone. Zapier’s SDK gives coding agents access to 9,000+ app connectors without requiring API keys or OAuth setup. The integration plumbing was what made Zapier a hit with human users for the past decade. They are now trying to find PMF with our agents instead. The strategy and moat did not change. The consumer did.
Brian Scanlan of Intercom announced the Fin CLI hot on the heels of their vertical AI model for customer support (65% end-to-end resolution rates at scale). Agents can now install, configure, and operate Fin without a human touching a UI. The product that was once a chat widget is now invocable from a terminal.
Linear just showed us how easy it is to get this wrong.
In their recent Linear agents release, they built an embedded agent accessible from the desktop app, mobile, Slack, and Teams. It knows your roadmap, your issues, your code. It synthesizes context and takes action.
What they had not built was an MCP server. Or a CLI tool. Or exposed an API. Despite announcing that issue tracking was dead (the right sentiment), they had prioritized the wrong product - their customers were asking for MCP support so external agents could connect to Linear’s data.
If you are still debating “should we build for agents”, you need to immediately shift the conversation to “here is what it actually takes to ship great AX”:
stable interfaces that outlast specific model behavior
capability parity between what humans and agents can do
skills that encode practitioner judgment
a CLI so agents can provision and configure your product
high performance vertical models as open source LLMs catch up
The three patterns making up the new software stack
Skills, CLI tools and vertical models that encode domain knowledge should be a critical part of every SaaS company’s strategy going forward.
1. Skill files: make your domain expertise machine-readable
A skill file is a markdown document that tells an agent how to use your tool correctly: what to call, in what order, with what constraints, and why. This is the domain expertise SaaS companies spent years accumulating, now expressed in a format an agent can read and act on without a human translating.
Figma launched Skills alongside their MCP server in March 2026. The files encode design system conventions, component naming, token structure: the things a senior Figma practitioner knows that a generic agent would get wrong.
The skill file is where institutional knowledge lives now. Not in the UI. Not in onboarding flows. Not in the help center. In a markdown file an agent reads before it starts working.
PostHog’s team learned this the hard way. They rebuilt their agent architecture twice; they now write skills the way you would write onboarding docs for a highly qualified hire. For example: telling agents to always use $pageview as the default activation event, not signed_in, because infrequent events skew retention curves. An agent without that context would produce misleading data, and the user would never know why.
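As a concrete sketch, a skill file in this spirit might look like the following. Only the $pageview guidance comes from the example above; the file name, headings, and the second rule are hypothetical, shown to illustrate the format:

```markdown
# Skill: retention analysis (hypothetical sketch)

## When to use
Read this before building or modifying any retention insight.

## Rules
- Always use `$pageview` as the default activation event, not `signed_in`:
  infrequent events skew retention curves.
- If a query returns no data, say so explicitly rather than silently
  switching events. (Hypothetical rule, included to illustrate the format.)

## Why
These constraints encode practitioner judgment a generic agent lacks.
```

The point is not the specific rules; it is that the judgment lives in a file an agent reads before it starts working.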
2. CLI tools and MCP servers: the new interface layer
The companies that understood the shift earliest rebuilt their interaction model as a CLI, not a GUI redesign.
37signals rebuilt Basecamp as a fully agent-accessible product: revamped API, brand-new CLI, structured JSON output, shell completion. DHH’s framing was the most honest in the industry:
“Agents have emerged as the killer app for AI. So while we keep cooking on actually-useful native AI features, we’re launching a fully agent-accessible version today.”
Google launched Gemini CLI extensions with over a million developers on the CLI in three months, shipping integrations with Figma, Stripe, Shopify, and Snyk. Each extension includes a built-in “playbook” that teaches the AI how to use the new tools.
Vercel’s AI SDK crossed 20 million monthly downloads, built from the start around agentic pipelines.
A CLI is not a regression to developer tooling from the 90s - it is good agent experience. It’s the interface your coding agents love. A command that accepts structured input and produces structured output is composable in ways a GUI never can be. An agent can call it, pipe its output into another tool, chain it into a workflow, retry on failure. Every major AI coding tool (Claude Code, GitHub Copilot CLI, Cursor) operates through the command line for exactly this reason.
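To make the composability point concrete, here is a minimal sketch of an agent-friendly command in Python. The “acme” tool name and its single “status” command are hypothetical; the pattern is the point: structured JSON in, structured JSON out, a machine-readable exit code.

```python
"""Minimal sketch of an agent-friendly CLI (hypothetical "acme" tool):
structured JSON in, structured JSON out, and a non-zero exit code on
failure, so an agent can call it, pipe it, chain it, and retry it."""
import json
import sys


def run_command(request: dict) -> dict:
    """Dispatch one structured command; only a toy 'status' op is sketched."""
    if request.get("command") == "status":
        return {"ok": True, "service": request.get("service"), "state": "healthy"}
    return {"ok": False, "error": f"unknown command: {request.get('command')}"}


def main() -> int:
    request = json.load(sys.stdin)       # structured input, pipeable
    response = run_command(request)
    json.dump(response, sys.stdout)      # structured output, pipeable
    return 0 if response["ok"] else 1    # machine-readable failure signal


# Invocation sketch (an agent composes it like any other command):
#   echo '{"command": "status", "service": "db"}' | acme | next-tool
```

A GUI has no equivalent of that pipe: the structured output is what makes retries, chaining, and workflows possible.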
3. Vertical models: domain expertise baked into the weights
The third pattern is the one most underappreciated to date. It is also still being debated, thanks to the jagged frontier of AI.
Vertical models are not general LLMs with good prompts. They are models fine-tuned on domain-specific data (case law, clinical documentation, customer support transcripts, financial filings) that outperform general models on their specific turf. The domain expertise is not in a skill file sitting on top of a generic model. It is in the weights. They should be faster and cheaper.
Intercom is the most instructive example, with a custom retrieval model (fin-cx-retrieval) specifically engineered for customer service reasoning.
Last month, Cursor launched Composer 2, a proprietary coding model built on Moonshot AI's Kimi K2.5 with Cursor's own continued pre-training and reinforcement learning. It scores 61.7% on Terminal-Bench 2.0, beating Claude Opus 4.6 (58.0%), at $0.50 per million input tokens. One-tenth the price of Anthropic's flagship. They use frontier models for the hardest reasoning tasks. They outsource everything else to custom vertical models that are faster, cheaper, and better on the specific job.
And then there is Harvey, which tells a more complicated story than anyone expected.
When Harvey partnered with OpenAI to build a custom-trained case law model, lawyers preferred it over GPT-4 97% of the time. The vertical model was the product, and the product was growing fast: $190 million ARR by January 2026, $11 billion valuation by March, the majority of the AmLaw 100 as clients.
Then Harvey scrapped the model.
Frontier reasoning models from Google, xAI, OpenAI, and Anthropic started outperforming Harvey’s custom legal model on its own BigLaw Bench evaluation. The moat Harvey had built in the weights evaporated as the baseline improved. Harvey now routes tasks across Claude, Gemini, and GPT via a Model Selector.
This is where we are on the vertical model thesis. Fine-tuning wins decisively in domains where query patterns are genuinely specialized and underrepresented in general training data, where the consequence of errors is high, and where the company has enough distribution to generate meaningful proprietary feedback. Intercom’s fin-cx-retrieval works because customer service reasoning is structurally different from general language tasks, and 40+ million resolved conversations have compounded that advantage.
But for many categories, the better bet is still exceptional workflow infrastructure, skill files, and agentic orchestration built on top of frontier models, rather than a fine-tuned model that requires sustained investment to stay ahead of a baseline that keeps moving.
It’s unclear how long an orchestration advantage can truly last though. The infrastructure for building a domain-specific AI agent with graph-based memory, streaming chat, decision tracing, and SaaS data connectors now takes a single CLI command and five minutes. “We built an AI agent for our domain” is not a defensible position by itself.
The companies that will be most interesting to watch are the ones combining all three layers: a vertical data advantage in the weights for their highest-value queries, skill files that encode workflow expertise for agents using their tool, and CLI/MCP servers that make all of it composable.
Welcome to the new software.
How to win Agent Experience
Agents don’t care what color or shape your buttons are. They care about one thing only - performance. Are you easy to authenticate with? Are you secure? Are you cheaper and faster?
The economics start with a simple observation: most tasks in a production AI system do not require frontier-class reasoning. Contract extraction, data validation, metric computation, format conversion, status checks, retrieval. These are deterministic or near-deterministic operations. A skills architecture routes them to code or small models. A monolithic frontier approach sends every one of them through a $15-per-million-token reasoning engine.
Stanford’s FrugalGPT research demonstrated that cascade routing (sending queries to cheap models first, escalating to expensive ones only when confidence is low) matched GPT-4’s accuracy with up to 98% cost reduction. In production, multi-model routing typically saves 30-60%, with aggressive implementations pushing past 80%.
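A minimal sketch of that cascade pattern, with the model calls stubbed out. The confidence scorer, threshold, prices, and the toy “difficulty” heuristic are all illustrative assumptions, not FrugalGPT’s actual implementation:

```python
"""Cascade routing sketch: try the cheap model first, escalate to the
frontier model only when the cheap answer's confidence is low."""
from dataclasses import dataclass
from typing import Callable


@dataclass
class Answer:
    text: str
    confidence: float  # 0.0-1.0, from a scorer (e.g. a verifier model)
    cost: float        # dollars spent on this call


def cascade(query: str,
            cheap: Callable[[str], Answer],
            frontier: Callable[[str], Answer],
            threshold: float = 0.8) -> Answer:
    first = cheap(query)
    if first.confidence >= threshold:
        return first               # most queries stop here: cheap and fast
    return frontier(query)         # escalate only the hard minority


# Stub models for illustration only; real scorers judge the answer, not the query.
def cheap_model(q: str) -> Answer:
    easy = len(q) < 40             # toy "difficulty" heuristic
    return Answer("cheap answer", 0.9 if easy else 0.3, cost=0.0001)


def frontier_model(q: str) -> Answer:
    return Answer("frontier answer", 0.95, cost=0.015)
```

The design choice that matters is the confidence check: the savings come from how rarely the second leg of the cascade fires.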
The latency argument compounds on top of the cost argument. A small model responds in tens of milliseconds. Deterministic code responds in single-digit milliseconds. A frontier reasoning model takes seconds. In an agentic workflow that chains 5-15 tool calls, the difference between “every call hits the big model” and “most calls hit code or a small model” is the difference between a 30-second wait and a 2-second wait. Users notice. Agentic ones too!
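The latency gap for a chained workflow, worked out under the assumptions above (per-call latencies are rough illustrative figures; 10 calls sits inside the 5-15 range):

```python
# Rough latency model for an agentic workflow chaining 10 tool calls.
# Per-call latencies are illustrative, taken from the figures above.
FRONTIER_S = 3.0   # a frontier reasoning model: seconds per call
SMALL_S = 0.05     # a small model: tens of milliseconds
CODE_S = 0.005     # deterministic code: single-digit milliseconds


def workflow_latency(calls: int, frontier_calls: int, small_calls: int) -> float:
    """Total wall time; calls not sent to a model hit deterministic code."""
    code_calls = calls - frontier_calls - small_calls
    return frontier_calls * FRONTIER_S + small_calls * SMALL_S + code_calls * CODE_S


# Every call hits the big model:
print(workflow_latency(10, frontier_calls=10, small_calls=0))  # 30.0 seconds
# One frontier call, a few small-model calls, the rest code:
print(workflow_latency(10, frontier_calls=1, small_calls=4))   # roughly 3.2 seconds
```

Even with generous assumptions for the frontier model, the chained total is dominated by how many calls touch it at all.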
When a frontier model owns all business logic, every execution is a probability distribution. Smaller domain-specific models can do the job for 10-20% of total computation. You buy the expensive cognition only where it matters. The counterargument is that frontier model costs keep falling, so the gap closes. But “90% cheaper frontier inference” still loses to “near-zero cost.”
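The arithmetic behind that last claim, as a back-of-the-envelope sketch. Call volume and the small-model price are assumptions; only the $15-per-million-token frontier figure comes from above:

```python
# Illustrative cost comparison for 1M routine tool calls of ~1,000 tokens each.
calls = 1_000_000
tokens_per_call = 1_000

frontier_per_mtok = 15.00                    # $ per million tokens (figure above)
cheaper_frontier = frontier_per_mtok * 0.10  # "90% cheaper frontier inference"
small_model_per_mtok = 0.05                  # assumed small-model price
code_path_per_mtok = 0.0                     # deterministic code: near-zero


def bill(price_per_mtok: float) -> float:
    """Total dollars for the workload at a given per-million-token price."""
    return calls * tokens_per_call / 1_000_000 * price_per_mtok


print(f"${bill(frontier_per_mtok):,.0f}")    # $15,000
print(f"${bill(cheaper_frontier):,.0f}")     # $1,500
print(f"${bill(small_model_per_mtok):,.0f}") # $50
print(f"${bill(code_path_per_mtok):,.0f}")   # $0
```

A 90% price cut still leaves a 30x gap to the small model, and an unbounded one to the code path.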
Rebuild your SaaS company for agentic use
SaaS is not being disintermediated the way anyone predicted in January 2025. Humans are taking themselves out of the loop, and your GUI should no longer be the first thing that comes to mind when you are building new products. For companies that have them, the data layer is fine. The workflow logic is fine. The domain expertise is fine, and increasingly the most valuable thing a software company owns, as long as it gets re-encoded in formats agents and models can consume.
If you are working on a new product or new features, stop and ask who your primary user will be in 6 months. And whether you are prioritizing the right features for them.



Model companies can package orchestration, memory, tool use, and managed execution however they want, but the harness that matters in practice is still the applied layer around the model: the workflow logic, domain judgment, failure handling, evaluation, permissions, integrations, operator taste, the way work gets made reliable under real scrutiny, and the company that actually owns the outcome. None of that magically disappears because a model company wrapped tools around the model and gave it cleaner language. That layer is still where most of the hard, durable work lives - the moat. It is still what applied AI companies build. So when Anthropic frames the model as the brain and the rest as an attachable set of hands, it feels reductionist to the point of silliness. Maybe that framing works for a launch narrative. We do not think it accurately describes where the moat is.