AI Is Now the Operating System. Here’s What That Actually Means.

by Seb Matthews | Mar 24, 2026 | AI

It’s Not About the Models

Something shifted in early 2026, and I have been trying to work out why so much of the commentary around it has been either too vague to be useful or too excited to be trusted.

In the space of a few weeks, every major AI provider repositioned its platform. Not with a rebranding exercise, but with actual product changes. Anthropic, OpenAI, Microsoft, Google. More or less simultaneously, all of them moved from offering AI as a tool that sits alongside your existing software to offering AI as infrastructure that sits underneath it. Multi-step execution. Cross-system orchestration. Goal-directed agents with defined autonomy boundaries. Governance baked in.

The markets noticed before most product teams did. Software indices fell sharply in late January and early February 2026, the worst start to the year the sector has seen in recent memory, while broader indices stayed roughly flat. The repricing reflected a specific conclusion: that agentic AI can automate the knowledge work that per-seat software was built to support. If you have been wondering why your software vendor’s stock looks the way it does, that is the answer.

From Copilot to Something That Actually Does Things

For the past two years, the dominant mental model for AI in enterprise software has been the copilot. An assistant that sits next to the human and helps. You write, it suggests. You query, it summarises. You decide, it drafts.

That framing was useful. It lowered the bar for adoption. It gave buyers a comfortable story. It gave builders a tractable scope. But it was always a transitional model. The question was always what it was transitioning to.

The answer becomes clearer when you look honestly at what knowledge work actually consists of. A large proportion of it is not creative judgment or nuanced decision-making. It is structured execution: gathering information from multiple systems, formatting it, routing it, checking it against criteria, escalating exceptions, logging outcomes. The kind of work that, once you describe it clearly enough, maps almost perfectly onto what a well-constructed AI agent can do today.

What the major platforms have now built is the infrastructure to enable that execution at enterprise scale. Agents that can be given a goal, access to the relevant systems, defined limits on what they can do autonomously, and a handoff path when they hit something they cannot handle. That is not a copilot. That is a worker. The distinction matters because it changes what you have to build, what you have to govern, and where the competitive advantage actually sits.
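
A crude way to see the difference is to write it down. The sketch below is mine, not any vendor’s schema; every name in it is invented, and Python is just a convenient notation.

from dataclasses import dataclass

@dataclass
class AgentCharter:
    # Illustrative field names only, not any platform's actual schema.
    goal: str                     # the outcome the agent is accountable for
    systems: list[str]            # systems it may read from and write to
    autonomous_actions: set[str]  # what it may do without asking a human
    escalation_path: str          # who picks up anything outside that set

invoice_agent = AgentCharter(
    goal="Reconcile supplier invoices against purchase orders",
    systems=["erp", "supplier_email", "payments"],
    autonomous_actions={"match", "flag_discrepancy", "log_outcome"},
    escalation_path="accounts-payable lead",
)

A copilot needs none of those fields. A worker needs all of them.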

The Competitive Shift That Is Easy to Misread

Bain recently published research that put a number on the productivity shift: 30 to 50 per cent gains in knowledge-work functions from deploying agents at scale. They compared it in magnitude to the offshoring wave of the late 20th century.

The number is fine, but the more interesting part of that comparison is the structural effect. Offshoring did not just make some things cheaper. It changed what you had to be good at to win. Companies that thought their moat was operational complexity found out it was not when that complexity could be exported. The ones that survived reshaped around what could not be moved.

AI does the same thing through a different mechanism. Offshoring moved work to lower-cost labour markets, which meant scale and process maturity were decisive. AI removes the labour arbitrage entirely. The cost of executing a structured knowledge task is converging to near zero, and that is true regardless of your size. Which is the part that should give pause to any business that has been quietly relying on being too big and too complex for a smaller competitor to replicate.

For product teams specifically, the implication is direct: moats built on operational complexity are evaporating. If your defensibility relies on the fact that it takes many trained humans to deliver the outcome, you are exposed. The question is not whether AI will reach your category but when, and whether you are the one reshaping it or someone else.

The emerging advantages are different in character. Proprietary data, but only if it is genuinely unique and not just licensed content that anyone can access. Deep workflow integration, meaning the depth at which your product is embedded in the customer’s actual process, rather than sitting alongside it. And trust, which I will come back to because I think it is being underestimated.

Workflow Redesign Is the Hard Part. Most Teams Are Skipping It.

Here is where I see product and engineering teams go wrong most consistently right now. They find a use case, wire up a model, ship something, and it kind of works. Then it fails to scale. Then it gets quietly deprioritised.

The reason is almost always the same. They automated the existing process rather than redesigning it for agents.

This sounds obvious. It is harder to avoid than it sounds. The existing process is documented, it has stakeholder sign-off, and it has worked well enough for years. The path of least resistance is to take it, replace the human steps with model calls, and call the result AI-powered.

The problem is that most enterprise processes were designed around human constraints. They have approval steps because humans need checkpoints to stay oriented. They have specific handoff formats because humans need context to pick up where someone else left off. They have escalation paths designed around human availability windows. None of those constraints applies to agents, and when you optimise for them anyway, you leave most of the value on the table and accumulate most of the friction.

The better approach is to start from the outcome. What is this process actually trying to produce? Then redesign the steps from scratch with a clear-eyed view of what agents can do today, and what they will probably be able to do in the next six to twelve months. You will often find that you can collapse five steps into two, remove three approval layers that existed only because of human coordination costs, and deliver the outcome in hours rather than days.
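
To make the collapse concrete, here is a hypothetical before and after, with every name invented. A five-step human process becomes one gathering pass and one delivery pass, and the old approval layers survive only as an exception path:

# Before: gather -> format -> peer review -> manager approval -> file.
# After: two agent-shaped steps; humans see only the exceptions.
def run_report(sources, fetch, meets_criteria, deliver, escalate):
    """Gather, check, and deliver in one pass; escalate only failures."""
    record = {name: fetch(name) for name in sources}  # was: gather + format
    failures = [n for n, v in record.items() if not meets_criteria(v)]
    if failures:
        # The old approval layer, surviving only as an exception path.
        return escalate(record, failures)
    return deliver(record)  # was: route + approve + log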

That redesign work is unglamorous and requires genuine collaboration with the people who own the process, not just those who are excited about the technology. But it is the determinant of whether your AI deployment moves a metric or just adds a feature to the changelog.

The Agent Factory: What Industrialising This Actually Looks Like

One framework I keep returning to is what Bain describes as the ‘agent factory’: treating the build of AI agents as an industrial process with repeatable inputs, quality standards, and governance built in from the start, rather than a series of one-off projects.

Most product teams are not doing this. Each agent build is treated as its own problem. A new architecture discussion, a new approach to evaluation, a new set of decisions about what the agent is and is not allowed to do. The result is a portfolio of agents that is nearly impossible to maintain coherently as models evolve and requirements shift.

The factory model forces a useful discipline. What does a well-defined agent look like before you start building it? It has a clear trigger condition. It has typed inputs and outputs. It has explicit boundaries on what it can do autonomously and what it must escalate. It has defined performance targets. It has a specified evaluation approach. If you cannot write all of that down before you start building, you are not ready to build.
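
One way to enforce that is to make the contract a literal artefact that must be complete before any build starts. A sketch, with field names of my own invention rather than any established schema:

from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class AgentContract:
    # If any field here is hard to fill in, the agent is not ready to build.
    trigger: str                       # e.g. "new ticket tagged 'billing'"
    input_schema: dict[str, type]      # typed inputs the agent accepts
    output_schema: dict[str, type]     # typed outputs it must produce
    autonomous: frozenset[str]         # actions allowed without a human
    must_escalate: frozenset[str]      # actions that always go to a human
    target_accuracy: float             # the performance bar, e.g. 0.98
    evaluate: Callable[[dict], float]  # how a completed run gets scored

The specific fields matter less than the rule: an empty field means stop.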

That readiness test is the one I would hold the line on. The pressure to start building is enormous, and the definition work feels slow when everyone is excited to ship. But agents without clear contracts produce the kind of brittle, hard-to-debug failures that erode confidence in AI systems broadly, and set programmes back far longer than the definition work would have taken.

The governance layer is equally important and, in most teams I have seen, equally underinvested. Real-time visibility into what agents are doing. Full trace logging. Continuous automated evaluation against defined performance targets. Kill switches for when things go sideways. These are not bureaucratic overhead. They are what allow you to increase agent autonomy over time without accumulating operational risk you cannot see.
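
In its smallest form, that plumbing might look something like the sketch below. The names and thresholds are invented; the point is how little code the discipline actually requires:

import logging
import time

log = logging.getLogger("agent.trace")

class KillSwitchTripped(Exception):
    """Halt an agent run immediately."""

def governed_step(agent_id, step_fn, evaluate, floor, is_halted):
    """Run one agent step with tracing, evaluation, and a kill switch."""
    if is_halted(agent_id):  # kill switch, checked before every step
        raise KillSwitchTripped(f"{agent_id} halted by operator")
    start = time.monotonic()
    result = step_fn()
    score = evaluate(result)  # continuous automated evaluation
    log.info("agent=%s step=%s score=%.3f elapsed=%.2fs",  # full trace logging
             agent_id, getattr(step_fn, "__name__", "step"),
             score, time.monotonic() - start)
    if score < floor:  # the performance target defined in the contract
        raise KillSwitchTripped(f"{agent_id} scored {score:.3f}, floor is {floor}")
    return result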

Scaling: The Pattern Matters More Than the Technology

Bain identifies six patterns for scaling AI across an organisation, from broad bottom-up experimentation to what they call ‘leapfrog’, in which a small, empowered team rebuilds a core part of the business from scratch.

The instinct in most organisations is to start bottom-up, because it feels low-risk and culturally easy. Let teams experiment. See what sticks. The honest assessment of this approach is that it rarely produces the workflow redesign or governance infrastructure described above. It produces a collection of demos and proofs of concept that never graduate to production. The experimentation is real. The impact is not.

For product teams, the most underused pattern is horizontal scaling: take one well-proven agent or use case and replicate it systematically across similar domains, prioritising value over novelty. Most teams, once they have something working, move immediately to the next interesting problem. The discipline to extract maximum value from a proven pattern before chasing the next one is rare. It is also where a significant amount of compound advantage gets built.
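
In code terms, horizontal scaling is closer to stamping than invention: the proven pattern stays fixed and only the domain surface changes. A hypothetical illustration:

# Illustrative only: one proven pattern, replicated across adjacent domains.
PROVEN_PATTERN = {
    "autonomous": {"classify", "draft_response", "log_outcome"},
    "target_accuracy": 0.97,
}

def replicate(domain, trigger):
    """New domain, same contract: only the surface details change."""
    return {**PROVEN_PATTERN, "domain": domain, "trigger": trigger}

fleet = [
    replicate("billing", "new ticket tagged 'billing'"),
    replicate("shipping", "new damage claim filed"),
    replicate("returns", "new return request submitted"),
]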

The other pattern worth attention is longitudinal scaling, in which small teams iterate intensively on a high-value use case over time, incrementally increasing agent autonomy as performance improves and trust accumulates. This is the pattern that produces genuinely differentiated AI capability, because the differentiation is in the accumulated evaluation infrastructure and institutional learning, not in any single model choice that a competitor can replicate next quarter.
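
The ratchet itself is almost trivially simple to state; the value is in the evaluation history feeding it. A sketch, with thresholds I have invented:

def next_autonomy_level(level, recent_scores, floor=0.97, window=50):
    """Raise autonomy one notch only after a sustained run above the floor;
    drop back a notch the moment the evidence stops supporting it."""
    if len(recent_scores) >= window and min(recent_scores) >= floor:
        return level + 1
    if recent_scores and min(recent_scores) < floor:
        return max(0, level - 1)
    return level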

The Trust Question

As agents become more capable and more autonomous, the question of who gets permission to orchestrate consequential parts of a customer’s workflow is going to matter a great deal. Not every product that can deploy agents will be trusted to do so in high-stakes contexts.

That trust is not primarily a function of technical capability. It is a function of governance, transparency, track record, and the quality of the human oversight you have built into the system. The products that get trusted with the most valuable agent deployments will be the ones that invested early in making their agents explainable, their failures recoverable, and their escalation paths genuinely useful rather than just legally defensible.

This is worth considering as a strategic asset rather than a compliance cost, because it compounds in ways that technical capability alone does not.

What I Think This Points To

The shift from copilot to operating system is not a metaphor. It is a description of a real change in what AI infrastructure can do today, and what enterprises are now being asked to do in response.

The teams that pull ahead will not necessarily be the ones with the most AI capability. They will be the most rigorous about workflow redesign, the most disciplined about defining agent contracts before building, the most systematic about scaling what works, and the most thoughtful about the governance that earns trust over time.

None of that is easy. But it is at least reasonably clear. And clarity is worth something when most of the noise in the market is still operating at the level of ‘AI is going to change everything.’

It is. The more useful question is: what are you changing first?

Written by Seb Matthews

Author, speaker, and advisor on leadership under pressure and organisational performance.

Related Posts

Take the AI to the Fight

Cloud AI made a bet that data would flow to the model. In military operating environments that bet fails, for two distinct reasons. This post explains why, and closes a three-part series on the design principle that runs through every layer of defence AI.

Fidelity Over Distance

The problem in defence software delivery was never really about distance. It was always about how much of the real problem survives the journey from operator to engineer. LLMs change what fidelity is achievable when physical proximity is not possible.
