Every few years, an entirely new primitive arrives that forces operators to rethink how work gets done. The spreadsheet did it. The relational database did it. The web browser did it. Each one looked, on arrival, like a curiosity for hobbyists and a threat to whatever workflow it eventually displaced. Each one ended up redrawing the org chart of the average company. The pattern is so consistent that, once you have seen a few of these transitions, the temptation is to assume you know what to do with the next one. The temptation is usually wrong. What stays constant across these transitions is not the playbook for adopting the new primitive — it is the fact that the operators who get the next one right are the ones who refuse to assume they already know how to think about it.
Large language models are the latest primitive, and they have already triggered the usual cycle: breathless optimism, predictable backlash, and a quieter, more interesting middle phase in which serious operators figure out where the new tool actually belongs. We sit on the boards and in the cap tables of small technology companies working through that middle phase right now. What follows is what we are seeing, what we believe, and where we think the puck is going. The thesis, briefly stated: the right mental model for LLMs is not “intelligent software” but “probabilistic infrastructure,” and the companies that internalize that distinction will build dramatically more durable systems than the ones that do not.
The Old Contract: Determinism as a Feature
For the last fifty years, enterprise software has been built on a single, almost moral premise: given the same input, produce the same output, every time. This is determinism, and it is not a stylistic choice. It is the load-bearing assumption underneath payroll, accounting, settlement, claims processing, identity, access control, and the long tail of internal tooling that makes a company function. When a deterministic system gives a different answer on Tuesday than it gave on Monday, that is not creativity. That is a bug, and in regulated industries it is often a fineable one.
Deterministic systems are testable. They are debuggable. They produce audit trails that satisfy auditors, regulators, and the occasional plaintiff’s attorney. They are also, by design, brittle. Tell a rule engine something it has not seen before and it either fails loudly or, worse, fails quietly. For decades, the industry has papered over that brittleness with two expensive ingredients: humans and consultants. Both are now in shorter supply than the work requires.
The deeper observation is that the deterministic regime made an implicit bet about the world: that the inputs to enterprise systems could be cleaned up before they arrived. Forms would be filled out correctly. PDFs would conform to templates. Customers would describe their problems in approved categories. The bet was never fully true, but it was true enough for a long time that the industry got away with it. The cleanup work was hidden inside the human layer — clerks, support agents, paralegals, billing operators — who normalized messy reality into the structured shapes the deterministic systems could consume. The work was real, but it was outside the software, so it did not show up in the architecture. LLMs are the first technology that can credibly absorb a large fraction of that hidden cleanup work, which is why their impact will be larger than the surface metrics suggest.
The New Primitive: Probabilistic by Design
LLMs invert the old contract. They are probabilistic, sampling from distributions over possible outputs. Ask the same question twice and you may get two reasonable, non-identical answers. For an operator coming from the deterministic world, this feels like a regression. It is not. It is the price of a capability that traditional software has never had: the ability to read messy, unstructured, ambiguous input and produce a useful response without anyone having to enumerate the rules in advance.
The strategic question for any operator is therefore not whether to adopt LLMs. That question is already settled by the economics. The strategic question is where, inside the business, the new primitive belongs, and just as importantly, where it does not.
The cultural transition is harder than the technical one. Engineers who have spent decades treating nondeterminism as the enemy now have to learn to design with it, around it, and on top of it. The right mental model is not “LLMs make software smarter.” The right mental model is “LLMs are a new kind of dependency, with different failure modes than the dependencies engineers are used to managing.” This sounds modest. It is the entire game. Companies that treat LLMs as smart software try to push them deeper into the core, where the consequences of variance are highest, and discover the hard way that probabilistic systems do not belong there. Companies that treat LLMs as a probabilistic dependency keep them at the edges, wrap them in deterministic scaffolding, and discover that the same model can do an enormous amount of useful work without ever being asked to make decisions it cannot reliably make.
Where Determinism Still Wins
Determinism still wins anywhere correctness is binary and the cost of variance is high. Moving money. Writing to a system of record. Granting or revoking access. Calculating tax. Signing a contract. Executing a known sequence of API calls. These are the places where the answer is either right or wrong, where there is no graceful degradation, and where the auditor will eventually come asking. We tell our portfolio companies, bluntly, that an LLM has no business making any of these decisions on its own. The deterministic layer is not legacy. It is the spine.
The reason the deterministic spine has to remain deterministic is not just regulatory. It is epistemic. A regulator, a customer, or an internal investigator must be able to look at a system and answer the question, “why did this happen?” Deterministic systems answer that question by replaying the inputs and showing the same outputs. Probabilistic systems cannot answer it the same way, because the same inputs do not always produce the same outputs. You can sometimes recover a plausible explanation by examining the prompt, the context, and the seed — but “plausible” is not “deterministic,” and any business that has to defend its decisions in front of an auditor will discover that the difference matters.
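One way to narrow that gap is to make every probabilistic step emit an audit record capturing the knobs that shaped the sample: prompt, context, seed, model version. A minimal sketch, with illustrative field names rather than any standard schema:

```python
# Sketch of an audit record for a probabilistic step, so that "why did
# this happen?" has at least a plausible answer after the fact.
# Field names are illustrative, not a standard.
import hashlib
import json

def audit_record(prompt: str, context: dict, seed: int,
                 model_version: str, output: str) -> dict:
    """Hash the large inputs, keep the knobs that shaped the sample."""
    return {
        "model_version": model_version,
        "seed": seed,
        # Hashes let an investigator confirm which inputs were used
        # without storing sensitive payloads in the log itself.
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "context_sha256": hashlib.sha256(
            json.dumps(context, sort_keys=True).encode()).hexdigest(),
        "output": output,
    }
```

This does not restore determinism; it only makes the "plausible explanation" recoverable on demand, which is the most a probabilistic layer can honestly offer.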
Where LLMs Earn Their Keep
LLMs earn their keep at the edges of the business, in the places where structured systems have always struggled. Reading a customer email and figuring out what the customer actually wants. Pulling line items out of a PDF invoice that arrived in a format no one has seen before. Triaging a support ticket. Drafting a first pass of a contract, a memo, a follow-up. Classifying a transaction. Summarizing a meeting. Translating a stakeholder’s loose request into a structured query against a database that already exists.
None of this is glamorous. All of it is expensive when done by humans, and all of it has historically been the work that breaks every rule-based system the moment reality drifts. This is the work LLMs were built for, and it is the work where the ROI in our portfolio has been most consistent and most defensible.
The pattern of where LLMs work and where they do not is, ultimately, a pattern about the structure of the underlying task. Tasks with high input variance and tolerant-of-variance outputs — interpretation, summarization, classification, first-draft generation — are where LLMs shine. Tasks with low input variance and intolerant-of-variance outputs — money movement, identity, compliance — are where LLMs fail. The mistake operators make is treating the LLM as a general-purpose tool that can be pointed at anything. It is not. It is a specialized tool with a particular shape, and the operators who match the tool to tasks of the right shape are the ones who get returns from it. The operators who match the tool to tasks of the wrong shape are the ones writing the cautionary press releases.
The Hybrid Pattern
The systems that are working in production are neither purely deterministic nor purely LLM-driven. They are hybrids, and the pattern is converging across our companies and across the industry.
An LLM sits at the edge, where the input is messy. It interprets, extracts, classifies, and proposes. Deterministic code sits at the core, where the action is consequential. It validates, executes, and logs. Between the two lives a contract: structured outputs, schema validation, allow-listed tool calls, retries, and human-in-the-loop review for the cases that fall outside the contract. The LLM proposes. The deterministic layer disposes. The audit trail survives.
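That seam can be sketched in a few lines. The example below assumes a support workflow; the tool names, schemas, and refund threshold are invented for illustration, and the LLM call itself is out of frame, its raw JSON proposal arriving as `raw`:

```python
# A minimal sketch of the propose/dispose seam. The deterministic gate
# schema-checks the LLM's proposal, rejects anything outside the
# allow-list, and routes edge cases to a human. All names illustrative.
import json

ALLOWED_TOOLS = {
    "refund": {"order_id", "amount_cents"},
    "update_address": {"order_id", "address"},
}
MAX_REFUND_CENTS = 5_000  # business rule: larger refunds need a human

def validate_proposal(raw: str):
    """Return ("execute" | "review" | "reject", detail)."""
    try:
        proposal = json.loads(raw)
    except json.JSONDecodeError:
        return ("reject", "not valid JSON")
    tool = proposal.get("tool")
    if tool not in ALLOWED_TOOLS:
        return ("reject", f"tool {tool!r} not allow-listed")
    args = proposal.get("args", {})
    if set(args) != ALLOWED_TOOLS[tool]:
        return ("reject", "argument schema mismatch")
    if tool == "refund" and args["amount_cents"] > MAX_REFUND_CENTS:
        return ("review", "refund above auto-approve threshold")
    return ("execute", proposal)
```

The LLM never touches the system of record; it only produces a proposal that this gate can accept, reject, or hand to a human, and every branch is loggable.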
This is not a theoretical architecture. It is, increasingly, the default. The companies that have skipped this discipline and pointed an LLM directly at production systems have been the ones generating the cautionary tales that fill the trade press.
The hybrid pattern is also where the architectural craft has migrated. A decade ago, the interesting design decisions in enterprise software were about data models, API contracts, and consistency guarantees. Today, the most interesting decisions are about where the seam between LLM and deterministic code sits, how the contract between them is enforced, and what happens when the LLM produces something the deterministic layer cannot accept. These are real engineering decisions with real consequences, and the engineers who get them right are the ones whose systems will run quietly in production for years. The engineers who treat the seam as an afterthought are the ones whose systems will be torn out within a year of deployment, replaced by something better-designed by a competitor.
Taming the Nondeterminism
You can narrow an LLM’s variance, though you cannot eliminate it. Temperature settings, constrained decoding, JSON schemas, function calling, evals, and automated retries all push the distribution of outputs toward something tight enough to ship. The teams that are winning treat their LLM layer the way good engineering teams have always treated flaky dependencies: assume it will misbehave, design the surrounding system to catch it, and measure relentlessly. Evals are the new unit tests, and the companies that take them seriously are the ones whose systems do not embarrass them in front of customers.
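The "assume it will misbehave" stance reduces to a small amount of code. A sketch, where `model` and `validate` are whatever the team plugs in; both are hypothetical stand-ins here, and in a real system temperature and schema enforcement would live inside the call:

```python
# Treat the LLM like a flaky dependency: retry on invalid output,
# then surface the failure rather than ship garbage downstream.
def call_with_retries(model, prompt, validate, max_attempts=3):
    """`validate` returns (ok, error); `model` is any prompt -> str callable."""
    last_error = None
    for _ in range(max_attempts):
        output = model(prompt)
        ok, error = validate(output)
        if ok:
            return output
        last_error = error
    # Out of retries: fail loudly so a fallback path or human takes over.
    raise ValueError(f"no valid output after {max_attempts} attempts: {last_error}")
```

The point is not the loop itself but the posture: invalid output is an expected event with a designed response, not an exception in the colloquial sense.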
The discipline of evals deserves special emphasis, because it is the discipline that most determines whether an LLM-powered system can be operated at scale. An eval is a test that measures, against a defined set of inputs, whether the LLM’s outputs meet a defined standard. The companies that run hundreds of evals per release catch regressions before they reach customers. The companies that run none discover regressions after customers notice. The cost of building evals is real and the cost of not building them is larger, but the second cost is invisible until it suddenly is not. We push our portfolio companies, hard, to invest in evals well before they feel necessary, because the alternative is to operate the system blind for as long as the team will tolerate it, which is always longer than is wise.
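At its smallest, an eval is just a fixed case set and a pass threshold. A sketch, where the cases are invented for illustration and `classify` stands in for the LLM step under test:

```python
# A minimal eval harness: score the step under test against a fixed
# case set and gate the release on the pass rate. Cases illustrative.
EVAL_CASES = [
    ("Where is my package?", "shipping"),
    ("I was charged twice", "billing"),
    ("How do I reset my password?", "account"),
]

def run_evals(classify, cases, pass_threshold=0.9):
    """Return (pass_rate, release_ok). `classify` is any text -> label callable."""
    passed = sum(1 for text, expected in cases if classify(text) == expected)
    rate = passed / len(cases)
    return rate, rate >= pass_threshold
```

Real eval suites run hundreds of cases with fuzzier scoring (rubrics, model-graded judgments), but the release gate stays this mechanical: a number, a threshold, and a decision that does not depend on anyone's mood.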
Build, Buy, or Wrap
A recurring question in our boardrooms is what to build, what to buy, and what to wrap. Our working answer: do not build foundation models, do not buy thin wrappers, and wrap aggressively wherever the underlying model is a commodity. The durable value at the small-company scale is almost never in the model itself. It is in the proprietary data, the workflow integration, the deterministic guardrails, and the trust the company has earned with its customers. Those are the assets that compound. The model underneath will be replaced, probably more than once, before the decade is out.
The most reliable way for a small company to lose money in this market is to fall in love with a particular model provider and design the product around the provider’s specific capabilities. The model market is moving too fast, and the provider that is on top this quarter will not necessarily be on top next quarter. The companies that abstract the model behind their own interface — that can swap providers as the price-performance curve moves — preserve optionality that the companies that hardwire a particular API quietly lose. Optionality is not a feature anyone buys directly, but it is a feature that, over five years, separates the companies that compound from the companies that have to keep rebuilding.
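The abstraction does not have to be elaborate. A sketch in Python, with hypothetical stand-in providers where real vendor SDK calls would live:

```python
# Hide the model provider behind the company's own interface so that
# swapping vendors is a config change, not a rewrite. Provider classes
# here are hypothetical stand-ins for real SDK calls.
from typing import Protocol

class Completion(Protocol):
    def complete(self, prompt: str) -> str: ...

class ProviderA:
    def complete(self, prompt: str) -> str:
        return f"[provider-a] {prompt}"  # real call to vendor A goes here

class ProviderB:
    def complete(self, prompt: str) -> str:
        return f"[provider-b] {prompt}"  # real call to vendor B goes here

def summarize(model: Completion, text: str) -> str:
    # Application code depends only on the Protocol, never on a vendor
    # SDK, so the provider behind it can change without touching callers.
    return model.complete(f"Summarize: {text}")
```

The interface is the optionality: every call site written against `Completion` is a call site that survives the next reshuffle of the model market.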
What We Are Watching
Three things are on our radar going into the next stretch. First, the slow professionalization of LLM operations: evals, observability, prompt versioning, and the rest of the unglamorous plumbing that turns a demo into a system. Second, the migration of agentic patterns out of the lab and into real workflows, which will force a much more serious conversation about permissions, identity, and liability than the industry has had so far. Third, the quiet consolidation of the deterministic layer itself, as workflow engines, orchestrators, and policy systems learn to host LLM steps as first-class citizens rather than bolt-ons.
Of these three, agentic patterns are the one most likely to be misjudged, in both directions. The optimists will claim that agents are the future of all enterprise work, and they will be partially right and mostly early. The pessimists will claim that agents are a parlor trick that does not scale, and they will be partially right about the current generation and entirely wrong about the trajectory. The truth is that agents will work, but only in narrow, well-instrumented contexts, with clear permissions, careful identity controls, and explicit human accountability. The companies that build that scaffolding before they ship agents will be fine. The companies that ship agents without it will produce the next round of cautionary tales.
The Real Question
The interesting question, then, is not whether deterministic or nondeterministic systems are better. That framing belongs to a debate that has already been settled by the operators actually shipping software. The interesting question is where, inside your specific business, the line between the two should sit, and who in your organization has the authority and the technical judgment to draw it.
Get that line right, and you get a business that is both reliable and adaptive: the audit trail your regulators expect, paired with the flexibility your customers have quietly started to demand. Get it wrong in either direction, and you end up with a system that is either too brittle to serve the market or too loose to be trusted with it. The companies that figure this out early will spend the next decade compounding the advantage. The ones that do not will spend it explaining themselves.
What to Do Monday Morning
Map your existing system into two columns. Determinism on one side; tolerated nondeterminism on the other. Be honest about which side each step actually belongs on, not which side it currently sits on. Most companies, when they do this exercise for the first time, discover that they have placed several deterministic-by-nature operations inside an LLM call, and several LLM-appropriate operations inside rigid rule engines. The map is the diagnosis. The work that follows is the cure.
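The two-column map can literally be a data structure. A sketch with invented step names; the point is that the diff between where each step should live and where it lives today falls out mechanically:

```python
# The Monday-morning map as data: declare where each step belongs,
# record where it currently sits, and diff the two. Steps illustrative.
BELONGS = {  # which side of the line each step *should* be on
    "parse_invoice_pdf": "probabilistic",
    "triage_ticket": "probabilistic",
    "calculate_tax": "deterministic",
    "execute_refund": "deterministic",
}
CURRENT = {  # where each step lives today
    "parse_invoice_pdf": "deterministic",  # brittle template rules
    "triage_ticket": "probabilistic",
    "calculate_tax": "probabilistic",      # misplaced inside an LLM call
    "execute_refund": "deterministic",
}

def misplacements(belongs, current):
    """The diagnosis: every step sitting on the wrong side of the line."""
    return {step for step in belongs if belongs[step] != current.get(step)}
```

Here the diff surfaces both failure modes from the exercise: a deterministic-by-nature operation hiding inside an LLM call, and an LLM-appropriate one trapped in a rule engine.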
Build the contract between the two layers with the same rigor you would use for an API contract with an external vendor. Schema validation, allow-listed tool calls, structured outputs, retries, fallback paths, human-in-the-loop review for edge cases. Treat the LLM as if it were a contractor whose work product had to be checked before it could be accepted into the system. Because that is what it is.
And finally, invest in evals before you feel ready. The companies that ship LLM-powered systems without evals are not saving time — they are deferring the cost of measurement, and the cost compounds. Evals are not a sign of maturity; they are a prerequisite for it. Build them early, run them often, and treat the eval suite as a first-class artifact of the codebase. The companies that do this will look, in five years, dramatically more competent than the companies that did not, and the gap will be visible in their products, their reliability, and their margins.
