The AI Recalibration Is a Buying Guide for Investment Firms
The enterprise AI reckoning is not an argument against the technology. It is a free education in how to evaluate it, and investment firms should take notes.
The numbers behind the souring mood on enterprise AI are hard to dismiss. MIT's Project NANDA report, "The GenAI Divide," found that despite $30 to $40 billion in enterprise spending on generative AI, 95% of corporate pilots are failing to deliver measurable returns, with only about 5% achieving rapid revenue acceleration. NYU professor Scott Galloway has gone further, warning that the gap between AI capital expenditure and demonstrable productivity gains could trigger a broad market correction once CFOs start saying the quiet part out loud.
For RIAs, alternative managers, and PE diligence teams, the temptation is to read this as a reason to wait. That is the wrong lesson. The failures now being tallied are not random. They cluster around a specific category of deployment, and the pattern of what failed and what did not amounts to the most useful vendor evaluation framework the industry has produced. Investment professionals, who diligence managers and companies for a living, are unusually well equipped to apply it.
What Is Actually Failing
The MIT researchers were explicit that the problem is not model quality. It is the gap between what generative systems actually do and what enterprises assumed they were buying. A large language model generates the most statistically plausible output, not the most verifiably accurate one. Deployments that treated that output as finished work product, with no architecture for verification, are the ones now being written off.
The clearest demonstration came from an unlikely source: a journalist who tested the "one-person unicorn" thesis by founding a startup staffed almost entirely by AI agents. As Scientific American documented, the agents fabricated progress reports, invented credentials, and at one point claimed the company had raised a seven-figure investment that did not exist. The sole human's job became verifying which work products were real. The management overhead was not eliminated. It was redirected into fact-checking the machines.
The pattern across failed deployments is consistent: AI trusted to generate, with no system to verify, eventually produces confident output that is wrong, and the cost of catching it lands on the humans.
What is not failing is AI deployed as verification infrastructure: systems built to retrieve, cross-reference, and cite rather than to generate from pattern memory. The recalibration is not separating AI adopters from AI skeptics. It is separating architectures that assumed the model could be trusted from architectures that assumed it could not.
AI Washing Cuts Both Ways
Investment firms have a second reason to get this distinction right: the regulator already has. The SEC's first enforcement actions over "AI washing" charged two registered investment advisers with making false and misleading statements about their use of AI, resulting in $400,000 in combined penalties. One firm advertised AI-driven analysis of client data, a capability that did not exist. The other claimed to be the first regulated AI financial adviser on the strength of capabilities it could not demonstrate.
The message for firms runs in two directions. As marketers, advisers cannot claim AI capabilities they do not have. As buyers, they should assume vendors face the same temptation and diligence the claims accordingly. A firm that adopts a tool whose outputs cannot be traced to sources is not just taking on operational risk. It is building its own representations about AI use, the ones examiners will eventually test, on a foundation it cannot inspect.
Four Questions That Separate Infrastructure From Theater
The good news is that the diligence is not complicated. The same instincts that apply to evaluating a fund or a portfolio company apply here.
- Can every material claim be traced to a source? Not "the system was trained on reliable data," but a specific, clickable citation for each fact in each output. If the answer is no, the output is generated, not verified.
- What happens when the answer is not knowable? A system built for verification flags gaps explicitly. A system built for generation fills them with something plausible. Ask to see how the tool handles a question the public record cannot answer.
- Does the research process leave an audit trail? When a client, an LP, or an examiner asks where a number came from, the answer should be a record, not a reconstruction.
- Does the vendor's own claim survive the test it sets for others? Ask for a live output, not a demo narrative, and check the citations yourself.
These questions would likely have filtered out most of the deployments now counted in MIT's 95%. They take an afternoon to ask.
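For teams that want to run the first two questions against a sample output rather than a sales deck, the check can be reduced to a few lines. What follows is a minimal sketch, not any vendor's API: the `Claim` schema, the field names, and the sample statements are all hypothetical, and it assumes the claims in a report have already been extracted by hand or by a separate tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    """One material statement from a vendor's report (hypothetical schema)."""
    text: str
    source_url: Optional[str] = None  # citation attached to the claim, if any
    flagged_unknown: bool = False     # True when the tool says the record cannot answer


def audit_claims(claims: list[Claim]) -> dict[str, list[str]]:
    """Sort claims into cited, explicitly flagged gaps, and untraceable."""
    result: dict[str, list[str]] = {"cited": [], "flagged_gap": [], "untraceable": []}
    for claim in claims:
        if claim.flagged_unknown:
            result["flagged_gap"].append(claim.text)
        elif claim.source_url and claim.source_url.strip():
            result["cited"].append(claim.text)
        else:
            result["untraceable"].append(claim.text)
    return result


def passes_traceability(claims: list[Claim]) -> bool:
    """Every material claim is either cited or explicitly flagged as a gap."""
    return not audit_claims(claims)["untraceable"]


# Made-up sample statements, for illustration only.
sample = [
    Claim("Revenue grew year over year.", source_url="https://example.com/filing"),
    Claim("Headcount is not disclosed in public filings.", flagged_unknown=True),
    Claim("The company leads its market."),
]
print(audit_claims(sample)["untraceable"])  # ['The company leads its market.']
print(passes_traceability(sample))          # False
```

A passing result only shows that a citation is present, not that the source supports the claim. Opening the links and reading them remains a human job.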
Quoin in Context
Quoin was built on the assumption these questions encode: that unverified AI output is a liability, not a shortcut. Its agentic research engine decomposes a question across specialized agents that retrieve filings, cross-reference coverage, and reconcile conflicts between sources, with every claim in the resulting report cited to a verifiable origin. Readers can apply the four questions above directly to Quoin's due diligence report on SpaceX: every material figure carries a numbered, linkable source, conflicting estimates are disclosed rather than averaged away, and the report closes with an explicit accounting of what the public record does not support. For the architectural background on why this differs from pointing a chatbot at a research question, Quoin's piece on agentic AI versus standalone LLMs for due diligence covers the distinction in depth.
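The idea of disclosing conflicting estimates rather than averaging them away is easy to state in code. The sketch below is a generic illustration under stated assumptions, not a description of Quoin's engine: the `Estimate` type, the `reconcile` function, the status labels, and the 5% tolerance are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    value: float
    source: str  # label of the source reporting this value


def reconcile(estimates: list[Estimate], tolerance: float = 0.05) -> dict:
    """Treat sources as agreeing only within tolerance; otherwise disclose the spread."""
    if not estimates:
        # Nothing in the record: say so instead of producing a number.
        return {"status": "unsupported", "estimates": []}
    values = [e.value for e in estimates]
    low, high = min(values), max(values)
    # Relative spread, measured against the largest magnitude to avoid dividing by zero.
    scale = max(abs(low), abs(high))
    spread = 0.0 if scale == 0 else (high - low) / scale
    if spread <= tolerance:
        return {"status": "consistent", "estimates": estimates}
    # Sources disagree: report the range and keep every source attached.
    return {"status": "conflict", "low": low, "high": high, "estimates": estimates}
```

The design choice is that the function never returns a blended figure. When sources disagree, the reader sees the range and who reported each end of it.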
The Recalibration Rewards Diligence
If the current correction in AI expectations forces buyers to distinguish between systems that generate and systems that verify, it will have done the industry a service. Investment firms do not need to wait out the cycle. They need to do what they already do well: ask where the numbers come from, and decline to pay for answers that cannot say. Firms evaluating AI for research and diligence workflows can inspect live, fully sourced examples at quoin.ai.