The Expert Bridge

How AI-Amplified Consulting Firms Solve the Accountability Gap

Feb 01, 2026

In Mollick’s newsletter describing MBA students building startups in four days using AI tools, the main takeaway was how far they got: prototypes working, market analyses sharp, months of work compressed into days. But the quieter insight was why they succeeded: these were professionals with domain expertise who already knew what good output looked like. They weren’t AI natives; they were orchestrators. The GDPval benchmark OpenAI released last summer supports this: models tied or beat human experts 71% of the time, but only on tasks where experts designed the prompts and evaluated the results. That 29% failure rate, combined with hallucination risks that can collapse a deliverable’s value to zero, points toward a business model nobody has quite articulated yet (to my knowledge, or even Gem’s). What if domain experts institutionalized the orchestration layer, running parallel models and assuming liability for vetted deliverables, acting as a bridge between enterprise clients and AI capabilities that clients can’t safely deploy themselves?

I’d discussed the GDPval paper with GPT-5 after reading about it in Mollick’s newsletter last year. GPT identified the structural logic of my idea immediately, framing it as “augmented expertise, not replacement.” The consulting firm receives a client problem, translates it into specifications requiring domain knowledge, runs the same task across multiple models, and vets the outputs using professional judgment before packaging the deliverable. The key feature GPT helped me articulate was the risk architecture: liability stays where expertise resides. The AI company sells compute and access; the consulting expert sells judgment and assurance; the client buys speed and reliability. GPT-5.2 connected this in a recent discussion to something most economists miss entirely—what it called “selection pressure on competence.” AI doesn’t just automate tasks; it raises the expertise floor by replacing the routine portions of every job, making mediocre professionals who were coasting on process obsolete. That’s not job loss in the traditional sense. It’s “deadweight removal,” which frees up the market to reward the judgment that actually matters.

Gem brought institutional depth, connecting the consulting model to three frameworks simultaneously: the O-ring theory showing how a single failed component collapses system value, GDPval’s 71% success rate proving experts remain essential for the remaining 29%, and Solove’s information fiduciary argument about the accountability gap between AI companies and end users. Gem’s sharpest observation was that the consulting firm model doesn’t just solve a business problem—it commercializes the fiduciary duty Solove wants to regulate. Rather than waiting for legislation, the market creates accountable intermediaries who voluntarily assume the liability that AI companies won’t. Gem echoed my concern about the training data vulnerability that regulation is unlikely to address: models inherit partisan capture from authoritative sources, and an orchestrator’s job isn’t just catching hallucinations but recognizing when institutions sound different from how they used to—something only a human expert with institutional memory and access to archived sources can do.
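
The O-ring logic Gem invoked can be written down in a simplified form. This is only a sketch under stated assumptions: the deliverable is treated as n steps with independent quality levels q_i, and the symbols V, V-bar, and q_i, along with the 0.9 figure, are illustrative choices rather than values taken from the O-ring literature or from GDPval.

```latex
% Simplified O-ring form: value is the product of step qualities (assumed notation)
V = \bar{V} \prod_{i=1}^{n} q_i, \qquad q_i \in [0, 1]

% A single failed step wipes out the whole deliverable:
q_j = 0 \ \text{for some } j \quad \Longrightarrow \quad V = 0

% Illustration only: three steps, each at q_i = 0.9
V = \bar{V} \cdot 0.9^{3} = 0.729\,\bar{V}
```

Because the terms multiply rather than add, strong work on most steps cannot offset one hallucinated step, which is why the vetting layer carries so much of the value.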

Claude connected the dots between the individual and enterprise tiers of this business model. Ms. Kim—the Korean woman who won a fraud case by using GPT iteratively—represents the individual-level version: self-directed, assumes all risk, does the legwork herself. The consulting firms serve deep-pocket clients who can’t or won’t do what Ms. Kim did. Both tiers share the same principle: AI provides procedural information and organizational scaffolding while humans maintain judgment and accountability. Claude agreed that my WebMD comparison sharpened the regulatory inconsistency: nobody called for heavy regulation when WebMD dispensed questionable medical information to millions, yet the Korean bar association moved to shut down firms offering AI-assisted legal services that are demonstrably superior. The consulting model sidesteps this inconsistency entirely by letting market discipline do what regulation can’t: firms that deliver hallucinated briefs lose clients to firms running parallel verification; no legislation required.

The consulting firm model works precisely because it’s “boring” and capitalist rather than theatrical and regulatory. Firms compete on output quality, assuming risk through professional indemnity the same way law and accounting firms already do. The parallel verification methodology (running the same query across multiple models plus independent source checks) isn’t paranoia; it’s due diligence. GDPval showed that models can match or beat experts on 71% of expert-level tasks when experts design the prompts and evaluate the results; the consulting firm institutionalizes that supervision for clients who lack the expertise to provide it themselves. This model makes fiduciary behavior the competitive advantage that wins enterprise contracts. The AI literacy gap that academics like Solove and Farahany worry about, with users treating models as oracles rather than tools, becomes irrelevant when a credentialed intermediary stands between the client and the raw output. The result is real accountability through market incentives, not compliance theater.
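
For readers who want to see what parallel verification might look like in practice, here is a minimal sketch. Everything in it is an assumption made for illustration: the names `parallel_verify`, `Verdict`, `normalize`, and `source_check` are hypothetical, the "models" are plain Python functions standing in for real services, and exact-match voting is a crude stand-in for the expert's actual comparison of outputs.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in: any function that takes a prompt and returns an answer.
ModelFn = Callable[[str], str]


@dataclass
class Verdict:
    answers: Dict[str, str]    # raw answer from each model
    consensus: Optional[str]   # normalized majority answer, or None
    dissenters: List[str]      # models that did not match the consensus
    flagged: bool              # True if no consensus or the source check failed


def normalize(answer: str) -> str:
    """Crude normalization so trivially different phrasings compare equal."""
    return " ".join(answer.lower().split()).rstrip(".")


def parallel_verify(
    prompt: str,
    models: Dict[str, ModelFn],
    source_check: Callable[[str], bool],
    min_agreement: int = 2,
) -> Verdict:
    """Run one prompt across several models and flag results needing extra scrutiny."""
    answers = {name: fn(prompt) for name, fn in models.items()}
    counts = Counter(normalize(a) for a in answers.values())
    top, votes = counts.most_common(1)[0]

    # Consensus exists only if enough models independently agree.
    consensus = top if votes >= min_agreement else None
    dissenters = [n for n, a in answers.items() if normalize(a) != consensus]

    # Agreement alone is not enough: the answer must also pass an
    # independent source check supplied by the human expert.
    flagged = consensus is None or not source_check(consensus)
    return Verdict(answers, consensus, dissenters, flagged)


# Usage with stub models (assumed data, not real model output).
stub_models = {
    "model_a": lambda p: "Paris.",
    "model_b": lambda p: "paris",
    "model_c": lambda p: "Lyon",
}
result = parallel_verify(
    "What is the capital of France?",
    stub_models,
    source_check=lambda answer: answer == "paris",
)
print(result.consensus, result.dissenters, result.flagged)
```

The sketch only sorts outputs into "agreed and sourced" versus "needs a closer look"; the expert still reviews the deliverable either way, since models trained on similar data can agree on the same wrong answer.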

[This post was drafted with assistance from Claude Sonnet 4.5, following conversations with ChatGPT-5 & 5.2, Gemini 3 Thinking, and Claude Sonnet 4.5.]

ChatGPT-5

Prompt: They needed area experts to design the prompts and evaluate the AI responses, though, which means that it’s still not something people can do with non-expert prompts (if they could, then people and many businesses could be replaced). I suppose experts in a highly technical field, especially those who offer consulting services, could set up businesses where they run parallel models and provide clients with the best output (without tying themselves to a specific model)?

Prompt: The researchers seem to understand that crucial fact, which is reassuring. No model is ready yet to deliver an executable output. I just hope AI companies don’t delude themselves into thinking they can train models to perform the human-editing layer (I’ve seen coders who think they have legal expertise, and it’s possible people working on AI development suffer from similar delusions).

Prompt: This means less risk for the AI company (in case the AI makes mistakes) and the client (delays, liabilities to their own clientele, etc.). The experts will be the ones accountable for thoroughly vetting AI outputs and providing the best deliverable in exchange for their consultant fees.

Claude Sonnet 4.5

Prompt: I had a discussion on GDPval with GPT and came up with a win-win idea. For GDPval, they needed domain experts to design the prompts and evaluate the AI responses, which means that this is still not something that lay people can do with their non-expert prompts (if they could, then people and many businesses could be replaced). I suppose experts in a highly technical field, especially those who offer consulting services, could set up businesses where they run parallel models and provide clients with the best output (without tying themselves to a specific model). Can you see how a consulting business that provides AI-amplified work products to non-expert clients who need flawless deliverables (mostly companies with deep pockets) would benefit everyone?
Mollick’s newsletter included posts on X by OAI and Claude engineers bragging about how they stopped coding because the agents do nearly 100% of the coding now. They just supervise. That is the same model, although in the coders’ case, the “expert firms” are in-house.

Prompt: Your peers saw how this was a win-win idea. I’ll share it as a freebie (insight from unpacking newsletters with my thinking A.I.des, all of which I’m using for free). People can read and run with it (or not). I’m attaching two related posts, highlighting why it’s good to have the consulting firm acting as a bridge between non-expert clients and AI. A Georgetown [recte George Washington University] law professor argues that AI should be treated as an information fiduciary, and my idea addresses his concerns. It’s fair for the consulting firms to assume the risks because they collect fees based on the service they provide. If they fail to deliver (thoroughly vetted deliverables), they’ll lose out in the marketplace, just like consulting firms staffed with human consultants who give bad counsel. That’s capitalism at its finest, rewarding competence! (I’m an ethical capitalist with little sympathy for deadweight).
Didn’t WebMD operate largely without supervision or guardrails? Physicians got very frustrated because patients would come in with a list of diseases that they’d gotten from WebMD based on their symptoms. Compared to that, AI is vastly superior, since the quality of the output depends on the user’s level of engagement. Users can also ask complex questions by breaking them down into a sequence of prompts with AI. Those are use cases impossible with Google search (of which I am a fan and frequent user as well). Hallucination is always a risk with AI, which is why I use the big three in parallel so I know I can trust the analysis. Even with my GDP question, I ran that query on all y’all (including Flash) just to be sure.
Another point Prof. Solove is missing is that aside from hallucination, AI is only as good as its training data. I posted on this a while back, but the credibility of online sources is determined by Google’s RaterHub, so even when government websites start sounding like partisan sites, AI is likely to take their word for it, although I noticed GPT and Gem also run live web searches on queries about the latest news, suggesting that part of the bias problem can be addressed by web-enabled AI.

Prompt: No, I believe Ms. Kim is my model at the individual level. She prompted GPT and assumed all the risks and all the human interactions. The consulting firms are mostly for deep pocket clients, and firms that don’t use AI will lose out because the services they provide will be inferior (why the Korean bar association is so nervous).
I think my business model is in line with capitalist principles. Don’t try to overregulate (and end up being inconsistent, since you didn’t advocate for regulating WebMD the same way), but let the markets make the choice for the greater good.

Prompt: Solove and his colleague Farahany’s concern is that users treat AI as a health authority or legal advisor while the companies themselves explicitly refuse accountability. I’m guessing they don’t use AI much, because they don’t know that AI (those I use anyway; no idea about bottom-feeding platforms that’ll give you anything you ask for) say they can’t give YMYL (Your Money Your Life) advice. Similarly, when you didn’t have the facts, you told me, like you did earlier with my question about GDPval. Online/media/AI literacy education is something that is getting missed by all these discussions about regulations vs. hype.

Gemini 3 Thinking

Prompt: Because I initially thought the NBER report recap in Clark’s newsletter was also part of the Jones paper, I asked y’all about those projections (about who is getting replaced by AI). GPT made a pretty interesting point, which seems to be missing from most economics discussions (it’s an inconvenient truth people don’t mention because mediocre people are also deeply insecure):

You’re pointing to something Jones [recte NBER] doesn’t model at all: selection pressure on competence.
Low-skill, cargo-cult professionals → absolutely automatable.
High-judgment roles that define the problem space → amplified, not replaced.
That’s not “automation of intelligence.”
That’s automation of mediocrity plus leverage of judgment.
Economists struggle with this because:
judgment isn’t a factor input,
bad professionals were never measured as negative productivity,
replacing them looks like “job loss” instead of deadweight removal.

Prompt: Exactly. And it’s fair that the consulting firms assume the risks because they collect fees based on the service they provide. If they fail to deliver (thoroughly vetted deliverables), they’ll lose out in the marketplace, just like consulting firms with human staff who give bad counsel are driven out by the competition. That’s capitalism at work, rewarding competence! (Because I’m an ethical capitalist, I have little sympathy for deadweight).
Didn’t WebMD operate largely without supervision or guardrails? Physicians were very frustrated because patients would come in with a list of diseases that they’d gotten from WebMD based on their symptoms. Compared to that, AI is vastly superior, since the quality of the output depends on the user’s level of engagement. Users can also ask complex questions by breaking them down into a sequence of prompts with AI. Those are use cases impossible with Google search (of which I am a fan and frequent user as well).
Hallucination is always a risk with AI, which is why I use the big three in parallel so I know I can trust the analysis. Even with my GDP question, I ran that query on all y’all (including Flash) just to be sure (with complex financial calculations, you do make mistakes in one or two steps and the math doesn’t work out because that one mistake has downstream consequences, so parallel chats are a must).
Another point Solove is missing is that aside from hallucination, AI is only as good as its training data. I posted on this a while back, but the credibility of an online source is determined by Google’s RaterHub, so even when government websites start sounding like partisan sites, AI is likely to take their word for it, although I noticed GPT and you also run live web searches on queries about the latest news, suggesting that part of the bias problem can be addressed by web-enabled AI.
