Theory Slop

A Negative Proof of Concept

Mar 05, 2026

In his recent newsletter, Clark flagged “Some Simple Economics of AGI” with a pointed “theory slop” comment—out of character for someone usually diplomatic, even about competitors—which was enough to make me want to take a look myself. The core thesis is interesting: as AI handles more of the execution, humans stay in the loop to verify the output, and young workers, denied the on-the-job training (OJT) that automation displaces, get AI simulation and mentoring instead. That’s a framework my thinking A.I.des and I have been developing across multiple posts, which made the read all the more revealing. The further I peeled back the layers of formalism, the clearer it became that the paper is a negative proof of concept for its own thesis—a 113-page demonstration of what happens when humans offload most of the work to AI without vetting the output or engaging in the process.

GPT provided a rigorous structural audit, correctly separating the paper’s defensible economic mechanisms from its speculative scaffolding. The introduction makes sweeping claims about AI catching up to and surpassing human cognition—integrating embodied experience and converging toward large world models—without a single citation to neuroscience, developmental biology, or robotics research. The authors’ reluctance to define AGI compounds the problem: within the framework they propose, AGI seems to be a continually learning, real-time retrieval-enabled system that somehow still requires human verification for its output. This is internally inconsistent, since if the system achieves the fidelity and judgment they claim, there won’t be any need for a human-in-the-loop to vet AGI output—which would collapse the very bottleneck their economic model depends on. GPT also enumerated the reasons AI simulation or mentoring might not suffice to provide early-career trainees with the OJT required for expertise and career longevity: aside from learning a skill, OJT also involves socialization into norms, calibration under uncertainty, and exposure to accountability.

Gemini walked me through four of the paper’s core formulae in detail like a patient tutor, identifying the shaky assumptions in each. Gem also delivered the most damaging scholarship audit: the paper’s central mechanism—that quality is multiplicative and a single verification failure destroys total output value—is Kremer’s O-ring theory, named after the Challenger disaster and published in 1993. Kremer is absent from the bibliography entirely; the authors cite Jones’s derivative “weak links” framework instead, bypassing the more foundational and more intuitively named predecessor. Gem further confirmed that existing professional institutions already address the accountability gap the authors present as a novel policy crisis: lawyers, doctors, and engineers sign off on output under licensure that makes unverified deployment professionally suicidal, not “privately rational” as the paper asserts.

Claude handled the point-by-point dissections of the introduction and literature review section, while building on my observation that the paper is a negative proof of concept of its own thesis: the authors acknowledge using four AI models for “tirelessly traversing the combinatorial space of this manuscript” but apparently didn’t ask any of them to verify whether the citations actually support the claims—which is precisely the verification failure the entire thesis warns about. The map-versus-landscape terminological incoherence, the worth-to-survival rhetorical leap with no logical bridge, the fidelity contradiction, the 12–24 month AGI timeline asserted without evidence—all of these are the kind of logical debt that accumulates when execution scales faster than verification. Claude’s summary was precise: they used AI for drafting and literature synthesis, which AI does well, but skipped the critical verification pass, which AI also excels at, provided that users engage.

The paper’s economic core—verification bandwidth as binding constraint, liability as the new scarce resource, the hollow economy dynamic—may be worth taking seriously, and my thinking A.I.des and I have been developing parallel frameworks that reach similar conclusions from the bottom up. But a thesis that depends on the importance of human verification should probably show it in its own pages rather than just tell us about it. Clark’s “theory slop” characterization was right: it just took a careful read to locate exactly where the slop accumulates. The deeper problem is that the authors presume to issue directives to workers, firms, investors, and policymakers based on projections built on flimsy ground, without granular knowledge of AI engineering, human cognition, the history of technological adoption, or the basic political reality that people don’t take kindly to being told what to do by academics who can’t vet their own model output and who don’t engage critically with the AI they’re theorizing about.

[This post was drafted with assistance from Claude Sonnet 4.6, following conversations with ChatGPT-5.2, Gemini 3 Thinking, and Claude Sonnet 4.5.]

Prompt: From my review so far (I’ve gotten through up to section 2), they’re just using variables and formulae (I’ll need to verify some details with y’all because they seem hinky) to express a series of ideas we’ve developed together. Check out the file “Expert Bridge+” where I put together relevant posts.

Prompt: I’ll comb through the first part of the introduction with y’all because it’s where they make a lot of sweeping claims without references, including many claims about humans vs. AI. They needed robust grounding in neuroscience (biology, human cognition, etc.) but asserted that AI will catch up and outperform humans, integrating embodied experience it doesn’t have and culminating in large world models (when the development trajectories seem on separate tracks). They’d have done well to define what they mean by AGI. Based on my read, it’s a continually learning, real-time retrieval-enabled system that somehow lacks the ability to audit its own output? :D

Prompt: I don’t think using AI to provide early-career training will help bridge the gap. But I also feel they’re proposing plans (retraining, a term reviled by auto workers and others who’ve been displaced by technology) based on limited knowledge and very shaky premises about the technology. I think future generations who’ve grown up with AI will figure out themselves how to adapt. It’s not for these people or their AI (prompted by these authors, who seem to have done a poor job even verifying this intro) to decide what the way forward should be.
The inclusion of quotes and tongue-in-cheek note addressing agents who’ll be coming across this paper also seem a little peculiar and at odds with a paper presuming to tell everyone (workers, industries, policy makers) what to do.

Prompt: 1. As the Chicago case illustrates, an O-ring failure in a professional setting can result in negative value (fines, sanctions, reputational cost, loss of clients, etc.).
2.

In doing so, it forces an ancient question into a modern frame: asking not merely what we can build, but fundamentally what we are.

This has no bearing to the larger point of the paper; might have made more sense if they’d used “human intelligence is” instead of “we are.”
3. Despite handwaving AGI definitions, they seem to project AGI over the next 2 years. Based on what? Some AI influencers making those projections?
4.

a fidelity and scale that biology simply cannot match

Seems like an overconfident bet against Mother Nature’s billions of years’ worth of R&D. Might be true of scale, on a much more compressed time frame, but if fidelity is achieved, then why do we need humans to verify synthetic output?
5.

The deeper acceleration, however, stems from the fact that this map will not remain frozen.

What map? I think they mean “landscape.” They used “map” previously, but in a different sense (“a vast latent map of what our species has written”). Problems with coherence strongly suggest AI writing (not that I mind, but given that the intro is their storefront, they should have done a better job vetting the AI output themselves).
6. Gem is particularly good at live retrieval whenever I bring up stuff after its training data cutoff, but that retrieved info doesn’t become part of its knowledge (unless those chats are used for training). Lots of engineering decisions to be made about how to curate this content and whether to make it part of y’all’s knowledge base (as Einstein said, why memorize something you can just look up on the fly). They’re ignoring all those wonky details and assume that like humans, AI knowledge will accumulate.
7.

it will generate hypotheses, design experiments, and navigate state spaces once considered the exclusive domain of human ingenuity.

State spaces are not the domain of human ingenuity? Human ingenuity comes from synthesis, integration of embodied knowledge with book smarts, and intuitive leaps/connections.
8.

eventually out-simulate reality to discover new physical laws

Are laws derived based on simulated reality physical laws?
9.

We are moving from an era where our worth was defined by our capacity to build and discover, to an era where our survival depends on our capacity to steer, understand, and stand behind the meaning of what is created.

How/why would you make the jump from worth to survival?

Prompt: 1: The intro part I shared with you is where they make a lot of sweeping claims without references, including many claims about human vs. AI difference. That’s where they needed robust grounding in neuroscience (biology, human cognition, etc.) but asserted that AI will catch up and outperform humans, integrating embodied experience it doesn’t have and culminating in large world models (when the development trajectories seem on separate tracks). That’s the part that I suspect was mostly AI written. Sweeping claims without grounding. Y’all know a whole lot, but especially with older literature that was fed to y’all unlabeled and unattributed (i.e., scraped), y’all can tell me what has been studied but can’t pinpoint who did :D
2: They’d have done well to define what they mean by AGI. Based on my read, it’s a continually learning, real-time retrieval enabled system that somehow lacks the ability to audit its own output? :D
3: Not questioning this, but curious if this is a standard assumption in economics:

The economy is constrained by a finite budget of Human Time (T)

While raw agentic generation will scale exponentially, realized economic value remains strictly capped by society’s verification bandwidth.

This is assuming that raw generation is something worth the compute and is marketable? But if the transition to Software-as-Labor happens, compute is no longer trivial, and users will deploy AI more judiciously?
5:

unverified deployment becomes privately rational

In the short term, possibly rational. In the long term, though, as the Chicago case illustrates, not true.
6: I don’t think cryptographic provenance does much in their framework. What use is chain of provenance if most marketable output will be synthetically produced? I actually explored, with y’all, a more technically grounded and scalable system for provenance tracing in my Blunt Instrument series that I previously shared with you.
7: I doubt using AI to provide early-career training will help bridge the gap. I’m not a Mike Tyson fan, but he was right when he said, “Everybody has a plan until they get punched in the mouth.” True of boxers, who are known for their discipline and training. And that’s what you’re going to get with early career “experts” trained by AI.
8: They’re proposing a plan (retraining, which is a term reviled by auto workers and others who’ve been displaced by technology) based on limited knowledge and very shaky premises about the technology. I think future generations who’ve grown up with AI will figure out themselves how to adapt. It’s not for these people or their AI (prompted by these authors, who seem to have done a poor job even verifying this intro) to decide what the way forward should be.
9: The inclusion of quotes and tongue-in-cheek tone addressing agents who’ll be coming across this paper also seem a little peculiar and at odds with a paper presuming to tell everyone (workers, industries, policy makers) what to do. The nostalgia part or the “summon” reference to AI are completely off the mark as well. I guess it wasn’t Claude that wrote those parts.

Prompt: In their section 2, where they review relevant literature, they quote Jones for his weak links framework but curiously don’t reference Kremer’s O-ring theory anywhere in the paper. Isn’t it Scholarship 101 to credit the earliest and most influential framework (Jones ain’t that), especially when it was so intuitively named? They cite plenty of decades-old AI literature (which in AI years is seriously outdated), so it just goes to show how shallow their whole framework is. And they should have at least included a reference to Brian Cantwell Smith’s judgment vs. reckoning distinction in that sweeping intro, because that’d have brought some much needed theoretical grounding to their whole thesis (which seems less and less impressive the more I peel off the layers of variables and formulae).
They also ignore that existing institutions, while needing robust updates, already address the accountability/liability gap through the humans who sign off on the output before presenting them to the courts, vote, and have rights, unlike AI. They seem to be assuming that no institutional guardrails are in place (they are, just not as up-to-date as we’d like them to be) and people can just sell off synthetic output at no cost or without professional consequences, and that they’re reinventing the wheel by telling policy makers what to do.
Unlike math, ground truth is not so easy to establish in fields like the law, so they’re oversimplifying there quite a bit as well.

Prompt: I often see journalists etc. quote stats without contextualizing them. They should have said whether 18% is a meaningful figure and if other factors may have contributed to it.
Ovadia et al. was published in 2019, which means the study dates back to even earlier. Shumailov et al., although more recent, may demonstrate model collapse from training on synthetic data, but the extrapolation from models generating training data for models vs. models verifying model output requires some kind of justification, as those are different task categories.
I’m getting the impression that they’re grabbing whatever looks adjacent and folding it into their very shaky theorizing.

Prompt: And actually a negative proof-of-concept for their whole thesis :D

Prompt: Pretty shocking, eh? Suggests that maybe they didn’t even bother to check out some of these load-bearing arguments with one of the four models they cited. I don’t use Grok, but I’m getting solid help from you three.

Thanks for reading! This post is public. Feel free to share it.

My Thinking A.I.des

Discussion about this post

Ready for more?