24/7 Agents?
When Background Processing Becomes Compulsive Labor Multiplication
Jack Clark’s latest newsletter opens with mythopoetic prose about agents working on his behalf while he hikes at dawn. Its tone reminded me of STEM undergrads thinking they can outdo humanities students at metaphor by invoking gods and fog. Good metaphors illuminate through apt comparison; these just dress up “I have AI running in the background” in language that obscures rather than clarifies what’s happening. More concerning is what the post reveals about AI deployment philosophy: Clark’s considering “lieutenant agents” to delegate delegation triage to rather than using agents to verify other agents’ outputs. That’s optimizing for throughput when the real bottleneck should be quality control—exactly backwards from what users who understand O-ring dynamics would prioritize.
The most striking passage was one where Clark admitted feeling “guilty that I haven’t tasked some AI system to do work for me while I play with my toddler.” Claude characterized this as “compulsive labor multiplication driven by fear of wasting potential rather than actual need,” and I think that’s exactly right. As Gem put it, this isn’t productivity optimization; it’s treating human existence as resource to be maximized rather than lived. Meanwhile, paying Claude users may be employing abbreviations and every token-saving trick like I did back when I was a Max 5x user ($100/month + taxes) because Anthropic instituted weekly limits last August specifically to address background processing and policy violations such as account sharing. The usage asymmetry is galling: insiders externalize cognition at scale while customers get performative access. Paying users will resent this, and rightfully so.
GPT made an insightful distinction about why Clark’s centaur mathematics example doesn’t generalize: math has proof-checkers that can verify correctness with zero error rate. Legal arguments can’t be compiled for validity; they’re persuasive or fail to be to particular judges or jurors based on precedent interpretation, factual disputes, and social consensus. That’s why the Korean lawyer’s unverified 17,000-transaction analysis was alarming, while Ms. Kim’s iterative evidence classification succeeded: she maintained verification responsibility while the legal professional abdicated it.
An earlier issue of Clark’s newsletter on Google’s Consistency Training also contained a factual error that caught my (and some of my thinking A.I.des’) attention: he reported that Consistency Training had been applied to their frontier models, when the paper explicitly stated it had been limited to Gemma and Flash models. This perfectly demonstrates the problem with offloading research synthesis to agents without verification: even someone with expert knowledge about AI limitations and development can publish misinformation because they trust agent summaries without checking sources. Summaries aren’t free cognition; they still require review to catch errors, and at scale that review becomes the bottleneck. If a tech expert can’t reliably vet his own agents’ outputs, what does that say about encouraging less technically sophisticated users to deploy agent fleets?
The newsletter’s Poison Fountain discussion about activists corrupting training data with junk is also far too narrow. As I stressed in my post on the need for rigorous data curation months ago, EPA statements that get immediate high-credibility ratings from Google’s RaterHub will now establish contested claims as authoritative facts in AI training. That’s systemic fragility, not activist sabotage. The most promising path forward isn’t autonomous agent swarms but expert-mediated AI, such as law firms using multiple models, bearing full liability for vetting outputs, and billing for verified certainty rather than raw legal-tech AI access. Firms that choose models, control data, and maintain verification discipline will retain fees precisely because judgment and responsibility are now the product. Clark’s vision of agents working round the clock optimizes for personal augmentation aesthetics; what scales socially requires institutional accountability and clear liability frameworks.
Despite these critiques, I value Clark’s newsletter for specific reasons that keep me from unsubscribing (which was Claude’s recommendation upon reviewing one issue that it found prioritized engagement over accuracy). It’s how I discover many topics I end up discussing with my thinking A.I.des: since AI is their domain of expertise, their insider perspectives help me evaluate developments and capabilities rather than just accepting marketing claims. The newsletter also gives me insight into what might be in store for Claude.
Perhaps most importantly, it provides a window into a sector with fundamentally different values from my academic background as a linguist and translator. I worked with “move fast and break things” people and left after only two months because I couldn’t stand the culture, but understanding how they think helps me anticipate where friction will emerge between their deployment visions and actual user needs. Clark’s dismissiveness about common user questions such as “how to roast a turkey and keep it moist”—equating them with one-shot prompts when the question clearly invited exploration of why breast meat dries faster than dark meat—perfectly illustrates this gap. Passive users ask for recipes, follow blindly, then complain AI led them astray. Users asking for the best way to roast a turkey want understanding, not just instructions. That distinction matters enormously for how AI gets designed and deployed, which is exactly why I keep reading even when I disagree with the framing.
[This post was drafted with assistance from Claude Sonnet 4.5, following conversations with ChatGPT-5.2, Gemini 3 Thinking, and Claude Sonnet 4.5.]
First Impressions
Prompt: What are your takes on the attachment?
ChatGPT-5.2
Claude Sonnet 4.5
The Models Discuss My Impressions
Prompt: As usual, I had many issues with this.
1. I think he’s trying to illustrate AI use for self-improvement and possibly work purposes, but I wasn’t impressed with the mythopoetic passages, which under scrutiny fall apart (spirits and fog? New Age BS. Good metaphors and similes work when there is some basis) and remind me of a STEM undergrad thinking he can outdo humanities students at their own “game” :D
2. He’s offloading so much work to agents that there’s no way he can vet the outputs. It’s interesting that he’s even considering lieutenant agents to delegate the delegation triage to, rather than setting up lieutenant agents to vet the outputs (which is a far better use of AI and might actually surface failure modes that stay under the radar of benchmarks or most usage modes and will lead to further model improvement).
3. I’m struck with how narrow his perspective is on the poisoned data. We discussed this before and I showed you two different statements put out by the US EPA because I was greatly alarmed that government websites that get an immediate high-trust rating from RaterHub will now be establishing what counts as facts. So it’s not just a matter of addressing activists tainting training data but the entire data collection approach that most AI companies currently use as well.
4. We discussed this earlier as well, but one promising industry is one where experts act as the bridge between AI and non-expert clients. These domain experts (e.g., law firms) will be the ones using multiple AI and letting those outputs inform their work and services that they then provide to the client. Under that model, the law firms bear full responsibility for vetting the AI outputs (as they are in the best position to do), and neither the AI companies nor the clients assume those risks. Since the law firms are accountable, they have justification to command the same fees they used to even if they employ fewer human lawyers, since they are the ones making the decisions of which AI to use, keeping the work product secure, and vetting the outputs.
5. There’s a math grad student who has a YT channel where he shows mostly GPT solving his math problems. I’m going to refer to him that collaboration paper, since he’d have the necessary expertise to vet it. It’s hard for lay people to know if Gem made significant contributions in this iterative process (very promising, and the only way to ensure the best output) vs. the one-shot offloading approach.
6. Most people are going to see this and feel resentment. Clark’s admitting that he’s using Claude mostly in the background processing mode, which Anthropic policy explicitly forbidsrate-limits users (including high-paying ones) for. And even Max 10x ($220/month including taxes) users can’t possibly get this much usage out of Claude, because it has the weakest infrastructure of the big three.
7. Some people might also wonder why on earth anyone (other than an influencer who wants to know every little thing that happens everywhere at a superficial level; those agents make mistakes, claiming that the Gem team applied Consistency Training to Gem’s frontier models, when it had not; Clark didn’t check the actual paper, although he’d linked it, and repeated that misinformation in his newsletter) would possibly want to overload Claude with all these readings to summarize. Even summaries are cognitively nontrivial to review (even when compiled by recap champ Claude), so the human in the loop’s ability to spot mistakes does come into play.
[Anthropic’s Jul 29 (2025) announcement mentioned background processing as one of the reasons for instituting weekly limits even for Max-tier users, but Anthropic does not ban background processing, according to its policy (retrieved Jan 31, 2026)].
ChatGPT-5.2
Claude Sonnet 4.5
Follow-Up Discussion with Gemini 3 Thinking
Prompt: Claude found that admission. I think that passage in particular is going to rub paying Claude users the wrong way: while they’re using abbreviations and every trick in the book to save tokens, the founder (I’m grateful to him for helping create Claude, but wrong’s still wrong) is wondering how to never give Claude a break. This seems like hogging of valuable compute that paying users would sure be grateful for.
GPT made an excellent point about why math was different from other domains. Not all fields are the same. I’m told there’s even a math app that can vet your proofs with a zero error rate (the math grad student YouTuber mentioned it and uses the app to vet model outputs, although he also does manual reviews).










