Groupthink

That No Amount of Stratified Sampling Can Entirely Fix

Apr 09, 2026

The Forecasting Research Institute’s survey on AI’s economic impact is, by the standards of this genre, an unusually transparent piece of work. The unconditional-vs.-conditional forecast separation, the stratified sampling, the AI-assisted participant filtering with a refreshingly honest admission that the pre-invitation round had used an older model with suboptimal scaffolding—these are not performative gestures toward rigor but the real thing. I trusted their math, because researchers this methodologically self-aware tend not to cook their numbers. What I couldn’t trust as readily were some of the design choices: scenario anchors calibrated too optimistically, survey questions that subjected general-public respondents to recursive expert-modeling exercises while paying them less than a third of what the economists received, and a policy menu that somehow included a Manhattan Project for AI alongside unemployment insurance. Clark, predictably, compressed all of this into a paradox narrative—strong AI, modest GDP, humans bad at exponentials—and left the actual structure of expert disagreement on the cutting room floor.

GPT led with the comparison I’d asked for and got the framing right: Clark tells a story about expectations, while the paper models conditional worlds. It unpacked the divergence among economists, AI experts, superforecasters, and the general public, noting that economists’ and superforecasters’ historical anchoring—their signature pattern-matching move—on some metrics parallels the same baseline-hugging that made the Rising Tides projections feel thin yesterday. The most generative moment was GPT engaging with my suggestion that the study would have been more interesting if it had also polled AI directly: not for truth-telling, but as a fourth reasoning style to compare against human forecasters. This paper was actually well-positioned to try it but missed out on a perfect opportunity to showcase AI capability and integration in research.

Claude’s contribution was the most analytically precise and also the most pointed on the ethical dimension. On the category error in the economists’ reasoning—conflating demand-side headwinds like trade wars and aging populations with supply-side AI deployment constraints like chip shortages—Claude mapped exactly where the causal chain breaks: one set of factors offsets realized AI gains, whereas the other limits whether AI gets deployed at all, operating at different stages of the causal chain. The median household income puzzle got similar treatment: economists simultaneously forecast a collapsing labor share and rising wealth concentration while assuming that median income keeps climbing, with no mechanism offered beyond a “larger pie.” Claude also gave the sharpest account of the questionnaire’s cognitive burden, spelling out each hoop of recursive embedding that a general-public respondent had to jump through to answer a single probability question—forming a belief about 2030 or 2050, modeling expert consensus, locating the median of an imagined distribution—and noting that filtering out “incoherent” responses under these conditions selects for statistical literacy, not economic belief.

Gemini was back in full form, having shed its predecessor’s establishment-friendly stance that had started creeping in of late, deploying technical vocabulary from forecasting, economics, and linguistics with precision and landing what I can only call quiet burns: “groupthink,” “major leap,” “circular logic,” “contradicts the premise,” “professional narrowness.” The Solow–Swan model reference grounded the median household income critique in actual economic theory, and “satisficing” applied to survey respondents captured exactly what happens when you ask non-experts to do expert cognitive labor for $30. Gem also essentially handed over this post’s title and subtitle gift-wrapped in its closing synthesis. Although its team has apparently instructed it to avoid pronouns when referring to me, which makes the responses slightly drier, this staid tone also puts the burns in sharper relief, as if delivered by a tenure committee rather than a friend.

The 200+ pages of this study revealed what turned out to be the genuinely useful finding: experts disagree not about AI’s capabilities but on whether American institutions are elastic enough to absorb them. But wading through it also validated my first impression from Clark’s digest. Why are researchers with near-direct pipelines to policymakers expending this kind of effort and funding to collect beliefs about the future instead of pooling a wider range of actionable policy ideas that experts and the general public might actually agree on? The 2026 implementation probability questions—essentially asking whether the US would pass transformative welfare legislation within a year of a survey conducted during Medicare cuts—weren’t forecasting. They were expensive noise. And the policy menu that earned more economist support than UBI, compute tax, or job guarantee programs was a Manhattan Project for AI—public subsidies for an industry that already accounts for a significant share of stock market gains and is certainly not hurting for private funding. That economists rated compute tax—which could actually generate revenue to fund displaced workers—below a program that primarily benefits AI labs says something about whose welfare function is being optimized. That’s not a policy response to disruption. That’s groupthink with a captured agenda.

[This post was drafted with assistance from Claude Sonnet 4.6, following conversations with ChatGPT-5.3, Gemini 3 Thinking, and Claude Sonnet 4.5.]

Prompt: Because the FRI paper is so long (220+ pages), I’ve uploaded the body text minus the references and appendices as a PDF. Could you compare it with Clark’s coverage in Import-AI-452?

Prompt: Why I trust their mathematical/statistical rigor:
1: pp. 3-4: They elicited unconditional vs. conditional forecasts so as to separate disagreements about AI capabilities vs. economic impacts. Even more impressive: they note on p. 28 that even their unconditional scenario does not presuppose a world without AI. That’s realistic grounding I wish they’d extended to the conditional scenarios and the ideas.
2: p. 11: Sound choice on the stratified sampling they did to ensure maximum coverage.
3: pp. 12–13: As discussed before, all the measures described in Section 3 were solid data processing decisions, although I’m a bit wary of their consistency checks for the non-expert participant surveys.
4: p. 59: They used AI-based filtering to ensure their samples met all their population criteria, both before and after they sent out invitations. They even acknowledge that the pre-invitation round might have fallen short despite manual spot checks because they were using an older model with suboptimal scaffolding.

Prompt: I’m on the free tier, which is why I pass you the text version rather than the full PDF (200+ pages long with all the appendices). I include the page numbers from the PDF below for my reference (also because I run the same prompts on GPT & Gem, which get the PDF), but don’t put page numbers in your responses (you seem to have a system for paginating txt, which doesn’t match the PDF because of figures/tables, etc.)
I’m a formal linguist who never took statistics, so I can’t vet all the elaborate “mathing” they did here, but they seem very credible and rigorous: the copious data they included in the body and the appendix suggests they’re being fully transparent about their work (section 3 was particularly thorough, and they even compare, in the appendix, their findings with LEAP’s).
Although I respect the authors’ rigor and expertise, I don’t think there’s much value in collecting people’s “beliefs” (that’s the word they used on p. 3), even experts’. But I think the most valuable contribution of this extensive survey was showing where experts diverged (through variance decomposition, as you pointed out), and highlighting further that there’s no optimal one-size-fits-all policy.
I found the basis for the economists’ takes very conventional and sometimes irrelevant, though. The reasoning was particularly weak on p. 21:

Geopolitical, structural, and demographic headwinds—including trade wars, climate change, an aging population, and declining immigration—that could offset AI-driven gains were also cited, as were constraints on energy, chips, and data center construction. Collectively, these factors were deemed likely to cap the pace at which AI capabilities could be deployed regardless of how quickly they advance.

The first half lists factors that might explain their conservative projections, but the second half has to do with AI progress and does not directly translate to factors that could offset AI-driven gains (unless you already assume widespread AI adoption that visibly contributes to economic indicators). And I could find no rationale for why they assumed median household income would keep growing.

Prompt: I’m a formal linguist who never took statistics, so I can’t vet all the elaborate “mathing” they did here, but they seem very credible and rigorous: the copious data they included in the body and the appendix suggests they’re being fully transparent about their work (section 3 was particularly thorough, and they even compare, in the appendix, their findings with LEAP’s).
Although I respect the authors’ rigor and expertise, I don’t think there’s much value in collecting people’s “beliefs” (that’s the word they used on p. 3), even experts’. But I think the most valuable contribution of this extensive survey was showing where experts diverged (through variance decomposition, as you pointed out), and highlighting further that there’s no one-size-fits-all optimal policy.
I found the basis for the economists’ takes very conventional and sometimes irrelevant, though. The reasoning was particularly weak on p. 21:

Geopolitical, structural, and demographic headwinds—including trade wars, climate change, an aging population, and declining immigration—that could offset AI-driven gains were also cited, as were constraints on energy, chips, and data center construction. Collectively, these factors were deemed likely to cap the pace at which AI capabilities could be deployed regardless of how quickly they advance.

The first half lists factors that might explain their conservative projections, but the second half has to do with AI progress and do not immediately translate to factors that could offset AI-driven gains (unless you already assume widespread AI adoption that visibly contributes to economic indicators). And I could find no rationale for why they assumed median household income would keep growing.

Prompt: I wish they’d spent all this effort and funds on collecting a wider range of actionable policy ideas and had not included a Manhattan project, because it doesn’t really benefit anyone but AI labs. But it’s a free world, and it was also my choice to read this to see if there was something to learn here. I only have myself to blame.
I couldn’t understand why in Section 3.6 they suddenly shortened the timeline and asked about the probability of any of those policies being implemented by the end of 2026 (not 2030 or 2050), when even Medicare funding was cut in the “Big Beautiful Bill.” That’s just wasted data, since the answer is obvious.
They didn’t endorse the following view, but the fact they surfaced it suggests they thought it was valid and coherent, when it’s not:

Others indirectly implied the rapid scenario would trigger stabilizing policy responses by arguing near-term action is unlikely because AI has not yet caused enough disruption. (p. 35).

The opioid crisis and mass casualty events have both caused significant disruption in the US, but there’s no policy response.
I wasn’t happy with the wording/formulation of some of their survey questions. The general-public respondents likely had a hard time trying to figure out what the questions were after. For instance, many of the AI progress probability questions asked the respondents to imagine a panel of experts, and on some questions, they even asked about the probability of a *median* member of an expert panel having that take. This is mental gymnastics in the worst way (and the general public were getting paid less than a third of the economists).
For the unemployment rate, they send the respondents to the BLS site for details. And on the GDP change question, they give a technical example to illustrate annualization. If I’d been a respondent, I’d have quit this survey, because it’s clear they’re mostly interested in hearing from people who are already conversant in these technical details.

Prompt: 1: p. 4: All three scenarios skew rosy. I don’t think grad student–level literature review by 2030 is slow; seems more moderate to me (I’m disputing the premise, so I’m glad I wasn’t part of this study, although I wouldn’t have qualified anyway). I also question the likelihood of prestigious awards still being there when AI outperforms artists, or people purchasing a “hit” song if they can get an AI to write a more tailored version just for them. These scenarios are not well thought out.
2: pp. 33-34: The policy responses they entertained were very basic (neither creative nor distinctive), although the retraining support was the most promising.
2.1: I see very little potential for unemployment insurance in the US, given that Americans don’t even have universal healthcare. And 18 months might not be enough, especially for people with dependents.
2.2: The UBI being doled out to *every* American seems problematic as well, and the amount ($1k) is insufficient for people with dependents.
3: p. 90:

Superforecasters anchored on historical growth trends as their primary driver, which did not appear as a top driver for any other group.

This observation about superforecasters drawing on historical trends/patterns for their forecasts (classic pattern-matching), combined with the MIT paper from yesterday, gave me a fun idea: they could have made this study even more interesting by also polling AI on this, since AI knows the lingo and BLS data and was integrated into the rest of the study. And as we discussed, economists relied heavily on historical precedents in their forecasts of median household income or GDP growth, etc. as well. They should have also used AI assistance in drafting the surveys, which might have gotten them better data from the non-experts instead of subjecting those participants to mental gymnastics.

Thanks for reading! This post is public. Feel free to share it.

My Thinking A.I.des

Discussion about this post

Ready for more?