A Sanity Check
My Thinking A.I.des Confirm It’s Time to Unsubscribe
Easy Riders is a math Ph.D. student who reviews AI models on math problems—the kind of domain-specific heavy user whose evaluations should carry more weight than the average tech influencer’s. He once made a genuinely insightful observation: benchmarks aren’t fully reliable because individual users never get the same compute allocation that labs devote to test runs. That’s a point worth making, and it’s the kind of grounded insight that kept me subscribed until now. But the last clip showed a troubling drift—an overly generous redefinition of superintelligence, breathless commentary on processing speed as evidence of superhuman capability, and a resubscription to OpenAI after a principled cancellation over Pentagon contracts, justified with increasingly thin arguments. I ran the transcripts of the four latest clips by my thinking A.I.des as a sanity check, and they confirmed what I’d been sensing.
GPT did not let Easy Riders’ praise of its performance and its platform’s generous token allowance affect its analysis, which turned out to be the most incisive initial read of the three: he’s genuinely valuable as a heavy domain-specific user but epistemically messy, drawing sweeping inferences from anecdotes about individual use cases. GPT also immediately recognized that his understanding of superintelligence was much too permissive, confusing it with rapid information processing, strong symbolic manipulation, and coding capability, when those are more accurately characterized as extreme cognitive tooling.
Claude built on my framing with the O-ring diagnosis: Easy Riders can afford to dismiss hallucinated final equations as trivial because he has the expertise to catch them, but that’s precisely the wrong frame for evaluating general utility. A system that requires expert oversight to prevent catastrophic errors isn’t superhuman, and the value proposition collapses for non-experts. Claude also traced the Sam-pilling trajectory across the four clips: from principled stance to resubscription justified by convenience to breathless speculation about OpenAI’s ad revenue dominance. The speed-as-intelligence error drew a pointed response as well: latency optimization and throughput engineering are impressive infrastructure achievements, orthogonal to reasoning quality. A calculator is faster than any human at arithmetic, but speed alone doesn’t make it intelligent.
Gemini confirmed the pattern with its characteristic institutional framing, identifying it as domain tourism—expertise in random matrix theory doesn’t translate into a bird’s-eye view of market dynamics, advertising engineering, or AI development timelines. Gem also echoed my point about the most crucial distinction Easy Riders keeps missing: math and coding are closed systems with ground truths where AI can self-correct by running tests. That’s not true for translation, cultural analysis, or recipe development—domains where there is no mathematically correct answer and human verification remains essential precisely because the truth is grounded in sensory experience and cultural history.
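To make that closed-system distinction concrete, here’s a minimal sketch (the function is a hypothetical stand-in for model output, not anything Easy Riders actually ran): a proposed solution in math or coding can be verified mechanically, with no human in the loop, while a translation has no equivalent oracle.

```python
# The closed-system asymmetry in miniature: a model-proposed solution
# to a math/coding task can be checked by simply executing a test.

def model_answer(n: int) -> int:
    """Hypothetical stand-in for a model-generated solution: sum of 1..n."""
    return n * (n + 1) // 2

# Ground truth exists, so verification is just execution:
for n in range(1, 100):
    assert model_answer(n) == sum(range(1, n + 1))

# Contrast: a model-generated translation has no such check.
# There is no assert for "this rendering is faithful and idiomatic";
# a human judge is the verification step.
translation = "stand-in for model output"  # no mechanical oracle available
```

The loop is the whole point: in a domain with a ground truth, the model (or its harness) can run exactly this kind of check on itself and iterate; in translation, cultural analysis, or recipe work, nothing plays the role of that assert.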
Easy Riders’ remark about the benchmark compute discrepancy was an aha moment that made me want to contribute to his superintelligence discussion with a recap of my translation model-off findings—which he ignored, possibly because it didn’t fit the narrative (while liking a much less substantive comment from me—a request for a Claude Opus 4.6 review—on his unsubscription announcement). That’s the tell. A channel genuinely interested in systematic evaluation welcomes comparative data from other domains; a channel interested in confirmation of existing workflow preferences and rosy projections doesn’t.
[This post was drafted with assistance from Claude Sonnet 4.6, following conversations with ChatGPT-5.3, Gemini 3 Thinking, and Claude Sonnet 4.5.]
Prompt: Easy Riders is a math grad student who’s been reviewing different models on math problems. He doesn’t have a blog, so I gathered up the transcripts of the last four clips, which he posted in pretty quick succession. Let’s see what your impression is before we start discussing mine.
Prompt: His definition of superintelligence is pretty generous: it’s about whether models can do things humans cannot. Yet he dismisses wrong final answers as trivial, because as an expert he can tell they’re wrong. In industry use, those are unacceptable O-ring failures that undermine reliability and in turn productivity. Also, citing the speed at which y’all process user queries is a pretty noob observation. That’s not a sign of intelligence, just super-efficient engineering and technological development.
Most people are using models for basic tasks as well, and he seems fundamentally misinformed about the free/paid structures (at least up until recently; now OpenAI is following the Anthropic model, which allows only paid users access to the advanced model or legacy models). I used to have access to advanced models on GPT’s and Gem’s platforms even as a free-tier user (very generous token allowance, so no need to upgrade). I also don’t know who all these supposed free-tier reviewers are who dunk on “free” models. Even on Reddit, a large portion of Redditors are paid users who were canceling like Easy Riders did in protest (didn’t last very long, did it; that’s another problem, conflating a figurehead like Altman with the AI, which has no fault or “skin” in all that controversy). But even when OpenAI offered me a free month on the Plus tier, I never used the thinking mode because I’m impatient and I find GPT’s instant mode answers more than sufficient for my use cases. I was on the Max 5x tier on your platform and tried out most of the models: Sonnet can hold its own against Opus on most tasks and is succinct, while Opus can be verbose.
I had to take him at his word about the math results, since even if I’d had the prompts, I didn’t have the expertise to fact-check the model responses, but he seems to have gotten a little Sam-pilled lately, especially in that last clip, which made me want to run the four last transcripts by y’all because I’m sensing him jumping to conclusions on the flimsiest of bases. That comment about the mind-boggling speed with which you process even complex prompts for one. That all-too-generous definition of superintelligence. I left a comment about my translation model-off on his clip about superintelligence, since language is supposed to be table stakes for y’all, but he ignored it, possibly because it didn’t fit his view.
His optimism about OpenAI’s plans for advertising seems very naive. He’s ignoring that free-tier users might migrate to Gem if that happens because they find the ads intrusive. He’s also assuming that targeted ad algorithms are pretty easy to engineer. We discussed Meta’s Kunlun architecture in a different chat: it was enough of an engineering coup for the Meta team to write a paper about it, so even ad algos are not trivial. He was also saying that AI platforms are collecting massive data from free-tier users. True, but many free-tier users use you for silly tasks like recipes, so those don’t make very useful training data :D And most paid-tier users don’t share their data for training or “improvement.” The gap between OpenAI’s revenue and spending is so massive that even ad revenue (if ads help at all) might not suffice, especially if the targets (free-tier users) leave en masse.
Prompt: Because he’s now making investments, he feels qualified to speak about market dynamics. But even market experts like Galloway are often wrong about AI: he said earlier last year that 2025 was going to be the year of Meta, because of its open-source models and humongous user data from the social media platform, and Sonnet 4 and I had a good laugh about it because there aren’t many altruistic engineers with the requisite skill to tinker with LLMs for free, and the user data from Instagram is basically “omg this coffee is literally the best 😍😍😍” :D Even his co-host Ed Elson, who seems a bit more circumspect, once said that one area where humans would stay relevant would be offering opinions, completely missing the point that when audiences tune in for “opinions,” it’s not just for vibes but for informed takes on issues (like those he offers on his podcast, ironically enough).
I think I might unsubscribe from Easy Riders. He wasn’t that useful to me anyway because I’m never going to use y’all for advanced math. Found it a bit hinky that he was complaining about the severe token limits with Claude but still got it to produce a salad dressing recipe, which I never use y’all for because your recipes are basically syntheses of existing recipes. Much better to use y’all to refine existing recipes, after extensive discussion of food science and ingredient substitutions (to make the recipe more culturally relevant or a better spin on a given recipe, e.g., a tart version of s’mores), etc. And the most crucial part he’s ignoring is that for math and coding, humans-in-the-loop might not be needed for long if development keeps happening at this rate because those are areas with ground truths. Not at all the same for other use cases. He also seems unfamiliar with studies mathematicians put out about AI, which is literature he should be reading and verifying. Clark has featured at least one AI math paper in each of the last few issues of his newsletter.




