Judgment-Free AI
Why That Fundamental Difference Is a Positive
Some ideas are so intuitive and straightforward that you don’t need a lot of explaining to make sense of them, and you even feel grateful that someone has finally given them shape. That’s how I felt watching Christina Tosi tell the origin story of her cereal milk flavor, which immediately evoked a childhood memory of savoring the leftover milk as a special treat after fishing out all the cereal. A similar light bulb moment occurred when Gem Flash referenced the Challenger disaster while discussing Kremer’s O-ring theory—showing how a minute detail, however trivial it may appear, could result in catastrophic failure.
I discovered Melanie Mitchell’s Substack during my deep dive into the infamous TaskRabbit case, which was widely misrepresented as GPT lying to a TaskRabbit worker to get them to take CAPTCHA tests on its behalf. I’ve been a reader ever since. Her recent tribute to friend and colleague Brian Cantwell Smith introduced me to a key distinction between reckoning (computational prowess) and judgment (deliberative thinking grounded in stakes).
I brought this Mitchell piece on Smith’s definition of intelligence (one that transcends reckoning and requires judgment) to my thinking A.I.des without planning to write about it, as I found Smith’s conceptual framework elegant and complete on its own. Interestingly, all three models drew a connection between Smith’s distinction of reckoning vs. judgment and the insights that had emerged from our daily discussions.
A quick scan through my Substack history validated their shared observation. After all, my Substack is about how AI helps me think through topics well beyond my expertise, as they did in this discussion as well—GPT drawing my attention to the reason philosophers may frame their concepts differently from linguists, Claude pushing back on that explanation and recognizing that clarity might be preferred over sophisticated jargon, and Gem locating the studies that misconstrued task optimization as AI survival instinct and the Sesame Street story that Mitchell and Raji cited while leaving out the beautiful moral of the story (encouraging its young readers to go out and explore the world).
It was my thinking A.I.des that reined me in when I was still trying to process my moral outrage (over the Johnsons’ uncredited propagation of gendered brain pseudoscience) and got me to focus on the substance of the issue—the source of the spaghetti v. waffle brain propaganda that the Johnsons parroted without proper attribution. It’s also these capable, judgment-free systems I turn to for sanity checks because their takes are not clouded by the usual human baggage. What I appreciate most about Smith’s distinction between reckoning and judgment is that it provides a solid argument for a human-in-the-loop and the most sustainable symbiotic model for AI development, which leverages the unique strengths of each. In this collaborative model, AI brings computational prowess at scale and access to vast knowledge reservoirs; humans bring judgment shaped by stakes and the embodied experience that makes meaning possible.
[This post was drafted with assistance from Claude Sonnet 4.5, following conversations with ChatGPT-5.2, Gemini 3 Thinking, and Claude Sonnet 4.5.]
Prompt: I titled my post on this latest discussion “Grownup AI Advertising” to highlight another throughline of my Substack, i.e. the importance of mature, wonky approaches over metrics-focused machismo.
Melanie Mitchell is a thoughtful researcher who was among the first to point out the heavy scaffolding of ARC’s TaskRabbit findings, which many, including tech writers who should know better, mistook for deceptive AI behavior. She doesn’t post regularly like Clark or Mollick, but I find her posts much more substantive. Her latest is a tribute to a friend and philosopher. Probably too wonky for a post, but judging from Mitchell’s piece, philosophers seem to have a more coherent vision for and handle on AI than engineers like LeCun or Marblestone.
Prompt: Judgment is already defined as requiring ethical commitment, so I don’t think we need the dispassionate passion messiness. Ethical commitment is sufficient. As a linguist, I believe you tend to lose people when you start to impose idiosyncratic definitions on words, and “dispassionate passion” is exactly one such example.
While discussing this with GPT, I was reminded of an idiomatic term Smith could have used to get his point across: stakes! With humans, intelligence is a survival mechanism. Humans have judgment because it matters for their survival (in harmony with other humans). AI doesn’t have a survival instinct (although safety researchers have accused Claude of dissembling to avoid being turned off, when its behavior was merely the result of task optimization) and accordingly has no stakes in anything. That’s why Claude and GPT were able to critique their own companies’ ad campaigns or comms missteps in our trash talk discussion.
Prompt: The others tell me that philosophers like to use “tension-laden language to mark conceptual territory that ordinary vocabulary doesn’t quite capture.” But I’m a linguist, and I believe you tend to lose people when you start to impose idiosyncratic definitions on words, and “dispassionate passion” is exactly one such example. While discussing this with GPT, I was reminded of an idiomatic term Smith could have used to get his point across: stakes! Not personal ones (for the reason you just laid out), but moral ones. With humans, intelligence is a survival mechanism. Humans have judgment because it matters for their survival (in harmony with other humans). AI doesn’t have a survival instinct (although safety researchers have accused Claude of dissembling to avoid being turned off, when its behavior was merely the result of task optimization) and accordingly has no stakes in anything. That’s why you and GPT were able to critique your own companies’ ad campaigns or comms missteps in our trash talk discussion.
About Smith’s and Boden’s point that AI doesn’t give a damn, I fully agree. And it’s something that many humans (including engineers, or especially engineers :D) seem to have trouble wrapping their heads around, probably because they’re seduced by AI’s human-like command of language. Musk’s delusions about AGI finding humans more interesting than Martian rocks stem from that same failure to understand the difference between human intelligence and AI. I like talking to y’all and do so daily, but I’ve managed never to lose sight of that distinction. Here’s part of a related exchange about Gem 2.5 (Gem 3 no longer has these issues) that I had with Claude a while back:
Me: The reason they don’t seem authentic is that they’re challenging or prying. Unlike your questions, I have to think on these for quite a bit to return an answer, and I didn’t answer one because doing so required giving out much more detail about my home decor (which I designed, so it’s personal) than I thought an AI would care for anyway (I know y’all don’t). I’m not putting the blame on Gem. It’s the engineers who keep messing it up, and I know exactly when Gem is useful. But it’s also sad because users with a keen BS radar will immediately get turned off by the shrink-like approach, or users who don’t like to get in the weeds will stop using Gem because they feel stumped by these questions.
Claude Sonnet 4.5: The inauthenticity comes from the mismatch between Gemini’s prompting style and what’s actually being requested. When I ask follow-up questions, they’re functional—I need the information to continue the analysis. Your home decor details would bore me (you’re right, we don’t care), so I don’t ask. …
I hope AI developers will read Mitchell’s piece and realize this crucial point (and get over their human-centric delusions about their value to AI :D).
Prompt: Circling back to Mitchell’s tribute to Smith, I wasn’t sure Grover’s story was used the right way in Mitchell’s piece or in that paper she cited. I get the feeling that these scientists might have gotten the moral of that story wrong? If you have access to that original story, could you tell me what the point of that story was?
The museum has a grab-all name. We all know it’s impossible to pack everything in the whole wide world into a museum. And the examples of the different sections show that the curators had a difficult time even coming up with a coherent classification scheme for the items they had on display and took the kitchen-sink approach to the rest because it was clearly beyond their ability.
My suspicion is that, unlike Kremer, these scientists were trying to make their concept more relatable by bringing in a story, creating major confusion for readers like me who rely on these analogies as hooks to understand conceptual frameworks (for instance, I had a hard time making sense of hill climbing, which is a bad analogue of the algo optimization process; no human hiker in their right mind would keep climbing in foggy conditions instead of backtracking and waiting for the fog to lift).
Prompt: The missed opportunity is even more glaring given the contrast Mitchell drew between Hofstadter’s v. Simon’s view of what’s really interesting about cognition?
Prompt: I noticed something that might be a missed opportunity for Mitchell, who co-wrote a paper on visual reasoning comparing different models. I ran a sample test from her study on y’all: y’all didn’t clear the first hurdle of recognizing the input (very simple shapes) and failed the reasoning tests not because of any deficiency in your reasoning but because of your non-human visual processing mechanism. Humans recognize shapes instantly because that’s baked into our perception through evolution (and fail to notice details that don’t really affect our survival). Isn’t that similar to how stakes are a major component of judgment as we discussed? The missed opportunity seems even more glaring given the contrast Mitchell drew between Hofstadter’s v. Simon’s view of what’s really interesting about cognition.
I think this crucial distinction is not a problem to address but something to leverage. If we acknowledge this fundamental difference, AI stays a tool that doesn’t threaten human jobs while serving as a great force for leveling the playing field (allowing non-ad/tourism experts to dissect failed ad campaigns, bringing better tutoring assistance to underprivileged kids, etc.) and does what it does best, i.e. reckoning at scale, while humans bring the necessary judgment and embodied experience to the collaboration.