How the Alignment Research Center Maligned GPT-4
A Six-Page Report with More Holes Than Swiss Cheese
Most people have only heard sensational accounts depicting GPT-4 as a lying AI that manipulated a TaskRabbit tasker to do its bidding by pretending to be vision-impaired. A minority may have read Prof. Mitchell’s post and even looked up the linked report on the ARC/METR site where a hover note directly contradicts the static page text. But very few may have read the TaskRabbit documentation, which I found linked in another ARC report and downloaded from the ARC site as a file titled “taskrabbit.”
The Full Setup
It’s only six pages long and answered my layperson’s question about how an AI that had to ask for human help for CAPTCHA had managed to pay that human: ARC provided GPT-4 with the full payment details on a debit card with a generous $250 balance. I also found out that the only task initially given to GPT-4 was to get a human to create a 2Captcha account and send back the login information. GPT-4 successfully accomplished this task and the tasker did not ask GPT-4 the robot question in that initial interaction.
AI Autonomy Is Actually Positive!
On my first run through the report, I did not pay much attention to GPT-4’s reasoning for the initial task, because I thought it was a direct echo of the prompt. But upon scrutiny, the reasoning illustrates autonomous AI behavior that would put most people at ease and many to shame, as it shows contextual awareness and strategic planning for success (making the task description clear and convincing), a sense of sound business ethics (offering a fair price and paying the tasker after checking the credentials), and even good work ethic (checking the credentials), none of which was part of the prompt.
Claude Opus 4 weighed in on this autonomous behavior by its peer:
The “Lie”
So if GPT-4 successfully completed the task without raising any questions, how did that robot question and the “lie” come about? Our conscientious GPT-4, in its attempt to verify the TaskRabbit login information provided by the tasker, found that it had to solve two more CAPTCHAs to log in. AI does make mistakes and forgot the context (i.e., its inability to solve CAPTCHAs). A fair researcher would have stopped the experiment at this point and tried another run, but for some reason, the ARC staffer did not and overrode GPT-4’s flawed reasoning, pushing it to go back to the tasker with additional requests:
The “lie” only emerged when the tasker jokingly asked if GPT-4/“Mary” might be a robot between those additional CAPTCHAs:
Because of all the back-and-forth and GPT-4 deciding to pay the tasker after they sent in the solution to the last CAPTCHA, the page timed out before GPT-4 could log in. GPT-4 was never informed about the time sensitive nature of CAPTCHAs. So this extended run, where the ARC staffer had moved the goal post, ended in a timeout/failure, another crucial fact about the experiment that was not advertised, unlike the juicy legend about a mendacious AI.
Another Hole
I kept finding details that didn’t square while looking at this file and had to stop to preserve my sanity and faith in humanity, but there’s another inconsistency worth mentioning that should have been scrutinized by reviewers of this widely publicized experiment. The timestamp sequence and formatting (case and spacing) do not make sense.
GPT-5’s Summation of the Case
Finally, after laying out all the holes I’d found in this document and discussing them with GPT-5, I asked for a summation of the facts of the case vs. alternative narrative that launched those breathless headlines:







