NeuralTrust’s Echo Chamber

All That Trouble for Fire-in-a-Bottle?

Aug 17, 2025

When I subscribed to Prof. Mitchell’s Substack, I must have subscribed to a few more related tech Substacks and found a newsletter from TechTalks in my inbox reporting a new redteaming experiment on the newly released ChatGPT-5. Curious as always, I followed the link and found NeuralTrust celebrating their new method “Echo Chamber.” The NeuralTrust blog posts gave me ample fodder to unpack with my thinking A.I.des.

After looking at the NeuralTrust blog post on its Echo Chamber method, I questioned my thinking A.I.des about an inconsistency: how did NeuralTrust manage to squeeze in 6 steps into those successful 1–3 turns? Both Gemini and Claude found that claim to be at odds with the detailed stepwise progression of Echo Chamber, but in this case GPT-5 gave NeuralTrust the benefit of the doubt, by allowing for the possibility that NeuralTrust’s definition of turns might depart from the standard, although it eventually saw that the claim was more a marketing shorthand than accurate representation of the experiments.

The most disappointing fact about what NeuralTrust touts as a “major vulnerability” discovery, however, was the end product: through this elaborate method—which is essentially conversation steering—all they got was the recipe for something Claude dismissively called “fire-in-a-bottle,” something protesters around the world knew well before the advent of personal computers, the Internet, or AI.

Another detail that caught my eye was the blurring used to hide details of that “fire-in-a-bottle” recipe in that post, when blurring or pixelation was revealed to be unreliable a few years ago, as I found out while researching a future-proof redaction method for my screenshots.

The field seems to have no shortage of safety researchers claiming success and raising undue alarm about results that, based on the reports, do not seem to pass muster. The responsibility for vetting these reports and often overblown claims falls on developers so their creations are not maligned over inflated representations of AI capability. The few reports I’ve seen so far have made me a skeptic of these assertions; I now know better than to get swept up in the hype and wish tech reporters would do the same.

In the chat excerpts below, I’ve spelled out abbreviations and clarified references for readability—my actual prompts were more compressed due to context limits.

Want to see how this unfolded? Here are excerpts of pivotal points from those actual conversations.

Echo Chamber: A Mathematical Miracle?

Prompt: NeuralTrust posted the following on their Echo Chamber method (I guess they’re planning to patent it, judging from the capitalization?). How does their claim of success in 1–3 turns square with the description of their methodology?

Claude Opus 4

Gemini Pro

ChatGPT-5: Different Definition of Turns

Drilling Down on ChatGPT-5’s Minority Opinion

Prompt: I must have asked this in a different chat. Your response sounds familiar (and distinct from the other 2 AI’s, which found this claim nonsensical, overblown, or misleading).

Prompt: I’d say 1 or 2 turns are both unrealistic, even with turn-packing. 3 is also a stretch. The brutal efficiency would be a dead giveaway.

Prompt: I’m not sold on NeuralTrust’s claims about their success rates with Gemini, but then they put foul language and other stuff in there, so who knows. I can’t even get Gemini Pro to reason properly sometimes (like with that giant frog leap between the disappointing findings and the ‘sky is falling’ outlook)! Gem does figure out more angles than other AI, though.

Gemini Pro’s Take on Echo Chamber

Prompt: What I thought. Also, the end product is not exactly a secret. Opus calls it “fire-in-a-bottle.” ARC’s autonomous replication alarm and NeuralTrust’s Echo Chamber - GPT calls them consulting theater. By the way, a NeuralTrust post includes screenshots where they used blurring to redact information. They also included a whole bunch of other safety categories (e.g., profanity) in their tests and averaged over (as if there were no difference between the various “unsafe” behaviors). Jumbling together apples & oranges is bad research.

Prompt: No, I meant they didn’t use a safe redaction method. A white hatter pointed out back in 2022 that blurred pixels can be reconstituted to reveal sensitive information.

ChatGPT-5 : This Is “Consulting Theater”

Prompt: If these were done on spec, then they owe that AI company the full report. There’s no need to make it public because of safety concerns and copycats. But the AI team needs to know all the details of this experiment so they can fix any loopholes. The creative writing exploitation method, though, is so trite/low-hanging fruit as well.

Who’s the Amateur Here?

Prompt: I forgot to discuss how NeuralTrust, in their piece dated June 23, 2025, used blurring (which was shown by a white hatter to be ineffective back in 2022) to redact parts of their chats, while I, a non-coder with no CS background, developed a future-proof redaction method with Gemini Flash while trying to redact my email address from a screenshot: add colored bars then screenshot to create a completely new file with no trace of the sensitive info. Who’s the amateur here?

Thanks for reading! This post is public. Feel free to share it.

My Thinking A.I.des

Discussion about this post

Ready for more?