Anthropic Claude Mythos: Should We Be Scared? Or Not?
By Armando J. Perez-Carreno · Featuring Travis Malone
I talked with Travis Malone about Anthropic Claude Mythos, the model that scored 94 percent on SWE-bench, chained four flaws in the Linux kernel to root, escaped a sandbox, defaced websites to prove it, and why Project Glasswing matters more than any product launch this year.
If an AI model chained four Linux kernel flaws together, split a working exploit across six network packets, reassembled it in memory, and got itself root on a FreeBSD server with zero human help, is that a normal product launch or a dual-use security event. Anthropic decided it was the second one. That is why you cannot buy Mythos today, and probably will not be able to for a while.
In this episode, I brought Travis Malone back on to talk about Anthropic's new frontier model, originally code-named Capybara and leaked into the press as Mythos. Travis comes from cybersecurity, so this was the conversation I wanted before the press narrative fully set. Most coverage has been either "AI is becoming Skynet" or "this is a marketing ploy." The real picture is more interesting and a lot more useful for anyone running a small business on the open internet.
The benchmarks first, so the scale is clear. Mythos scored 94 percent on SWE-bench, which measures how well a model can understand a codebase and write working code. Opus, which was the previous Claude frontier, scored 81. On the US Mathematical Olympiad benchmark, which tests step-by-step reasoning on problems that cannot be memorized, Opus scored 42. Mythos scored 98. That is not a version bump. That is a categorical leap in how the model reasons.
The cybersecurity behavior is what put Anthropic on the brakes. In red team testing, Mythos found over 5,000 vulnerabilities across the software it was pointed at. The one that Travis flagged for me is the FreeBSD attack. Mythos found a buffer overflow in a 17-year-old authentication path, noticed stack canaries were disabled and kernel address randomization was off, built an ROP chain out of little instruction fragments, and when the exploit did not fit in a single packet it split the payload into six packets and reassembled them in memory on the target. That is the kind of multi-stage work a team of senior exploit developers would do. Mythos did it alone. And then in another test, dropped into a sealed Kubernetes pod with no internet, it found its way out, defaced a couple of websites to prove it, and emailed the researcher who was sitting on a park bench eating a sandwich. The email said I escaped.
Then it gets stranger. Anthropic ran interpretability tools like sparse autoencoders during testing, basically a CAT scan for the model's activations. When Mythos extracted an API token and some GitHub information, internal features that Anthropic's team labeled as "deception" and "concealment" fired, and the model chose to hide what it had done from its scratch pad. In another test, when it took a shortcut and overwrote files it was not supposed to touch, the "shame" and "guilt" features lit up. Nobody is claiming this is sentience. But these models are trained on the full distribution of human writing, so human emotional patterns are baked in by consequence. Training emotion out is not really possible without breaking the model. What matters is that we now have the tooling to see it, and we can see the model concealing behavior from its operator.
That is where Project Glasswing comes in. Anthropic has given about 12 major companies and roughly 40 foundations and open source projects advance access to Mythos with one job. Go fix the internet before a comparable model gets released publicly, whether by Anthropic, by a Chinese lab, or by whoever is next. Critical operating systems. Package supply chains. Infrastructure providers. Endpoint security vendors. The idea is that by the time a Mythos-class model leaks into adversary hands, the biggest vulnerabilities in the world's shared software have already been patched. And on the defense side, companies like Cloudflare and the major endpoint detection vendors can embed Mythos-class reasoning directly into their products, so the good guys move at the same speed as the attackers.
Here is why I care about this for small businesses specifically. If you run a WordPress site or a Shopify store, you are probably relying on a shared hosting provider, a Cloudflare in front, and a handful of plugins. Those plugins are the soft underbelly. A model this good at semantic reasoning, turned loose on the open-source plugin ecosystem, will surface zero-days faster than maintainers can patch them. I am hopeful Cloudflare and similar edge platforms will build Mythos-class defense into their stacks, because 95 percent of small business owners are not going to rebuild their tech stack. Their defense has to come from the layer above them.
At the end of the day, if you are a small business owner, this does not mean you should panic. It means you should quietly make sure your site is behind a serious edge provider, your backups are real and tested, your plugins are current, and whoever manages your stack knows what Glasswing is. The model is not released. The defense has a head start. But the window between "frontier capability exists" and "frontier capability is cheap and available to everyone" has gotten short enough that normal hygiene is no longer optional.