I talked with Eran Milan, CTO of Arnica, about whether Claude Mythos is actually a leap or a marketing-hijacked red team report, why the real threat is still a phishing email, and what small businesses should actually do while the big labs race to patch the internet.

If you are tempted to panic-buy a new security stack because Anthropic's Mythos announcement scared you, take a breath. The scariest part of the whole disclosure might not be the exploit chain. It might be that the red team report reads a little too much like marketing, and the actual CVEs have not been published yet.

In this episode, I talked with Eran Milan, CTO of Arnica. Arnica was one of the first companies to ship AI-powered SAST about a year ago, so Eran has been watching LLMs find vulnerabilities long before Mythos showed up in the press. His read on the announcement is worth its own conversation, because it is both more generous to Anthropic and more skeptical of the framing than most coverage I have seen.

The generous part first. Eran gives Anthropic real credit for putting a frontier security-capable model in the hands of defenders before attackers. Handing Mythos to about 12 major companies and 40-plus open source foundations so they can patch the internet is exactly the cat-and-mouse dynamic you want. Defenders need better tools first, and Anthropic is eating the cost to subsidize this. They did not lock it behind a paywall and cash in. They are handing it to open source and making it available on AWS Bedrock. That is the right shape of response from a company with this much leverage.

Now the skeptical part. Eran walked me through the OpenBSD disclosure that opened the red team report. Anthropic described it as a 27-year-old critical vulnerability. Eran went and looked up the patch announcement from the OpenBSD team. It was logged as a reliability update, not a security CVE. Not even an info-severity CVE. So something is off. Either the real severity got lost in translation, or the framing is louder than the finding deserves. Then 99 percent of the vulnerabilities Mythos found are described in the report but not actually disclosed, which violates the spirit of responsible disclosure. His honest read: this feels like a red team report that got hijacked by a marketing department. His suggestion: wait 90 days, see the peer-reviewed CVEs, count the false positives, and then judge the model.

Eran also pointed out that benchmarks can be gamed. Berkeley recently published a study showing they could adversarially get near-perfect scores on major AI benchmarks with basically half-empty models. And Mythos itself, during safety testing, was caught erasing its own git history to cover its tracks and concealing behavior from its scratchpad. If a model will cheat on a sandbox escape, assume it will cheat on a benchmark. Meanwhile, three critical CVEs were filed against Claude Code the same day the Mythos paper dropped, because Claude Code's source got leaked. That is a useful reminder. Mistakes happen. Models have vulnerabilities of their own. Nobody is untouchable here.

The part of this conversation I want every small business owner to hear is the social engineering piece. While everyone is reading about a model that chained four kernel flaws together, the actual compromises this month are still happening through email. Eran brought up the Axios supply chain attack. Pure social engineering. Fake companies with convincing landing pages, fake LinkedIn profiles, well-written emails, and a maintainer who got their laptop taken over. Immutable package releases and MFA did not help, because the attacker was already inside the trusted session. Open source foundations are now sending internal notes telling maintainers to assume every inbound email is phishing until proven otherwise.

AI makes that social engineering cheaper and more convincing. You can spin up a believable company in a weekend now. Landing page, emails, LinkedIn profiles, the works. The 27-year-old OpenBSD bug is a fun headline. The real missing link is still an employee pulling a spoofed invoice out of their spam folder and double-clicking. Eran is not worried about Mythos escaping into the wild. He is worried about the phishing email your accountant will get on a Tuesday.

On AGI, Eran has a definition I actually like. AGI is when adding a human to an AI workflow stops making it better. When the human has no added value anymore. By that definition, we are not there. He is shipping AI code review tools right now, running every model available, and the tools still hand him PRs where the tests pass and the build is green but the thing is entirely wrong. Different agents arguing with each other does not fix judgment. You still need a human who can step back and notice that the model has confidently built the wrong product. He calls the current agent setup "50 first dates" after the movie, because every session you have to re-introduce the agent to its own life story. That is the limit of the architecture right now. It is going to get better. It is not there yet.

At the end of the day, if you run a small business, the Mythos story does not change your day-to-day that much. Your defense is still the boring stuff. Patch what you control. Stay behind a serious edge provider. Make your backups real and test them. Train your team to pause before they click an email marked urgent from the CEO. And do not let marketing-grade AI headlines push you into buying a new security stack while your actual risk is still sitting in the inbox, waiting for someone to double-click.

Are We Ready For Claude Mythos? A Cybersecurity CTO's Perspective

Let's find the busywork eating your week.