From 120 Hours to 10 Minutes: What AI Really Changes in OT Security

A frontier AI model recently worked through a real industrial control system investigation that would have taken one of our best analysts roughly 120 hours. It produced a solid first pass in about ten minutes.

That story came out of our latest OT Office Hours, the monthly session where our team sits down with people who actually run and defend industrial environments. This episode put our CEO, Dan Gunter, in conversation with Rob Bair, Head of National Security at Anthropic, and they walked through what happened when a frontier model met live data from a real site. When Rob asked how long Dan’s best analyst would have needed, Dan’s answer was telling. His top people come out of the national labs, and even for them, between the tooling and the reasoning involved, the job took about 120 hours.

The ten-minute number is the part that grabs people. It is not the part that matters. That speed only counts because of everything we wrap around it, and everything we deliberately refuse to let the model do on its own. So let’s actually unpack it: what’s real, what’s hype, and where AI changes the day-to-day for OT defenders.

First, Why OT Is Its Own Planet

If you’re coming from IT, this is the mental shift that trips people up. In enterprise IT, you’re usually optimizing for confidentiality. Keep the data secret, keep the systems patched, and a little downtime is an annoyance, not a catastrophe.

OT flips that. Operational Technology is the gear that runs physical processes: the controllers, sensors, and SCADA systems behind power plants, water treatment, pipelines, refineries, and factory floors. The priority is always personnel safety. Right behind it is the physical process itself. That refinery might be moving benzene. You don’t reboot a turbine because it seemed like a good idea, and you don’t patch a controller on a random Tuesday afternoon to see what happens.

That reality changes how any AI model has to behave. A recommendation that’s perfectly sane on a corporate network can be genuinely dangerous on a plant floor. So even at the level of “what should we do next,” a model needs help understanding what normal looks like and what the cost of being wrong actually is. As Rob put it from the frontier lab side, the model has no eyes, no ears, and no hands inside the plant. It doesn’t know your plant manager. It is a reasoning engine, not an operator, and treating it like anything more is where people get hurt.

The Real Problem Isn’t Speed. It’s the Talent Math.

Here’s the part that doesn’t get said enough. The OT security shortage isn’t only a hiring gap; it’s a coverage gap.

Picture an asset owner running six or seven major control vendors across one environment. Your analyst might have deep visibility into one or two of them. Nobody is genuinely an expert across all of it, and pretending otherwise is how things slip through.

Now, stack on the operational reality. We work with asset owners who have thousands of distributed assets and a security team of two or three people. Some of those sites are a helicopter ride away. You are never going to cover that footprint by hiring your way out of it.

This is where AI earns its place, and it’s worth being precise about how. It does not manufacture OT expertise out of thin air. What it does is scale the expertise that already exists. Rob framed it as taking one analyst and turning them into a hundred. We’d call it augmentation math, not replacement math. The model handles the synthesis and the grunt work, reading the volume, correlating events, and drafting the first conclusions. Then the expert spends an hour judging that work and deciding what’s real. The first 40 to 120 hours of an engagement, the part that buries analysts, is exactly what we can lift off the plate, so the human goes deeper instead of drowning.

Where AI Genuinely Surprises Us

Protocols are the fun part. Once a frontier model actually sees a protocol on the wire, it tends to handle it surprisingly well.

We had a packet capture from a site where the model’s first read was basically “this all looks like TCP.” One small nudge (“no, that’s historian traffic”) and it went right back through the capture and started pulling historian tags on its own. That unknown-protocol space opens up fast with a little context.

It gets stranger. We’ve watched a model start reverse engineering a vendor’s proprietary protocol, hit a wall, tell us exactly which manual would unblock it, and then generate a working dissector once it had the spec. Done by hand, that kind of protocol reverse engineering normally runs for weeks to months. When program traffic is actually on the wire (an upload, a download, or an online edit), we’ve had it pull ladder logic rungs and put them in context. In one mine environment, it surfaced systems that the operators themselves didn’t know were there.

The thing that makes asset owners lean forward is the synthesis. Getting from a raw Modbus register all the way to what that register actually does in the physical process, tying cybersecurity data to operational meaning, is genuinely hard work for a human and exactly the kind of correlation a capable model is good at.
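To make that register-to-process step concrete, here is a minimal sketch of the kind of lookup it implies. Everything in it is an illustrative assumption: the `TagInfo` fields, the register addresses, the tag names, and the scaling factors are invented for the example and would come from site documentation in a real engagement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagInfo:
    name: str          # instrument tag, e.g. from a P&ID (hypothetical)
    description: str   # what the value means in the physical process
    scale: float       # multiply the raw 16-bit value by this
    offset: float      # then add this to get engineering units
    units: str


# Hypothetical register map; real maps come from vendor and site documentation.
REGISTER_MAP = {
    40001: TagInfo("FT-101", "Feed pump discharge flow", 0.1, 0.0, "m3/h"),
    40002: TagInfo("TT-205", "Reactor jacket temperature", 0.1, -40.0, "degC"),
}


def describe_register(address: int, raw: int) -> str:
    """Turn a raw Modbus register value into a statement about the process."""
    tag = REGISTER_MAP.get(address)
    if tag is None:
        # No context means no interpretation: report the raw value only.
        return f"register {address}: raw={raw} (no process context; do not guess)"
    value = raw * tag.scale + tag.offset
    return f"{tag.name} ({tag.description}): {value:.1f} {tag.units}"
```

The point of the unmapped branch is that a register with no documented meaning stays a raw number instead of getting a plausible-sounding label.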

Where It Bites, and Why We Lose No Sleep Saying So

A hallucination in an OT network is not the same as a model imagining you had a meeting yesterday. It’s a safety risk.

If a confident, wrong answer nudges someone toward the wrong action in a plant, something can go offline, or worse. Is one percent wrong acceptable in enterprise IT? Often, yes, because a human review usually catches it before it matters. In a lot of OT settings, one percent wrong is a hard no.

The failure mode to watch for is overconfidence under uncertainty. A model will hand you a clean, authoritative-looking answer about something it has no real basis for. Here’s a favorite example. Drop powerhouse data from a mine into a model cold and ask for a tabletop exercise, and it might see SEL gear, DNP3, and outstations and confidently decide it’s looking at a power substation. Without checks, it then runs the entire engagement as if it’s an open power grid, and everything downstream inherits that wrong call. Overloaded ports cause the same trouble. Look at TCP 102, where S7comm, MMS, and others all live, and a model can pick the wrong protocol. One bad assumption near the top poisons every conclusion built on it.
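The TCP 102 case is a good one to sketch, because the fix is to look at the bytes instead of the port number. The function below is a minimal, hedged illustration and not a full dissector: it checks for a TPKT header, steps over the COTP header, and only labels a payload as an S7comm candidate when the first byte is 0x32, the protocol ID classic S7comm uses. The function name and return labels are our own for this example. MMS and anything else on the port fall through to "unknown" on purpose, since telling them apart needs deeper parsing of the session and presentation layers.

```python
def classify_port_102_payload(data: bytes) -> str:
    """Best-effort label for one TCP/102 segment payload (illustrative only)."""
    # TPKT header: version = 3, reserved = 0, then a 2-byte big-endian length.
    if len(data) < 4 or data[0] != 0x03 or data[1] != 0x00:
        return "not-tpkt"
    # COTP: byte 4 is the header length (excluding itself), byte 5 the PDU type.
    if len(data) < 6:
        return "unknown"
    cotp_len = data[4]
    pdu_type = data[5] & 0xF0
    if pdu_type != 0xF0:
        # Not a Data TPDU, e.g. a connection request (0xE0) or confirm (0xD0).
        return "cotp-control"
    payload = data[5 + cotp_len:]
    if not payload:
        return "unknown"
    if payload[0] == 0x32:
        # Classic S7comm protocol ID; still a candidate until a human confirms.
        return "s7comm-candidate"
    # MMS and other ISO-on-TCP traffic land here: say "unknown", never guess.
    return "unknown"
```

Returning "unknown" is the design choice that matters. A wrong protocol label near the top of an analysis is exactly the kind of assumption that everything downstream inherits.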

The Discipline Is the Product

So, how do you get the ten minutes without inheriting the mistakes? Process. Boring, deliberate process.

Our OT assessment approach runs as an 11-to-15 gate workflow for a reason. You complete the asset inventory, and only then do you let the model infer from that inventory. You finish the crown jewels analysis before kicking off the next stage. If inference starts before analysis is done, that’s exactly where hallucinations compound and where you end up with an unexplainable black box.
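The ordering rule is simple enough to sketch in a few lines. This is an illustration of the idea, not our actual workflow: the class, the exception, and the three gate names used in the usage note are placeholders, and the real process has more gates than this.

```python
class GateError(RuntimeError):
    """Raised when a stage is attempted before the gate ahead of it is closed."""


class GatedWorkflow:
    def __init__(self, gates):
        self._gates = list(gates)
        self._done = 0  # number of gates completed so far, in order

    def next_gate(self):
        """Return the only gate that may be completed next, or None if finished."""
        if self._done < len(self._gates):
            return self._gates[self._done]
        return None

    def complete(self, gate):
        """Mark a gate complete, refusing anything out of order."""
        expected = self.next_gate()
        if gate != expected:
            raise GateError(
                f"cannot complete {gate!r}; next open gate is {expected!r}"
            )
        self._done += 1
```

With gates such as `["asset_inventory", "inventory_inference", "crown_jewels_analysis"]`, calling `complete("inventory_inference")` first raises `GateError`, which is the whole point: inference cannot start until the inventory it depends on is finished.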

Underneath that, we keep a chain of custody on how the model reaches its conclusions. Is this something it observed, something it inferred, or something it assumed? Observations we can point back to a specific packet get the highest confidence. Inference backed by a P&ID or a network map that a human has confirmed gets the next tier. Anything where the model wants to say “this is going to blow up” gets flagged straight to someone with a PE after their name, because that is not a call an LLM should be making on its own.
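Those tiers can be written down as a small triage rule. The sketch below is one way to express it, under stated assumptions: the `Basis` and `Finding` types, the field names, and the tier labels are invented for illustration, and the "unverified" bucket for everything else is our addition for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Basis(Enum):
    OBSERVED = "observed"   # seen directly in the data
    INFERRED = "inferred"   # derived from observations plus documentation
    ASSUMED = "assumed"     # no supporting evidence


@dataclass
class Finding:
    claim: str
    basis: Basis
    evidence: Optional[str] = None   # e.g. a packet reference or document ID
    human_confirmed: bool = False    # a person has checked the supporting document
    safety_relevant: bool = False    # the claim predicts physical consequences


def triage(finding: Finding) -> str:
    """Assign a confidence tier, escalating safety claims to an engineer first."""
    if finding.safety_relevant:
        return "escalate-to-engineer"
    if finding.basis is Basis.OBSERVED and finding.evidence:
        return "tier-1"
    if (finding.basis is Basis.INFERRED and finding.evidence
            and finding.human_confirmed):
        return "tier-2"
    return "unverified"
```

Note that the safety check runs before anything else, so even a well-evidenced observation goes to a human engineer if it makes a claim about physical consequences.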

Dan’s old nuclear-world phrase still covers it: trust but verify. Rob’s version, learned at a frontier lab, is that skepticism about the wrong architecture is completely valid. Either way, the job of the OT analyst is shifting toward judgment and supervision. If you’re mid-career and wondering what to learn next, it isn’t prompting tricks. It’s how to spot the confident-but-wrong answer before it ever reaches the plant.

Data Handling and the Air Gap Myth

Two questions come up in every single conversation with asset owners. Are you training on my data, and where does my data go?

On training, the answer is no on both sides. Anthropic doesn’t train on customer data, and our default posture is not to share. Everything else gets negotiated engagement by engagement, with data handling layered to whatever the customer is comfortable with, NERC CIP-regulated data included. For sensitive sites, the data stays in the enclave, and we run sanitization before anything moves.

Architecture matters more than promises here, and this is where the product design does the heavy lifting. In Valkyrie, our OT monitoring software, the model reaches data through an MCP binding rather than getting handed raw packets. By default, it only sees things like IP addresses, ports, traffic patterns, and threat hits. If you want deeper insight and you’re comfortable with it, you can opt into a fuller agentic mode where it reasons over the actual bits and bytes. Information sharing and full autonomy pull against each other, and the right answer lies in the middle, with the human staying accountable.
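The default-versus-opt-in idea comes down to an allowlist. The sketch below shows the general pattern only; it is not Valkyrie’s code and does not use any real MCP SDK. The field names, the `DEFAULT_FIELDS` set, and the `full_agentic` flag are all hypothetical.

```python
# Hypothetical metadata fields exposed to the model by default.
DEFAULT_FIELDS = {
    "src_ip", "dst_ip", "src_port", "dst_port", "bytes", "threat_hits",
}


def model_view(record: dict, full_agentic: bool = False) -> dict:
    """Return the portion of a flow record the model is allowed to see.

    By default only allowlisted metadata passes through. Raw payload and any
    other field reach the model only if the operator has opted in.
    """
    if full_agentic:
        return dict(record)  # copy, so the caller's record is never mutated
    return {k: v for k, v in record.items() if k in DEFAULT_FIELDS}
```

An allowlist fails closed: a new field added to the record later stays hidden from the model until someone deliberately exposes it.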

Then there’s the air gap. We’ll say it plainly: the air gap is mostly a myth. Vendor maintenance links get sold as one-way, and then you pull a packet capture and find out they’re very much two-way. That said, some environments truly are offline, and some have to stay that way. Classified facilities, forward-deployed military, offshore platforms, genuinely remote sites. A cloud-only model is architecturally locked out of all of those. That’s the gap Cygnet fills. It’s our flyaway kit that runs the same monitoring capability with no cloud and no internet, so we can bring the analysis to sites a cloud model will never reach.

Why Now?

The honest answer to “why now” is that three things showed up at the same time.

The capability got real. The regulation arrived. NERC CIP-015-1, approved in 2025, now requires Internal Network Security Monitoring, or INSM, meaning visibility into the east-west traffic moving inside a utility’s trust zones rather than just at the perimeter. And the attacker’s side got the same uplift the defenders did. The capability that helps a defender understand an obscure protocol also lowers the bar for someone targeting it. Pulling off a TRITON-style attack, modeled on the 2017 malware (also called TRISIS) that went after Triconex safety controllers, used to require a nation-state lab. That bar is dropping toward opportunistic and ransomware groups. None of that is a reason to sit on our hands. It’s the reason defenders need to adopt this responsibly now, with a human in the loop, instead of waiting to learn it the hard way.

Frequently asked questions

Does AI replace OT security analysts? No. The model scales analysts; it doesn’t replace them. It handles the synthesis and the volume, so a human expert can spend their time on judgment and verification. The expertise still lives with the person.

Is it safe to use AI in OT and ICS environments? It can be, if you wrap it in a process. The risk isn’t the model thinking; it’s the model acting on a confident-but-wrong assumption. A gated workflow, a chain of custody on every conclusion, and a human signing off on anything safety-relevant are what make it safe.

Does Anthropic train on customer OT data? No. Anthropic doesn’t train on customer data, and our default posture is not to share it. Data handling is set per engagement, including for NERC CIP-regulated environments, and sensitive data stays in the enclave with sanitization before anything moves.

Can AI-driven OT monitoring work in air-gapped or remote sites? Yes. Cygnet, our flyaway kit, runs the same monitoring capability as Valkyrie with no cloud and no internet connection, which makes it suited to offline, classified, offshore, or otherwise unreachable sites.

What’s the difference between Valkyrie and Cygnet? Valkyrie is our OT security monitoring software. Cygnet is the flyaway version of it, built to run without cloud connectivity for portable, on-site, and air-gapped assessments.

Want the full conversation?

Check out the complete episode with Rob and Dan on YouTube.

Interested in building your OT Cyber Foundations? Take our free course here.
