Thursday, November 21, 2013

Stuxnet's Secret Twin

The real program to sabotage Iran's nuclear facilities was far more sophisticated than anyone realized.


BY RALPH LANGNER | NOVEMBER 19, 2013



Three years after it was discovered, Stuxnet, the first publicly disclosed cyberweapon, continues to baffle military strategists, computer security experts, political decision-makers, and the general public. A comfortable narrative has formed around the weapon: how it attacked the Iranian nuclear facility at Natanz, how it was designed to be undiscoverable, how it escaped from Natanz against its creators' wishes. Major elements of that story are either incorrect or incomplete.

That's because Stuxnet is not really one weapon, but two. The vast majority of the attention has been paid to Stuxnet's smaller and simpler attack routine -- the one that changes the speeds of the rotors in a centrifuge, which is used to enrich uranium. But the second and "forgotten" routine is about an order of magnitude more complex and stealthy. It qualifies as a nightmare for those who understand industrial control system security. And strangely, this more sophisticated attack came first. The simpler, more familiar routine followed only years later -- and was discovered in comparatively short order.
With Iran's nuclear program back at the center of world debate, it's helpful to understand with more clarity the attempts to digitally sabotage that program. Stuxnet's actual impact on the Iranian nuclear program is unclear, if only for the fact that no information is available on how many controllers were actually infected. Nevertheless, forensic analysis can tell us what the attackers intended to achieve, and how. I've spent the last three years conducting that analysis -- not just of the computer code, but of the physical characteristics of the plant environment that was attacked and of the process that this nuclear plant operates. What I've found is that the full picture, which includes the first and lesser-known Stuxnet variant, invites a re-evaluation of the attack. It turns out that it was far more dangerous than the cyberweapon that is now lodged in the public's imagination.
***
In 2007, an unidentified person submitted a sample of code to the computer security site VirusTotal. It later turned out to be the first variant of Stuxnet -- at least, the first one that we're aware of. But that was only realized five years later, with the knowledge of the second Stuxnet variant. Without that later and much simpler version, the original Stuxnet might still today sleep in the archives of anti-virus researchers, unidentified as one of the most aggressive cyberweapons in history. We now know that the code contained a payload for severely interfering with the system designed to protect the centrifuges at the Natanz uranium-enrichment plant.
Stuxnet's later, and better-known, attack tried to cause centrifuge rotors to spin too fast and at speeds that would cause them to break. The "original" payload used a different tactic. It attempted to overpressurize Natanz's centrifuges by sabotaging the system meant to keep the cascades of centrifuges safe. "Protection systems" are used anywhere where abnormal process conditions can result in equipment damage or threaten the health of operators and the environment. At Natanz, we see a unique protection system in place to enable sustained uranium enrichment using obsolete and unreliable equipment: the IR-1 centrifuge. This protection system is a critical component of the Iranian nuclear program; without it, the IR-1s would be pretty much useless.
The IR-1 centrifuge is the backbone of Iran's uranium-enrichment effort. It goes back to a European design from the late 1960s and early 1970s that was stolen and slightly improved by Pakistani nuclear trafficker A.Q. Khan. The IR-1 is an all-metal design that can work reliably -- that is, if parts are manufactured with precision and critical components such as high-quality frequency converters and constant torque drives are available. But the Iranians never managed to get a high degree of reliability from the obsolete design. So they had to lower the operating pressure of the centrifuges at Natanz. Lower operating pressure means less mechanical stress on the delicate centrifuge rotors, thereby reducing the number of centrifuges that have to be put offline because of rotor damage. But less pressure means less throughput -- and thus less efficiency. At best, the IR-1 was half as efficient as its ultimate predecessor.
As unreliable and inefficient as the IR-1 is, it offered a significant benefit: Iran managed to produce the antiquated design at industrial scale. Iran compensated for the lack of reliability and efficiency with volume, accepting a constant breakup of centrifuges during operation because they could be manufactured faster than they crashed. But to make it all work, the Iranians needed a bit of a hack. Ordinarily, the operation of fragile centrifuges is a sensitive industrial process that doesn't tolerate even minor equipment hiccups. Iran built a cascade protection system that allows the enrichment process to keep going, even when centrifuges are breaking left and right.
At the centrifuge level, the cascade protection system uses sets of three shut-off valves, installed for every centrifuge. By closing the valves, centrifuges that run into trouble -- indicated by vibration -- can be isolated from the rest of the system. Isolated centrifuges are then run down and can be replaced by maintenance engineers while the process keeps running.
Then-President Mahmoud Ahmadinejad looks at SCADA screens in the control room at Natanz in 2008. The screen facing the photographer shows that two centrifuges are isolated, indicating a defect, but that doesn’t prevent the respective cascade from continuing operation.
But the isolation valves can turn into as much of a problem as a solution. When operating basically unreliable centrifuges, one will see shut-offs frequently, and maintenance workers may not have a chance to replace damaged centrifuges before the next one in the same enrichment stage gets isolated. Once multiple centrifuges are shut off within the same stage, operating pressure -- the most sensitive parameter in uranium enrichment using centrifuges -- will increase, which can and will lead to all kinds of problems.
The Iranians found a creative solution for this problem -- basically another workaround on top of the first workaround. For every enrichment stage, an exhaust valve is installed that allows pressure to be relieved if too many centrifuges within that stage get isolated, causing pressure to increase. For every enrichment stage, pressure is monitored by a sensor. If the pressure exceeds a certain threshold, the exhaust valve is opened, and overpressure is released.
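The two workarounds described above can be pictured as one pass of a simple control loop. What follows is only a minimal sketch in Python, not the plant's actual controller logic: the class names, the `protect` function, and both limit values are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical placeholder limits in arbitrary units, not Natanz parameters.
VIBRATION_LIMIT = 1.0
PRESSURE_THRESHOLD = 1.0


@dataclass
class Centrifuge:
    vibration: float = 0.0
    isolated: bool = False


@dataclass
class Stage:
    centrifuges: list
    pressure: float = 0.0
    exhaust_open: bool = False


def protect(stage):
    """Run one scan of the protection logic for a single enrichment stage."""
    # First workaround: close the isolation valves of any centrifuge
    # whose vibration indicates trouble. Isolation is not undone here,
    # because the machine stays offline until it is replaced.
    for centrifuge in stage.centrifuges:
        if centrifuge.vibration > VIBRATION_LIMIT:
            centrifuge.isolated = True
    # Second workaround: open the stage exhaust valve while the
    # measured stage pressure is above the threshold.
    stage.exhaust_open = stage.pressure > PRESSURE_THRESHOLD
    return stage
```

The point of the sketch is that both decisions depend entirely on sensor readings: whoever controls what the logic reads as vibration and pressure also controls the valves.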
The system might have kept Natanz's centrifuges spinning, but it also opened them up to a cyberattack that is so far-out, it leads one to wonder whether its creators might have been on drugs.
Natanz's cascade protection system relies on Siemens S7-417 industrial controllers to operate the valves and pressure sensors of up to six cascades, or groups of 164 centrifuges each. A controller can be thought of as a small embedded computer system that is directly connected to physical equipment, such as valves. Stuxnet was designed to infect these controllers and take complete control of them in a way that previous users had never imagined -- and that had never even been discussed at industrial control system conferences.
A controller infected with the first Stuxnet variant actually becomes decoupled from physical reality. Legitimate control logic only "sees" what Stuxnet wants it to see. Before the attack sequence executes (which is approximately once per month), the malicious code is kind enough to show operators in the control room the physical reality of the plant floor. But that changes during attack execution.
One of the first things this Stuxnet variant does is take steps to hide its tracks, using a trick straight out of Hollywood. Stuxnet records the cascade protection system's sensor values for a period of 21 seconds. Then it replays those 21 seconds in a constant loop during the execution of the attack. In the control room, all appears to be normal, both to human operators and any software-implemented alarm routines.
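Why a replay loop fools both operators and alarm routines is easiest to see in a toy simulation. The Python sketch below is an illustration of the record-and-replay idea only, not Stuxnet's controller code: the `ReplayFilter` class, its method names, and the window size are hypothetical, and the window counts readings rather than seconds.

```python
from collections import deque
from itertools import cycle


class ReplayFilter:
    """Toy model of a layer sitting between sensors and their consumers."""

    def __init__(self, window):
        # Ring buffer that keeps only the most recent `window` readings.
        self.buffer = deque(maxlen=window)
        self.replaying = False
        self._loop = None

    def start_replay(self):
        if not self.buffer:
            raise ValueError("nothing recorded yet")
        # Freeze the recorded readings and loop over them endlessly.
        self._loop = cycle(list(self.buffer))
        self.replaying = True

    def visible(self, real_reading):
        """Return the value that downstream logic and displays get to see."""
        if self.replaying:
            # The real reading is ignored; a recorded one is shown instead.
            return next(self._loop)
        self.buffer.append(real_reading)
        return real_reading
```

Once replay starts, a threshold check that compares `visible(...)` against a limit can never fire, because every value it sees was recorded while the plant was still running normally.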
Then Stuxnet begins its malicious work. It closes the isolation valves for the first two and last two enrichment stages. That blocks the outflow of gas from each affected cascade and, in turn, raises the pressure on the rest of the centrifuges. Gas centrifuges for uranium enrichment are extremely sensitive to increases of pressure above near vacuum. An increase in pressure will result in more uranium hexafluoride getting into the centrifuge, putting higher mechanical stress on the rotor. Rotor wall pressure is a function of velocity (rotor speed) and operating pressure; more gas being pressed against the rotor wall means more mechanical force against the thin tube. Ultimately, pressure may cause the gaseous uranium hexafluoride to solidify, thereby fatally damaging centrifuges.
The attack continues until the attackers decide that enough is enough, based on monitoring centrifuge status. Most likely, they would use vibration sensors, which let them abort a mission before the matter hits the fan. If catastrophic destruction is intended, one simply has to sit and wait. But in the Natanz case, causing a solidification of process gas would have resulted in simultaneous destruction of hundreds of centrifuges per infected controller. While at first glance this might sound like a goal worth achieving, it would also have blown the attackers' cover; the cause of the destruction would have been detected fairly easily by Iranian engineers in postmortem analysis. The implementation of the attack with its extremely close monitoring of pressures and centrifuge status suggests that the attackers instead took great care to avoid catastrophic damage. The intent of the overpressure attack was more likely to increase rotor stress, thereby causing rotors to break early -- but not necessarily during the attack run.
Nevertheless, the attackers faced the risk that the attack would not work at all because the attack code is so overengineered that even the slightest oversight or any configuration change would have resulted in zero impact or, worse, in a program crash that would have been detected by Iranian engineers quickly.
The results of the overpressure attack are unknown. Whatever they were, the attackers decided to try something different in 2009.
This new Stuxnet variant was almost entirely different from the old one. For one thing, it was much simpler and much less stealthy than its predecessor. It also attacked a completely different component of the Natanz facility: the centrifuge drive system that controls rotor speeds.
This new Stuxnet spread differently too. The malware's earlier version had to be physically installed on a victim machine, most likely a portable engineering system, or it had to be passed on a USB drive carrying an infected configuration file for Siemens controllers. In other words, it needed to be disseminated deliberately by an agent of the attackers.
The new version self-replicated, spreading within trusted networks and via USB drive to all sorts of computers, not just to those that had the Siemens configuration software for controllers installed. This suggests that the attackers had lost the capability to transport the malware to its destination by directly infecting the systems of authorized personnel, or that the centrifuge drive system was installed and configured by other parties to which direct access was not possible.
What's more, Stuxnet suddenly became equipped with an array of previously undiscovered weaknesses in Microsoft Windows software -- so-called "zero day" flaws that can fetch hundreds of thousands of dollars on the open market. The new Stuxnet also came equipped with stolen digital certificates, which allowed the malicious software to pose as legitimate driver software and thus not be rejected by newer versions of the Windows operating system.
All this indicates that a new organization began shaping Stuxnet -- one with a stash of valuable zero days and stolen certificates. In contrast, the development of the overpressure attack can be viewed as the work of an in-group of top-notch industrial control system security experts and coders who lived in an exotic ecosystem quite remote from standard IT security. The overspeed attacks point to the circle widening and acquiring a new center of gravity. If Stuxnet is American-built -- and, according to published reports, it most certainly is -- then there is only one logical location for this center of gravity: Fort Meade, Maryland, the home of the National Security Agency.
But the use of the multiple zero days came with a price. The new Stuxnet variant was much easier to identify as malicious software than its predecessor was, because it suddenly displayed very strange and very sophisticated behavior. In comparison, the initial version looked pretty much like a legitimate software project for Siemens industrial controllers used at Natanz; the only strange thing was that a copyright notice and license terms were missing. The newer version, equipped with a wealth of exploits that hackers can only dream about, signaled to even the least vigilant anti-virus researcher that this was something big, warranting a closer look.
Just like its predecessor, the new attack operated periodically, about once per month, but the trigger condition was much simpler. While in the overpressure attack various process parameters were monitored to check for conditions that might occur only once in a blue moon, the new attack was much more straightforward.
The new attack worked by changing rotor speeds. With rotor wall pressure being a function of process pressure and rotor speed, the easy road to trouble is to overspeed the rotors, thereby increasing rotor wall pressure. And this is what Stuxnet did. The normal operating speed of the IR-1 centrifuge is 63,000 revolutions per minute (rpm). Stuxnet increased that speed by a good one-third to 84,600 rpm for 15 minutes. The next consecutive run brought all centrifuges in the cascade basically to a stop (120 rpm), only to speed them up again, taking a total of 50 minutes. The IR-1 is a supercritical design, meaning that the rotor has to pass through so-called critical speeds before reaching normal operating speed. Every time a rotor passes through these critical speeds, also called harmonics, it can break.
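The speed figures above can be checked with a few lines of arithmetic. The Python sketch below also adds one simplifying assumption that is not from the analysis itself: for a thin-walled spinning tube, the stress from rotation alone grows with the square of the rotor speed. Process pressure is left out, and the function names are hypothetical.

```python
NORMAL_RPM = 63_000     # normal IR-1 operating speed
OVERSPEED_RPM = 84_600  # speed during the overspeed run
SLOW_RPM = 120          # speed during the slowdown run


def speed_increase_percent(normal_rpm, attack_rpm):
    """Percentage by which the attack speed exceeds the normal speed."""
    return (attack_rpm / normal_rpm - 1.0) * 100.0


def relative_rotational_stress(normal_rpm, attack_rpm):
    """Stress relative to normal operation.

    Assumes stress scales with speed squared (thin-walled tube,
    rotation only); a simplification, not a rotor model.
    """
    return (attack_rpm / normal_rpm) ** 2


increase = speed_increase_percent(NORMAL_RPM, OVERSPEED_RPM)
stress = relative_rotational_stress(NORMAL_RPM, OVERSPEED_RPM)
print(f"Speed increase: {increase:.1f}%")            # about 34.3%
print(f"Relative rotational stress: {stress:.2f}x")  # about 1.80x
```

Under that assumption, a one-third increase in speed translates into roughly 80 percent more rotational stress on the rotor wall, which is why overspeeding is such an easy road to trouble.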
If a single rotor did crack during an attack sequence, the cascade protection system would kick in to isolate and run down the respective centrifuge. But if multiple rotors were to crash -- a likely outcome -- Iranian operators would be left with the question of why all of a sudden so many centrifuges broke at once. Not that they didn't have enough new ones in stock for replacement, but unexplained problems like this are among any control system engineer's most frustrating experiences, usually referred to as chasing a demon in the machine.
At some point the attacks should have been recognizable by plant floor staff just by the old eardrum. Bringing 164 centrifuges or multiples thereof from 63,000 rpm to 120 rpm and getting them up to speed again would have been noticeable -- if experienced staff had been cautious enough to remove protective headsets in the cascade hall. It's another sign that the makers of this second Stuxnet variant had decided to accept the risk that the attack would be detected by operators.
***
Much has been written about the failure of Stuxnet to destroy a substantial number of centrifuges or to significantly reduce Iran's enriched-uranium production. While that is indisputable, it doesn't appear that either was the attackers' intention. If catastrophic damage had been caused by Stuxnet, that would have been by accident rather than on purpose. The attackers were in a position where they could have broken the victim's neck, but they chose continuous periodic choking instead. Stuxnet is a low-yield weapon with the overall intention of reducing the lifetime of Iran's centrifuges and making the Iranians' fancy control systems appear beyond their understanding.
Reasons for such tactics are not difficult to identify. When Stuxnet was first deployed, Iran had already mastered the production of IR-1 centrifuges at industrial scale. During the summer of 2010, when the Stuxnet attack was in full swing, Iran operated about 4,000 centrifuges, but kept another 5,000 in stock, ready to be commissioned. A one-time destruction of the Iranians' operational equipment would not have jeopardized that strategy, just like the catastrophic destruction of 4,000 centrifuges by an earthquake back in 1981 did not stop Pakistan on its way to getting the bomb. By my estimates, Stuxnet set back the Iranian nuclear program by two years; a simultaneous catastrophic destruction of all operating centrifuges wouldn't have caused nearly as big a delay.
The low-yield approach also offered added value. It drove Iranian engineers crazy, up to the point where they might have ultimately ended up in total frustration about their capabilities to get a stolen plant design from the 1970s running and to get value from their overkill digital protection system. When comparing the Pakistani and Iranian uranium-enrichment programs, one cannot fail to notice a major performance difference. Pakistan basically managed to go from zero to successful low-enriched uranium production within just two years during shaky economic times, without the latest in digital control technology. The same effort took Iran over 10 years, despite the jump-start from Pakistan's A.Q. Khan network and abundant money from sales of crude oil. If Iran's engineers didn't look incompetent before, they certainly did during the time when Stuxnet was infiltrating their systems.
Legend has it that in the summer of 2010, while inflicting its damage on Natanz, Stuxnet "escaped" from the nuclear facility due to a software bug that came with a version update. While that is a good story, it cannot be true. Stuxnet propagated only between computers that were attached to the same local network or that exchanged files through USB drives. In other words, Stuxnet must have spread largely by human hands. But in these days of remote access by modem or via Internet virtual private networks, human hands can extend across continents.
Contractors serving at Natanz worked for other clients as well. And those contractors most likely carried their Stuxnet-infected laptop computers to their secondary clients and connected their laptops to the clients' "local" networks. Let's say they spread it to a cement plant. That cement plant then had other contractors, who in turn connected their mobile computers to the infected "local" network. Those computers carried the malware farther -- to another cement plant, maybe in another country. At some link in the chain, infected contractors or employees remotely accessed their machines, allowing the virus to travel over continents. All of a sudden, Stuxnet has made its way around the globe -- not because of the fact that billions of systems are connected to the Internet, but because of the trusted network connections that tunnel through the Internet these days. For example, remote maintenance access often includes the capability to access shared folders online, giving Stuxnet a chance to traverse through a secure digital tunnel. My colleagues and I saw exactly that when we helped Stuxnet-infected clients in industries completely unrelated to the nuclear field back in 2010.
Given that Stuxnet reported Internet protocol addresses and hostnames of infected systems back to its command-and-control servers, it appears that the attackers were clearly anticipating (and accepting) a spread to noncombatant systems and were quite eager to monitor that spread closely. This monitoring would eventually deliver information on contractors working at Natanz, their other clients, and maybe even clandestine nuclear facilities in Iran.
Stuxnet also provided a useful blueprint to future attackers by highlighting the royal road to infiltration of hard targets. Rather than trying to infiltrate directly by crawling through 15 firewalls, three data diodes, and an intrusion detection system, the attackers acted indirectly by infecting soft targets with legitimate access to ground zero: contractors. However seriously these contractors took their cybersecurity, it certainly was not on par with the protections at the Natanz fuel-enrichment facility. Getting the malware on the contractors' mobile devices and USB sticks proved good enough, as sooner or later they physically carried those on-site and connected them to Natanz's most critical systems, unchallenged by any guards.
Any follow-up attacker will explore this infiltration method when thinking about hitting hard targets. The sober reality is that at a global scale, pretty much every single industrial or military facility that uses industrial control systems at some scale is dependent on its network of contractors, many of which are very good at narrowly defined engineering tasks, but lousy at cybersecurity. While experts in industrial control system security had discussed the insider threat for many years, insiders who unwittingly helped deploy a cyberweapon had been completely off the radar. Until Stuxnet.
And while Stuxnet was clearly the work of a nation-state -- requiring vast resources and considerable intelligence -- future attacks on industrial control and other so-called "cyber-physical" systems may not be. Stuxnet was particularly costly because of the attackers' self-imposed constraints. Damage was to be disguised as reliability problems. I estimate that well over 50 percent of Stuxnet's development cost went into efforts to hide the attack, with the bulk of that cost dedicated to the overpressure attack, which represents the ultimate in disguise -- at the cost of having to build a fully functional mockup IR-1 centrifuge cascade operating with real uranium hexafluoride. Stuxnet-inspired attackers will not necessarily place the same emphasis on disguise; they may want victims to know that they are under cyberattack and perhaps even want to publicly claim credit for it.
And unlike the Stuxnet attackers, these adversaries are also much more likely to go after civilian critical infrastructure. Not only are these systems more accessible, but they're standardized. Each system for running a power plant or a chemical factory is largely configured like the next. In fact, all modern plants operate with standard industrial control system architectures and products from just a handful of vendors per industry, using similar or even identical configurations. In other words, if you get control of one industrial control system, you can infiltrate dozens or even hundreds of the same breed more.
***
Looking at the two major versions of Stuxnet in context leaves a final clue -- a suggestion that during the operation, something big was going on behind the scenes. Operation Olympic Games -- the multiyear online espionage and sabotage campaign against the Iranian nuclear program -- obviously involved much more than developing and deploying a piece of malware, however sophisticated that malware was. It was a campaign rather than an attack, and it appears that the priorities of that campaign shifted significantly during its execution.
When my colleagues and I first analyzed both attacks in 2010, we assumed that they were executed simultaneously, maybe with the idea to disable the cascade protection system during the rotor-speed attack. That turned out to be wrong; no coordination between the two attacks can be found in the code. Then we assumed that the attack against the centrifuge drive system was the simple and basic predecessor after which the big one was launched, the attack against the cascade protection system. The cascade protection system attack is a display of absolute cyberpower. It appeared logical to assume a development from simple to complex. Several years later, it turned out that the opposite was the case. Why would the attackers go back to basics?
The dramatic differences between both versions point to changing priorities that most likely were accompanied by a change in stakeholders. Technical analysis shows that the risk of discovery no longer was the attackers' primary concern when starting to experiment with new ways to mess up operations at Natanz. The shift of attention may have been fueled by a simple insight: Nuclear proliferators come and go, but cyberwarfare is here to stay. Operation Olympic Games started as an experiment with an unpredictable outcome. Along the road, one result became clear: Digital weapons work. And different from their analog counterparts, they don't put military forces in harm's way, they produce less collateral damage, they can be deployed stealthily, and they are dirt cheap. The contents of this Pandora's box have implications much beyond Iran; they have made analog warfare look low-tech, brutal, and so 20th century.
In other words, blowing the cover of this online sabotage campaign came with benefits. Uncovering Stuxnet was the end of the operation, but not necessarily the end of its utility. Unlike traditional Pentagon hardware, one cannot display USB drives at a military parade. The Stuxnet revelation showed the world what cyberweapons could do in the hands of a superpower. It also saved America from embarrassment. If another country -- maybe even an adversary -- had been first in demonstrating proficiency in the digital domain, it would have been nothing short of another Sputnik moment in U.S. history. So there were plenty of good reasons not to sacrifice mission success for fear of detection.
We're not sure whether Stuxnet was disclosed intentionally. As with so many human endeavors, it may simply have been an unintended side effect that turned out to be critical. One thing we do know: It changed global military strategy in the 21st century.
Ralph Langner began his research on Stuxnet in 2010. He is a principal with the Langner Group, a cyberdefense consultancy, and a non-resident fellow with the Brookings Institution.
A longer version of this report, "To Kill a Centrifuge: A Technical Analysis of What Stuxnet's Creators Tried to Achieve," can be found here.
