"Trust, but verify": Safety and Security in Critical Systems
The first and most important step in solving any problem is understanding the problem well enough to create effective solutions [14].
So, what problem can we learn about from Figure 1?
Once upon a Flight from Jakarta
On 29 October 2018, Lion Air flight 610, a Boeing 737-8 (MAX), took off from Jakarta bound for Pangkal Pinang. It seemed to be a normal flight, but 13 minutes after takeoff the aircraft disappeared from radar, shortly after the crew informed the Air Traffic Controller (ATCo) that they had flight control, altitude, and airspeed issues. The aircraft nosed into the water off Tanjung Karawang, West Java, killing the 189 passengers and crew aboard and destroying the aircraft [7].
An investigation was carried out, and the report of the Indonesian National Transportation Safety Committee (KNKT) was published in 2019 [7]. The report listed nine contributing factors to the accident. One of them is that the Maneuvering Characteristics Augmentation System (MCAS) was designed to rely on a single Angle-of-Attack (AOA) sensor, making it vulnerable to erroneous input from that sensor. So, what is MCAS? MCAS is a piece of software that moves a powerful control surface at the tail to push the aircraft's nose down. It is activated based on data from an AOA sensor, a vane that aligns itself with the oncoming airflow; that data is sent to the flight computer, and when it indicates too high an angle of attack, MCAS lifts the aircraft's tail so that the nose moves down [11]. Another contributing factor is the assumption about flight crew response to malfunctions taken from current industry guidelines, namely a three-second response. This assumption is not always correct: in the case of Lion Air 610, at the second MCAS activation the Captain responded with electric trim only after eight seconds [7].
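To make the single-sensor issue concrete, here is a minimal sketch in Python. It is not the actual MCAS logic; the thresholds, function names, and sensor values are invented for illustration. It contrasts a trigger that trusts one AOA value with a cross-checked design that inhibits automatic trim when two redundant sensors disagree.

```python
# Minimal sketch (not the actual MCAS implementation): trusting a single AOA
# sensor versus cross-checking two redundant sensors. All numbers are invented.

AOA_ACTIVATION_DEG = 15.0   # hypothetical activation threshold
DISAGREE_LIMIT_DEG = 5.5    # hypothetical allowed disagreement between vanes

def single_sensor_trigger(aoa_left: float) -> bool:
    """Trigger nose-down trim from one sensor alone: a stuck or biased vane
    (e.g. reading 35 degrees in normal flight) will trigger it repeatedly."""
    return aoa_left > AOA_ACTIVATION_DEG

def cross_checked_trigger(aoa_left: float, aoa_right: float) -> bool:
    """Trigger only if both sensors agree that the angle of attack is high;
    a large disagreement inhibits automatic trim instead of acting on it."""
    if abs(aoa_left - aoa_right) > DISAGREE_LIMIT_DEG:
        return False  # sensors disagree: inhibit activation and alert the crew
    return min(aoa_left, aoa_right) > AOA_ACTIVATION_DEG

if __name__ == "__main__":
    faulty, healthy = 35.0, 4.0  # one vane stuck high, the other reading normally
    print(single_sensor_trigger(faulty))           # True  -> erroneous nose-down command
    print(cross_checked_trigger(faulty, healthy))  # False -> activation inhibited
```

With one vane stuck at 35 degrees, the single-sensor trigger fires on every cycle, whereas the cross-checked version refuses to act and leaves the decision to the crew.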
This accident is not the only software-related aircraft accident. Several software-related spacecraft accidents were studied in [14]. Leveson presented common systemic factors such as "Inadequate Human Factors Design for Software", which appears in our Boeing 737-8 (MAX) case as the inappropriate three-second response assumption. Another factor is "Poor or Missing Specifications", which appears in the Boeing 737-8 (MAX) design as the reliance on a single AOA sensor. This also shows an unsafe requirement rather than an implementation error, which Leveson discusses as one of the issues in applying formal verification to critical systems [13].
This accident shows that software flaws can directly affect human safety. But what if an accident occurs because of an external force such as an attack? Is the software itself still safe?
Once upon an Outage in Ukraine
On 23 December 2015, the Ukrainian power grid was hit by a cyberattack, and an outage of around three hours affected around 80,000 customers. This incident is considered to be the first known successful cyberattack on a power grid. The investigation led to the BlackEnergy version 3 malware, which had infected the Human-Machine Interface (HMI) workstations in the control system plant networks. The techniques utilized in the BlackEnergy version 3 campaign include spear phishing, keylogging, VPN access, firmware modification of communication devices, and disconnecting Uninterruptible Power Supply (UPS) systems [3, 1].
This incident is not the only software-related attack on critical infrastructure. Another example is Triton/TRISIS/Hatman in 2017, which affected the Triconex safety controller, a Schneider Electric safety instrumented system (SIS). The malware inserted firmware to change the logic of the final control element using the TriStation communication protocol. Even though the attack failed, the consequences could have been disastrous for the targeted oil and gas plant in the Middle East [8, 12, 21].
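A simple instance of "verify before trusting" that relates to the firmware-modification techniques in both incidents is to accept a firmware image only after checking it against a digest obtained out of band. The sketch below is only an illustration; the file name and digest are placeholders, and real controller update procedures are vendor-specific.

```python
# Minimal sketch: accept a firmware image only if its digest matches a
# known-good value obtained out of band (e.g. from the vendor). The file name
# and digest used in the usage sketch are placeholders.

import hashlib
import hmac

def sha256_of(path: str) -> str:
    """Hash the candidate firmware image in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as image:
        for chunk in iter(lambda: image.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def firmware_is_trusted(path: str, known_good_hex: str) -> bool:
    """Accept the image only if its digest matches the known-good one."""
    return hmac.compare_digest(sha256_of(path), known_good_hex)

if __name__ == "__main__":
    # Usage sketch with placeholder names:
    # if not firmware_is_trusted("controller_update.bin", "ab12..."):
    #     raise SystemExit("firmware image rejected: digest mismatch")
    pass
```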
Safety-by-Design
From these stories we learn that safety in systems controlled by software is influenced both by internal factors (vulnerabilities and flaws) and external ones (threats and attacks). Therefore, it is important to design safety into the system as it is being developed or re-engineered [15].
One applicable principle is the Principle of Least Privilege. Needham in [18] described the advantage of protecting a system with fine-grained privilege policies that evolve through the system's stages. This principle was later formulated by Saltzer and Schroeder in [19] and is known as the Principle of Least Privilege. So, what is this principle about? In everyday life, we tend to give a thing just enough privilege to carry out its function: not so much that the privilege may be abused, and not so little that the function is hindered. In computer science this is known as least privilege. The least privilege principle requires that every program is given the least set of privileges necessary to complete its task [19]. The principle involves two components, namely a program Prog and its usage Specification_Prog. The policy that grants the minimum privileges of program Prog needed to satisfy Specification_Prog can be written as Privileges(Prog, Specification_Prog). An example of least privilege is shown in the following Example.
Example: Prog is a spelling checker program and Specification_Prog is that misspelled words be flagged; then Privileges(Prog, Specification_Prog) consists of read access to the open file F, read access to the dictionary, and read/write access to the jargon file (taken from [20]).
If this spelling checker program is given more privilege, for example if it is also allowed to write to the open file F, then this over-entitlement may lead to a breach such as erasing the open file.
Nevertheless, when the usage of the program is not only that misspelled words be flagged but also that they be auto-corrected, then the privilege set that includes read/write access to the open file F becomes the least one.
However, if the program with this extended usage, flagging and auto-correcting misspelled words, is given less privilege, for example if the spelling checker is not allowed to write to the open file F, then this under-entitlement prevents the program from performing its function correctly.
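The three situations above, least privilege, over-entitlement, and under-entitlement, can be sketched as a simple set comparison. The privilege names and the required-privilege table below are illustrative assumptions, not an API taken from [19] or [20].

```python
# Minimal sketch of the least-privilege check for the spelling-checker example.
# Privilege names and the required-privilege table are illustrative assumptions.

REQUIRED = {
    # Specification: only flag misspelled words.
    "flag_only": frozenset({"read:open_file_F", "read:dictionary",
                            "read:jargon_file", "write:jargon_file"}),
    # Specification: flag and auto-correct, so the open file must be writable.
    "flag_and_autocorrect": frozenset({"read:open_file_F", "write:open_file_F",
                                       "read:dictionary",
                                       "read:jargon_file", "write:jargon_file"}),
}

def check_entitlement(granted: frozenset, spec: str) -> str:
    """Compare granted privileges with the least set required by the specification."""
    required = REQUIRED[spec]
    if granted == required:
        return "least privilege"
    if required <= granted:
        return f"over-entitled: {sorted(granted - required)} may be abused"
    return f"under-entitled: missing {sorted(required - granted)}"

if __name__ == "__main__":
    granted = frozenset({"read:open_file_F", "write:open_file_F", "read:dictionary",
                         "read:jargon_file", "write:jargon_file"})
    print(check_entitlement(granted, "flag_only"))             # over-entitled
    print(check_entitlement(granted, "flag_and_autocorrect"))  # least privilege
    print(check_entitlement(granted - {"write:open_file_F"},
                            "flag_and_autocorrect"))           # under-entitled
```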
Another applicable approach, coming from non-critical systems, is security-by-contract, proposed by Massacci et al. [9, 16, 5, 17, 4]. A contract is a claim by a mobile application about its interaction with the relevant security and privacy features of a mobile platform. This contract should be published by applications and understood by devices and all stakeholders (users, mobile operators, developers, platform developers, etc.). The contract should be negotiated and enforced during development, at the time of delivery and loading, and during execution of the application by the mobile platform. This framework can also be adapted to critical systems.
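The published framework matches an application's claimed behaviour against the platform policy using automata modulo theory [16, 17]. Purely as an illustration, the sketch below reduces both the contract and the policy to plain sets of security-relevant actions, so that matching becomes a subset check; the action names are invented.

```python
# Minimal sketch of contract-policy matching in the spirit of security-by-contract.
# Real matching works on behaviours (automata modulo theory); here contract and
# policy are simplified to sets of security-relevant actions. Names are invented.

from typing import FrozenSet

def matches(contract: FrozenSet[str], policy: FrozenSet[str]) -> bool:
    """Accept an application if every action it claims is allowed by the policy."""
    return contract <= policy

platform_policy = frozenset({"net.https", "storage.app_private"})

honest_app = frozenset({"net.https"})                      # claims only allowed actions
greedy_app = frozenset({"net.https", "sms.send_premium"})  # claims a forbidden action

print(matches(honest_app, platform_policy))   # True  -> load and enforce at runtime
print(matches(greedy_app, platform_policy))   # False -> reject or renegotiate the contract
```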
Trust, but verify
"Trust, but verify" is Ronald Reagan's famous dictum, and it is very relevant to software for critical systems: we cannot simply trust the software, we need proof that it is safe and secure. Leveson in [13] presented four misconceptions pertaining to software for critical systems. One of them is that the safety of software can be shown by testing, simulation, and formal verification. Leveson quoted Dijkstra's observation that testing can show only the presence of errors, not their absence. She also argued that simulation can show only that we have handled the things we thought of, not the ones we did not think about, assumed were impossible, or unintentionally left out of the simulation environment, and that formal verification can show only the consistency of two formal models.
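Dijkstra's point is easy to reproduce with a toy example. The function below is meant to clamp a trim command to a symmetric range but has a wrong lower bound, and a plausible unit-test suite passes without ever exercising the faulty region; the function and its tests are invented for illustration and do not model any real control law.

```python
# Toy illustration that testing shows the presence of errors, not their absence.

def clamp_trim(command: float) -> float:
    """Intended behaviour: limit a trim command to the range [-2.5, 2.5]."""
    if command > 2.5:
        return 2.5
    if command < -2.4:   # bug: the lower bound should be -2.5
        return -2.4
    return command

def test_clamp_trim():
    # Every case passes, yet the lower bound is still wrong.
    assert clamp_trim(0.0) == 0.0
    assert clamp_trim(3.0) == 2.5
    assert clamp_trim(-1.0) == -1.0

if __name__ == "__main__":
    test_clamp_trim()
    # All tests pass, but the untested region still misbehaves:
    print(clamp_trim(-10.0))  # prints -2.4 instead of the intended -2.5
```

The same limitation applies to simulation: a scenario that was never thought of, such as a large negative command here, is simply never exercised.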
Revisiting our aircraft accident and power grid outage stories, we learn that safety in systems controlled by software depends not only on the safety of each component but also on the integration of the whole system. For example, the control logic of MCAS itself may be safe, but because the sensor was faulty the aircraft was pushed nose-down. Likewise, the control logic of the UPS itself may be safe, but because the firmware of the communication devices was modified, the UPS was disconnected, contributing to the outage.
We can conclude that systems thinking [6] is very important to safety and security. Thus, in software engineering, a practice such as DevOps, which integrates development and operations and incorporates testing and verification [2, 10], may help in building safer and more secure software systems.
In the end, we may say that today's testing, simulation, and formal verification of software are not yet ideal, but given the current state of practice they are good enough to keep our critical systems running safely and securely.
References
Credit for picture: Elias Maurer (https://unsplash.com/photos/fcdQBASK8QM)
[1] D. Atch. BlackEnergy 3 – Exfiltration of Data in ICS Networks. https://paper.seebug.org/papers/APT/APT_CyberCriminal_Campagin/2015/2015.05.27.BlackEnergy3/BlackEnergy-CyberX-Report_27_May_2015_FINAL.pdf, May 2015.
[2] Len Bass, Ingo Weber, and Liming Zhu. DevOps: A software architect’s perspective. Addison-Wesley Professional, 2015.
[3] C. Beek and R. Samani. A Case of Mistaken Identity? The Role of BlackEnergy in Ukrainian Power Grid Disruption. https://www.mcafee.com/blogs/other-blogs/mcafee-labs/blackenergy_ukrainian_power_grid/, 2016.
[4] N. Bielova, F. Massacci, and I. Siahaan. Testing decision procedures for security-by-contract. In Joint Workshop on Found. of Comp. Sec., Automated Reasoning for Sec. Protocol Analysis and Issues in the Theory of Sec. (FCS-ARSPA-WITS’08), 2008.
[5] N. Bielova, M. Dalla Torre, N. Dragoni, and I. Siahaan. Matching policies with security claims of mobile applications. In Proc. of the 3rd Intl. Conf. on Availability, Reliability and Security (ARES’08). IEEE Press, 2008.
[6] Peter Checkland. Systems thinking, systems practice, 1976.
[7] National Transportation Safety Committee. Aircraft Accident Investigation Report: PT. Lion Airlines, Boeing 737-8 (MAX), PK-LQP, Tanjung Karawang, West Java, Republic of Indonesia, 29 October 2018. KNKT.18.10.35.04, October 2019.
[8] A. Di Pinto, Y. Dragoni, and A. Carcano. TRITON: The First ICS Cyber Attack on Safety Instrument Systems, Understanding the Malware, Its Communications and Its OT Payload. https://i.blackhat.com/us-18/Wed-August-8/us-18-Carcano-TRITON-How-It-Disrupted-afety-Systems-And-Changed-The-Threat-Landscape-pdf, 2018.
[9] N. Dragoni, F. Massacci, K. Naliuka, and I. Siahaan. Security-by-Contract: Toward a Semantics for Digital Signatures on Mobile Code. In Proc. of the 4th European PKI Workshop Theory and Practice (EUROPKI’07), page 297. Springer-Verlag, 2007.
[10] Christof Ebert, Gorka Gallardo, Josune Hernantes, and Nicolas Serrano. DevOps. IEEE Software, 33(3):94–100, 2016.
[11] Dominic Gates and Mike Baker. The inside story of MCAS: How Boeing's 737 MAX system gained power and lost safeguards, June 2019.
[12] B. Johnson, D. Caban, M. Krotofil, D. Scali, N. Brubaker, and C. Glyer. Attackers Deploy New ICS Attack Framework “TRITON” and Cause Operational Disruption to Critical Infrastructure. https://www.fireeye.com/blog/threat-research/2017/12/attackers-deploy-new-ics-ttack-framework-triton.html, 2017.
[13] Nancy Leveson. Are you sure your software will not kill anyone? Communications of the ACM, 63(2):25–28, 2020.
[14] Nancy G Leveson. Role of software in spacecraft accidents. Journal of spacecraft and Rockets, 41(4):564–575, 2004.
[15] Nancy G Leveson. Engineering a safer world: Systems thinking applied to safety. The MIT Press, 2016.
[16] F. Massacci and I. Siahaan. Matching midlet’s security claims with a platform security policy using automata modulo theory. In Proc. of the 12th Nordic Workshop on Secure IT Systems (NordSec’07), 2007.
[17] F. Massacci and I. Siahaan. Simulating midlet’s security claims with automata modulo theory. In Proc. of the 2008 workshop on Prog. Lang. and analysis for security, pages 1–9, 2008.
[18] R.M. Needham. Protection systems and protection implementations. In Proc. of the 1972 Fall Joint Comp. Conf. AFIPS pt. 1, pages 571–578. ACM, 1972.
[19] J.H. Saltzer and M.D. Schroeder. The protection of information in computer systems. Proc. of the IEEE, 63(9):1278–1308, 1975.
[20] F.B. Schneider. Least privilege and more. Proc. of Symp. on Sec. and Privacy, 1(5):55–59, 2003.
[21] Antiy PTA Team. Technical Analysis of Industrial Control Malware TRISIS. https://www.antiy.net/p/antiy-released-technical-analysis-of-industrial-control-malware-trisis/, 2019.