Review of best practices in the development of safe automotive software
On the 14th of February 2016, U.S. vehicle safety regulators announced that the artificial intelligence system controlling a self-driving Google car would be considered the driver under federal law. According to National Highway Traffic Safety Administration (NHTSA) Chief Counsel Paul Hemmersbaugh, NHTSA will interpret ‘driver’ in the context of Google’s described motor vehicle design as referring to the self-driving system, and not to any of the vehicle occupants.
This is a major step forward for autonomous vehicles, but one that needs to be approached with caution, especially by software developers. The risks involved in complex software solutions were well demonstrated by an Oklahoma court judgment in 2013, in which the court ruled against Toyota in a case of unintended acceleration that led to at least one fatal accident.
During the hearing, the primary expert witness for the plaintiffs, Michael Barr, CTO and co-founder of Barr Group, an embedded systems consulting firm, cautioned everyone involved in software design and development, especially those working on safety-critical software, against the dangers of not adhering to the standards and practices for design, coding, and testing, calling the Toyota debacle a “shameful example of software design and development.”
Barr revealed to the court a list of errors committed by the software team:
- Mirroring (where key data is written to redundant variables) was not always done
- Stack overflow: Toyota claimed only 41% of the allocated stack space was being used. Barr's investigation showed that 94% was closer to the truth. On top of that, stack-killing, MISRA-C rule-violating recursion was found in the code, and the CPU didn't incorporate memory protection to guard against stack overflow
- Two key items were not mirrored: The RTOS' critical internal data structures; and the final result of all this firmware, the TargetThrottleAngle global variable
- Although Toyota had performed a stack analysis, Barr concluded the automaker had completely botched it. Toyota missed some of the calls made via pointer, missed stack usage by library and assembly functions (about 350 in total), and missed RTOS use during task switching. They also failed to perform run-time stack monitoring.
- Toyota's ETCS used a version of OSEK, which is an automotive standard RTOS API. For some reason, though, the CPU vendor-supplied version was not certified as compliant.
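The mirroring technique in the first item of this list can be illustrated with a minimal sketch. It is not Toyota's code: the class `MirroredValue`, the 16-bit width, and the variable name `target_throttle_angle` are assumptions chosen for illustration. Production firmware would implement the same idea in C. Each write stores the value together with its bitwise complement, and each read checks that the two copies still agree.

```python
class MirroredValue:
    """Stores a value together with a redundant, inverted copy.

    Hypothetical illustration of mirroring; the 16-bit width is an assumption.
    """

    MASK = 0xFFFF

    def __init__(self, value: int) -> None:
        self.write(value)

    def write(self, value: int) -> None:
        self._primary = value & self.MASK
        self._mirror = ~value & self.MASK  # bitwise complement of the value

    def read(self) -> int:
        # The two copies must be exact complements of each other;
        # any other pattern means one of them has been corrupted.
        if (self._primary ^ self._mirror) != self.MASK:
            raise ValueError("mirror mismatch: stored value is corrupted")
        return self._primary


target_throttle_angle = MirroredValue(120)
assert target_throttle_angle.read() == 120

# Simulate a single bit flip in the primary copy (e.g. memory corruption).
target_throttle_angle._primary ^= 0x0004
try:
    target_throttle_angle.read()
except ValueError as err:
    print("fail-safe triggered:", err)
```

Storing the complement rather than a plain copy means that a fault which overwrites both locations with the same pattern is also caught.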
But how do software developers cope with the ever-increasing number of lines of code required to run the modern motor car? Modern passenger cars may rely on 100 million lines of code, five times the amount used in an F-35 fighter jet. Embedded software now takes up 13% of the overall development time for a new vehicle, with ECU diagnostics accounting for 40% of the code and processor runtime, and 25% of overall test costs.
Software developers caught in a vice
Currently the emphasis in software development is on faster delivery and increased functionality. To achieve this, it is important that sound engineering practices are followed throughout the software development lifecycle. Furthermore, achieving functional safety in software requires that exacting engineering principles be implemented:
- Functional safety must be proactive
- Processes must be controlled, measured, and repeatable
- Defects should be prevented through the implementation of standards
- Testing must be effective and deterministic
- Testing should be done for complex memory problems
ISO 26262, MISRA, and other standards seek to normalize software development for automotive applications by providing a foundation for implementing accepted engineering concepts in software development processes. Whilst some companies may view compliance with ISO 26262 and other standards as an additional overhead burden, the truth is that the cost of failure associated with software defects is much, much greater than the cost of ensuring quality.
Achieving compliance with ISO 26262 and MISRA starts with a commitment to best practices in the form of policy. This policy is not a guideline, which suggests or recommends behavior; the policy is an automatically enforceable statement prescribing in plain language how software should be developed – and why it should be developed in this way. The policy must explicitly state that:
- Software must be developed according to the software development lifecycle (SDLC) as defined by ISO 26262
- Code will not be accepted from downstream subcontractors that don’t provide adequate traceability proving compliance with the standard
As an operational guide to software development best practices, ISO 26262 is based on years of research by some of the most experienced people in the field. The beauty of ISO 26262 is that, although it is intended for safety-critical functions, it can also be applied, in principle, to any software development.
The family of MISRA standards for C and C++ guides software engineers through best practices in writing code.
Functionally speaking, this means applying development testing activities, such as static code analysis, unit testing, peer review, and runtime error detection to ensure that code is being developed according to these policies. There are several tools on the market that implement MISRA guidelines in the form of analysis rules.
Nevertheless, as the amount of code increases, so do the demands of safety regulators, while the time to market shrinks. To meet these challenges, software developers have to ensure that the ever-increasing amount of code can be verified in a complex electrical, electronic and electromechanical context, against tightening standards, in the same time frame as today or even less.
Time-consuming fault testing is often done by taking a physical system, introducing real-world faults into it, measuring what happens, and then writing a fault report and passing it to a product life-cycle management or requirements tracking system. This is expensive, slow, and difficult to repeat reliably.
In order to overcome these challenges many developers are turning to simulation strategies which can accelerate software testing and integration, and reduce development times.
In a simulation-based approach, the system’s response to faults generated in a virtual model of the physical system (created in a tool such as Saber) is measured without having to set up a testbench or wait for a system to be produced.
Once the simulation has been run, the tool automatically generates a report as well as simulation models and a simulation testbench, which can be evaluated by the product life-cycle management and requirements tracking systems.
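The general idea of simulation-based fault injection can be shown with a toy example. The sketch below is not Saber and does not reflect its API. It is a pure-Python stand-in in which the pedal model, the fault definitions, the 5-point tolerance, and all names are hypothetical. A fault is injected into one of two redundant sensor channels, and a plausibility check reports whether and when the fault is detected.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FaultResult:
    fault: str
    detected: bool
    first_detection_step: Optional[int]


def pedal_position(step: int) -> float:
    """Hypothetical true pedal position: ramps from 0 % to 100 % in 50 steps."""
    return min(100.0, 2.0 * step)


def run_scenario(name: str,
                 fault: Callable[[int, float], float],
                 steps: int = 60,
                 tolerance: float = 5.0) -> FaultResult:
    """Run the virtual model with one injected fault and record detection."""
    for step in range(steps):
        true_value = pedal_position(step)
        sensor_a = fault(step, true_value)  # channel with the injected fault
        sensor_b = true_value               # healthy redundant channel
        # Plausibility check: the two channels must agree within tolerance.
        if abs(sensor_a - sensor_b) > tolerance:
            return FaultResult(name, True, step)
    return FaultResult(name, False, None)


# Fault models (assumed for illustration): each maps the true value
# to the value the faulty channel reports.
faults = {
    "none": lambda step, v: v,
    "stuck_at_20": lambda step, v: 20.0 if step >= 10 else v,
    "offset_3": lambda step, v: v + 3.0,
}

report = [run_scenario(name, fault) for name, fault in faults.items()]
for row in report:
    print(row)
```

Because every run is deterministic, the same fault list can be replayed after each software change, which is the repeatability that physical fault testing struggles to provide. The small offset fault goes undetected here, which is exactly the kind of coverage gap such a report is meant to expose.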
Fault analysis and verification of software systems
Defects can cost automakers money and tarnish a brand’s reputation. NHTSA estimates that recalls and fixes across the industry cost automakers $3 billion annually. When it comes to the cost of software-related issues, a 2005 estimate from IEEE put the cost to manufacturers at $350 per car.
System fault analysis and verification requires the evaluation of systemic software failure, systemic hardware failure, and failures in the interaction between hardware and software. During this process it is important to evaluate a wide range of potential failure modes, including:
- The impact of a false signal in the hardware if it is propagated into the software
- The effect of an error in the signal-processing algorithm on the hardware
Evaluating this correctly requires that all levels of the design be analysed at once, in the same environment.
To this end, Part 4 of ISO 26262 provides a reference model for system-level design, prescribing that developers carry out both a deductive ‘fault tree analysis’ (FTA) and an inductive ‘failure mode and effects analysis’ (FMEA) for systems specified to ASIL C and D.
FTA is a top-down methodology in which undesirable behaviors are defined at the top level and then the possible cause(s) outlined. Probabilities are then assigned to each of these types of failure, and combinatory logic used to assess the likelihood and impact of different combinations of failures. Using ‘Hardware-in-the-Loop’ techniques, the assumptions made in the analysis are evaluated by inserting the different faults into the fault tree to see what happens.
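The combinatory logic used in FTA can be sketched in a few lines. The example below assumes that basic events are statistically independent, and the fault tree and its probabilities are invented for illustration only. An AND gate multiplies the probabilities of its inputs, while an OR gate gives the probability that at least one input event occurs.

```python
from math import prod


def and_gate(*probs: float) -> float:
    """All input events must occur (independence assumed)."""
    return prod(probs)


def or_gate(*probs: float) -> float:
    """At least one input event occurs (independence assumed)."""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical basic-event probabilities per operating hour.
p_sensor_a_fails = 1e-4
p_sensor_b_fails = 1e-4
p_software_fault = 1e-6

# Top event: the throttle command is wrong if both redundant sensors fail
# OR the software computes a wrong value.
p_both_sensors_fail = and_gate(p_sensor_a_fails, p_sensor_b_fails)
p_top_event = or_gate(p_both_sensors_fail, p_software_fault)

print(f"both sensors fail: {p_both_sensors_fail:.3e}")
print(f"top event:         {p_top_event:.3e}")
```

In this made-up tree the redundant sensors contribute far less to the top event than the single software fault, which shows how FTA points to where redundancy or additional verification matters most.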
FMEA, on the other hand, is a bottom-up methodology in which possible failures are identified and the effect and propagation of the failure mode determined.
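A bottom-up FMEA worksheet can be sketched as a simple data structure. The ranking below uses the risk priority number (severity × occurrence × detection), a convention from common FMEA practice that is not described in this article. The components, failure modes, and ratings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    component: str
    mode: str
    effect: str
    severity: int    # 1 (negligible) .. 10 (hazardous)
    occurrence: int  # 1 (remote) .. 10 (very frequent)
    detection: int   # 1 (almost certain to detect) .. 10 (undetectable)

    @property
    def rpn(self) -> int:
        """Risk priority number, used here only to rank the worksheet."""
        return self.severity * self.occurrence * self.detection


# Hypothetical worksheet rows: start from each component's failure mode
# and record the effect it propagates to the system.
worksheet = [
    FailureMode("pedal sensor", "signal stuck at last value",
                "throttle does not follow driver input", 9, 3, 4),
    FailureMode("CAN transceiver", "intermittent message loss",
                "delayed torque request", 6, 4, 3),
    FailureMode("watchdog timer", "fails to reset hung task",
                "faulty task keeps running", 10, 2, 6),
]

ranked = sorted(worksheet, key=lambda fm: fm.rpn, reverse=True)
for fm in ranked:
    print(f"{fm.rpn:4d}  {fm.component}: {fm.mode} -> {fm.effect}")
```

The ranking only prioritizes which failure modes to investigate first. It does not replace the analysis of how each failure propagates through the system.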
Tools such as Synopsys’ Saber RD Fault solution enable users to automate fault simulation and results reporting to support failure analysis.
Effectively updating complex software systems OTA
In order to keep intricate software systems running error-free, regular updates and maintenance of the software are necessary. Usually these updates are carried out at the car dealership, which is time-consuming and inconveniences the vehicle owner.
For the carmaker, in-person software updates are costly, especially in case of software-related recalls.
To alleviate this pressure in the rapidly expanding world of the connected car, over-the-air (OTA) updates are fast becoming a mandatory feature, whether for critical bug fixes, scheduled updates, or new service offerings.
German software technology company ATS Advanced Telematic Systems recently revealed a newly developed, open-standards-based software solution, “OTA Plus”, at the Mobile World Congress in Barcelona.
ATS and Acer BYOC jointly develop cloud-based solutions for connected mobility, and in Barcelona ATS used its concept car to demonstrate the OTA Plus software suite for over-the-air updates focused on the automotive industry.
According to Armin G. Schmidt, CEO of ATS: “Securely managing and deploying critical software updates or new software features to a large number of connected vehicles requires a robust and highly scalable software over the air update solution. ATS tackles this ever-growing need with its state-of-the-art OTA Plus platform, which not only reduces maintenance costs and increases customer satisfaction, but also makes it possible to securely offer new features to connected vehicles already in the field.”
OTA Plus is based on the open source solution GENIVI SOTA, and commissioned by GENIVI, a non-profit industry alliance of leading automotive OEMs and suppliers. The alliance is committed to driving the broad adoption of open standards in in-vehicle infotainment software.
Software is everywhere and will continue to play a greater role as our once simple products become “smarter.” This is especially true in automotive development, which presents a unique challenge in terms of ensuring the safety, security, and reliability of embedded applications. Automotive blends safety-critical software with business-differentiating software, all of which is developed in a highly distributed manner.
The bottom line is that end-to-end testing for automotive applications is too expensive and too complex. On the other hand, the cost of software failure should serve as motivation for finding a way to mitigate risks. By applying automotive software development standards, such as ISO 26262 and MISRA, automakers put themselves in the best position to avoid the risks associated with faulty software.