Testing Ideas For BS5750, ISO9000, TickIT and DefStan 00-55
Synopsis
Software engineering is probably the newest of the established professions, but whilst it is attempting to regulate the conduct and practices of its members via ISO9000/BS5750 and DefStan 00-55, the limitations of current mathematics and of engineers' education prevent the latter from being properly implemented. This is due to its emphasis on the use of formal methods.
DefStan 00-55 does define long-term objectives for how software should be produced. Within the 00-55 standard there are also currently realisable objectives, such as in-target dynamic testing via microprocessor emulation, which could be expanded to compensate for:
(i) the inability to express a software system formally, and
(ii) inadequacies in conventional analysis tools when dealing with real-time, interrupt-driven software.
It is suggested that the mandatory in-circuit testing phases outlined in 00-55 may grow to become the major guarantee of system quality, given the inadequacy of current formal methods and the general lack of expertise in this area. With current resources, the intelligent use of in-circuit dynamic testing may be the best short-to-medium-term hope for improving software reliability.
Software Engineering - A New Profession
Amongst professions such as law, medicine, architecture and mechanical engineering, software engineering is probably the newest, dating back no more than about forty years. The professions pride themselves on producing high-quality, high-integrity products and services with which customers are eminently satisfied. They tend to be governed by charters containing rules that regulate members' working practices and ethical behaviour, ultimately to ensure the quality of the product or service offered by the "professional" individual.
Unfortunately, software engineering, being a newcomer, has not yet reached this stage. In accountancy the conventions used have been drawn up over several hundred years, whereas in software engineering the standards are very recent. In some professions, such as (probably) medicine and certainly accountancy, the approach to the individual's job is closely laid down, if not in statute then in the by-laws of the professional society concerned. Within the established professions, the best parallel to software engineering is probably accountancy. Here, the auditor's job is to examine the very large and complex system constituting a company, to determine exactly how it has functioned and whether a profit or loss has been made.
The approach taken is very much cast in concrete, as anyone who has ever met an accountant will testify: it is all laid down by the ACA, and the rules are designed to allow large systems to be broken into manageable pieces, each small enough to be easily audited in its own right. When the whole is brought together on the balance sheet, the results can therefore be deemed reliable.
This is almost an exact parallel with software engineering. Here, the task is normally to take a complex system (which may indeed be a business, but is more likely a complex control task) consisting of many discrete software modules. Once each module has been individually verified as correct, the final linked system should function correctly and the original objective be reached.
Quality Of Software Engineers' Work
The quality of software systems has always been something of an issue, albeit a low-profile one. Recently, though, it has come to the fore with the inclusion of more software within safety-critical controllers. The classic example is the fly-by-wire system in commercial aircraft, where software faults can, in theory, cause an aircraft to crash with great loss of life. To ensure that safety-critical systems are indeed safe, the software has to be produced "properly" from Day 1. Unfortunately, to date the definition of "properly" has been rather imprecise, since software engineering has no governing body to lay down procedures in the formal way encountered in other professions.
Rules Of Professional Conduct
In the accountancy profession, a proper audit may be defined by reference to the rules laid down by the ACA. For safety-critical software there has been no such absolute reference. Yet it is this possibility of safety-critical systems causing death and injury that gives software engineers almost the same responsibility as doctors treating patients.
In recognition of this need to regularise production methods and apply a uniform approach to software production across industry, two standards are coming into use: ISO9000, otherwise known as BS5750, and the Interim Defence Standard 00-55, released in July 1989.
ISO9000/BS5750
Both these standards have proved somewhat controversial; in the case of the former, the argument has been over whether it is appropriate for software production at all. ISO9000/BS5750 is essentially a framework for a total quality system which may be applied equally to any manufacturing or design process, whether mechanical or software in nature. In reality it is probably more appropriate to mechanical and electronic production, but its main tenet is the maintenance of records at each stage of the process and the recording of the results of specific tests used in the design and manufacturing process.
When applied to software engineering, this identifies system documentation as the crucial issue, whether it be fully commented listing files or documented results of specific tests of software operation. Overall, it is ISO9000's insistence that quality records are maintained and kept that makes it most relevant.
DefStan 00-55
DefStan 00-55 is a software-specific document and is rather more rigorous (some people would say too rigorous) in its definition of how software quality should be achieved and ensured. Before publication, it was rumoured to recommend the use of formal mathematical methods alone, as a way of dispensing with testing altogether. In reality, 00-55 calls for a formal mathematical specification and a plain English description, plus "validation and verification" at each stage of the development process. Naturally, the first application of this is to check that the formal specification tallies with the English description. The document gives no guidance as to how this should be achieved! However, prototyping the specification in a declarative language such as PROLOG or LISP might be a possibility.
Formal Methods In 00-55
The biggest grey area in DefStan 00-55 is the exact form of the formal methods to be used to express software systems. This is extremely controversial because there are very few software systems, particularly in the embedded field, where the situation is entirely deterministic, i.e. predictable, and if something is not predictable it is extremely hard to express it mathematically using formal methods. It is for this reason above all others that DefStan 00-55 has been widely criticised as unrealistic. Current mathematical techniques are under-developed in this area and are certainly not within the experience of the average software practitioner.
The Practicalities Of Software Testing
However, some aspects of DefStan 00-55 are eminently reasonable and, significantly, these are the areas which tend to overlap with ISO9000: testing and the maintenance of records. While it is possible to describe sections of software mathematically in simple cases, in the embedded world this is rarely possible. The mathematics required to describe certain systems, such as the idle speed controller on a car, is not sufficiently developed to be of any use to the practising engineer. Given that a piece of software cannot realistically be expressed mathematically, the alternative may be to concentrate on testing the system dynamically within its target environment (23.5), another phase suggested by the standard. DefStan 00-55 does specifically mention testing "software within the target system and in the target environment subjected to normal and abnormal inputs". Correctly or not, this is likely to be taken to mean that if something cannot be expressed rigorously in mathematics, the next best thing is to subject a given piece of software (or a complete software system) to every possible input and rigorously check the results.
To be a valid test, the input data should include all legal values and/or data combinations and should also extend (most importantly) to the abnormal or unexpected. If the system can be shown to work, or to have no hazardous effect, for every one of these test inputs, then it can be deemed safe. The printed or otherwise stored records from the test should then be added to the overall system quality records and documentation.
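As a hedged illustration of exhaustive testing: when a function takes a single 8-bit input, "every possible input" is only 256 cases, so all the legal and abnormal values can be driven through it. The sketch below assumes a hypothetical heater-control function `heater_duty()` whose safety rule (also an assumption) is that the commanded duty cycle must never exceed `MAX_DUTY` percent and must never fall as demand rises. Each case is written as a log record that could be filed with the quality records.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_DUTY 90u  /* hypothetical safety limit, in percent */

/* Hypothetical function under test: converts an 8-bit demand value into a
   heater duty cycle in percent, clamped to the safety limit. */
static unsigned heater_duty(unsigned char demand)
{
    unsigned duty = (demand * 100u) / 255u;
    return duty > MAX_DUTY ? MAX_DUTY : duty;
}

/* Drives all 256 possible inputs through heater_duty(), checking that the
   output never exceeds MAX_DUTY and never decreases as demand rises.
   Each case is logged to 'log' (if non-NULL); returns the failure count. */
static unsigned exhaustive_test(FILE *log)
{
    unsigned failures = 0;
    unsigned previous = 0;
    for (unsigned demand = 0; demand <= 255u; demand++) {
        unsigned duty = heater_duty((unsigned char)demand);
        int ok = duty <= MAX_DUTY && duty >= previous;
        failures += !ok;
        if (log != NULL)
            fprintf(log, "demand=%3u duty=%3u%% %s\n",
                    demand, duty, ok ? "PASS" : "FAIL");
        previous = duty;
    }
    return failures;
}
```

Calling `exhaustive_test(stdout)` prints all 256 records; a non-zero return would flag a hazardous response somewhere in the input range.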
This approach might be viewed as simply second best, but there may be a valid argument that if the test object is driven with every possible input value and the results are verified to be correct or "safe", this may be better than trying to express the system formally in mathematics. In reality, having used formal methods, you then have to prove that the mathematics used is correct and does meet the plain English specification. This unfortunately opens up a whole new set of arguments which cannot be satisfactorily answered at present.
The approach where you simply present every possible input to a system is probably more realistic, and may at the end of the day yield higher integrity results. This method acknowledges that complete understanding of the behaviour of a software system is not possible in mathematical terms. Again parallels exist in other professions, where a doctor may pronounce a patient fit as a result of responses to specific medical tests, without knowing exactly how the body works.
With the current difficulties in expressing systems in formal notations, this approach may already have become very commonplace.
Further Obstacles To The General Application Of Formal Methods: Lack Of Determinism And Interrupts
The lack of determinism within embedded software systems is most easily illustrated by interrupt routines driven by real-world events, which are by definition completely non-deterministic. For instance, it is not usually possible to determine between which two processor cycles, or at which two program addresses, an interrupt will occur. Simple serial code may indeed be formally specified and checked by static tools such as SPADE and MALPAS, but real-world-driven, hard real-time systems cannot.
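To make the point concrete, here is a hypothetical sketch (not from the original) of why the exact instant of an interrupt matters. On an 8-bit processor such as the 8051, a 16-bit counter is read with two separate byte accesses; if a timer interrupt that increments the counter lands between them, the background code sees a value that is neither the old nor the new count. The sketch forces the interrupt at each possible point in turn, which real hardware never lets you choose.

```c
#include <assert.h>
#include <stdint.h>

/* Shared 16-bit counter, as seen by an 8-bit CPU that must access it
   one byte at a time. */
static volatile uint16_t tick_count;

/* Stand-in for a timer interrupt service routine that advances the counter. */
static void timer_isr(void)
{
    tick_count++;
}

/* Simulates background code reading tick_count as two separate byte reads
   (low byte, then high byte). 'irq_point' selects where the interrupt
   fires: 0 = before the first read, 1 = between the two reads,
   2 = after both reads. */
static uint16_t read_count_with_irq_at(int irq_point)
{
    uint8_t lo, hi;
    if (irq_point == 0) timer_isr();
    lo = (uint8_t)(tick_count & 0xFFu);
    if (irq_point == 1) timer_isr();
    hi = (uint8_t)(tick_count >> 8);
    if (irq_point == 2) timer_isr();
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}
```

Starting from 0x00FF, an interrupt before or after the reads gives 0x0100 or 0x00FF respectively, but one between the reads gives 0x01FF, a value the counter never held. An analysis would have to consider every such instruction boundary for every interrupt source.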
Testing With Interrupts Enabled
Logically, the only way to test the correct operation of a piece of software that uses interrupts is to run the test object, whether a single function or an entire system, for an extended period, at full speed in the target system, with all possible inputs supplied. An alternative that may be employed is a static, mainframe-based simulation tool such as SPADE or MALPAS. In theory these tools should be able to do the job, but in reality they are still unlikely to be able to simulate every possible input, and even if they could, the potential number of combinations of input value and interrupt timing would make the test so slow as to be unfeasible.
As an example, a system with 100 input variables (not an unreasonable number), running at 40MHz with perhaps five real-time interrupts, would have a practically unlimited number of possible interrupt-interrupt and interrupt-background interactions.
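A rough worked calculation shows the scale. It assumes, purely for illustration, that each of the 100 inputs is an 8-bit value and, wildly optimistically, that one complete test could be run per clock cycle at 40MHz. Interrupt timing is ignored altogether, which would multiply the figure further.

$$N = \left(2^{8}\right)^{100} = 2^{800} \approx 6.7 \times 10^{240}$$

$$t = \frac{N}{4 \times 10^{7}\,\mathrm{s}^{-1}} \approx 1.7 \times 10^{233}\,\mathrm{s}$$

For comparison, the age of the universe is roughly 4 x 10^17 seconds.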
A Practicable Solution In Typical Embedded Software Systems
A common approach with industry-standard embedded microprocessors such as the 8051, 80186 and 68000 is to use a software CPU simulator. Here, the various inputs are simulated by user-written macros within the simulator, so that software modules can be subjected to all possible inputs. The resulting outputs are then logged to a file, from which a post-processor program picks up the input value(s), compares them with the results and makes a pass/fail decision. Whilst this is certainly better than doing nothing, in the final reckoning it is not real and does not run in real time. There is also the possibility that the simulator does not accurately simulate the processor. The room for error is increased by the common situation where different generations of a single manufacturer's microprocessor include subtle changes in execution: perhaps one cycle more or less per instruction, or some other usually insignificant variation.
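The post-processing step described above might look something like the following sketch. The log format (`IN=xx OUT=yy`, in hex) and the reference model `expected_output()` are both hypothetical; a real post-processor would read the simulator's actual log file and encode the module's real specification.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical reference model of what the module under test should do:
   here, the output should equal the input plus 5, saturating at 255. */
static unsigned expected_output(unsigned in)
{
    return (in + 5u > 255u) ? 255u : in + 5u;
}

/* Post-processes log lines of the assumed form "IN=xx OUT=yy" (hex values),
   compares each logged output with the reference model and returns the
   number of failing records. Malformed lines count as failures so that a
   corrupted log cannot silently pass. */
static unsigned check_log(const char *const lines[], size_t count)
{
    unsigned failures = 0;
    for (size_t i = 0; i < count; i++) {
        unsigned in, out;
        if (sscanf(lines[i], "IN=%x OUT=%x", &in, &out) != 2) {
            printf("line %zu: malformed record\n", i + 1);
            failures++;
        } else if (out != expected_output(in)) {
            printf("line %zu: FAIL in=%02X out=%02X expected=%02X\n",
                   i + 1, in, out, expected_output(in));
            failures++;
        }
    }
    return failures;
}
```

Treating unparseable lines as failures is a deliberate choice: the pass/fail record is only worth filing if a damaged log cannot be mistaken for a clean one.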
A Currently Realisable Solution
At present, the best answer may well be to use a proper in-circuit emulator, and indeed DefStan 00-55 mentions in-circuit emulation methods as suitable ways of testing software. There are two sides to this: first, a method has to be devised to permit the test; second, the input values and the results need to be recorded and checked, either by a post-processor or by recording them as just another part of the system documentation. These records may have to be kept for ten years or so after the design phase.
To illustrate how this might be achieved, consider a particular C function chosen for test. Using an in-circuit emulator, the complete program would be set running with the target system subjected to its real inputs. Each time the function was entered, the parameter values passed (or whatever constitutes the "input" to this function) would be extracted by the emulator and stored either in the processor's memory or in a PC disk file. At the end of the function, when all the processing has taken place, the result is extracted and stored, again probably to a disk file. Ideally this would be done in real time, with no interruption of execution during data capture. Currently, some development systems, such as our own T8 and T51, will allow values to be extracted non-intrusively.
The beauty of this approach is that the real-time interrupts run as normal: all that happens is that the function's input and output values are logged. At some later time, manual or automatic checks can be made that the values were indeed correct. This tests the response of the function to valid input values, assuming of course that the rest of the system is driving correct values into the test object.
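Once the emulator has captured the entry and exit values, the raw trace has to be turned into input/output pairs before it can be checked. A minimal sketch follows, assuming a hypothetical trace format in which each record marks either entry to the function (carrying the parameter) or exit from it (carrying the result). It also assumes the function is not re-entrant, so two entries in a row indicate a problem.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical raw trace record: a marker for entry to or exit from the
   function under test, plus the value seen at that point. */
enum event_kind { EV_ENTRY, EV_EXIT };

struct trace_event {
    enum event_kind kind;
    unsigned value;   /* parameter on entry, result on exit */
};

/* One logged test case: the input received and the result returned. */
struct io_pair {
    unsigned input;
    unsigned output;
};

/* Pairs each entry event with the exit event that follows it. Returns the
   number of complete pairs written to 'pairs', or -1 if the trace is
   inconsistent (an exit with no entry, or two entries in a row) or
   'pairs' is too small. A trailing entry with no exit is assumed to be a
   capture that ended mid-call and is ignored. */
static int pair_trace(const struct trace_event *ev, size_t n,
                      struct io_pair *pairs, size_t max_pairs)
{
    size_t count = 0;
    int open = 0;
    unsigned pending_input = 0;
    for (size_t i = 0; i < n; i++) {
        if (ev[i].kind == EV_ENTRY) {
            if (open) return -1;
            pending_input = ev[i].value;
            open = 1;
        } else {
            if (!open || count >= max_pairs) return -1;
            pairs[count].input = pending_input;
            pairs[count].output = ev[i].value;
            count++;
            open = 0;
        }
    }
    return (int)count;
}
```

The resulting pairs are what the later manual or automatic checks would work on.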
To cover abnormal input values, the function needs to be isolated more completely from the rest of the system. Here, rather than just extracting the values passed to the function, the emulator will, when it sees the function being called, insert pre-defined test data. At the end of the function, the emulator extracts the result and stores it. A similar set of manual or automatic checks then validates the output against both the inputs and the range of legal output values. Thus if the maximum legal input value is 0F0H, for which the output must be 80H, then when 0FFH is input the checking program would simply verify that the output stayed at 80H. If the output were any other value, a fault would be flagged.
These two tests would form two parts of the system documentation. In the first, all input values and output results would be logged to a file constituting a dynamic test. In the second, the forced values, with the abnormal ones highlighted, would also be logged into the system documentation along with the output values. Ideally, each of these files would carry a date stamp and some sort of test name for reference purposes.
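The forced-value test and its date-stamped log might be sketched as follows. The function `scale()`, its mapping of 0F0H to 80H (with 80 taken to be hexadecimal), the test name and the log layout are all hypothetical; in practice the emulator, not the program itself, would force the values in and capture the results.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical function under test: legal inputs 0x00..0xF0 map onto
   0x00..0x80; anything above 0xF0 must be held at 0x80. */
static unsigned scale(unsigned in)
{
    unsigned out = (in * 0x80u) / 0xF0u;
    return out > 0x80u ? 0x80u : out;
}

/* One forced test vector; 'abnormal' marks values above the legal range. */
struct test_vector {
    unsigned input;
    int abnormal;
};

/* Forces each vector into the function, checks the output lies within the
   legal range (and is held at 0x80 for abnormal inputs), and writes a
   named, date-stamped log. Returns the number of failures. */
static unsigned forced_value_test(const char *test_name,
                                  const struct test_vector *v, size_t n,
                                  FILE *log)
{
    char stamp[32];
    time_t now = time(NULL);
    struct tm *local = localtime(&now);
    if (local == NULL ||
        strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", local) == 0)
        snprintf(stamp, sizeof stamp, "date unknown");
    fprintf(log, "TEST %s  %s\n", test_name, stamp);

    unsigned failures = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned out = scale(v[i].input);
        int ok = out <= 0x80u;
        if (v[i].abnormal)          /* abnormal input must be held at the limit */
            ok = ok && out == 0x80u;
        failures += !ok;
        fprintf(log, "%s in=%02XH out=%02XH %s\n",
                v[i].abnormal ? "ABNORMAL" : "normal  ",
                v[i].input, out, ok ? "PASS" : "FAIL");
    }
    return failures;
}
```

Marking each vector as normal or abnormal in the log itself means the highlighted abnormal cases survive into the filed documentation without a separate key.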
In-Circuit Emulation Testing
The foregoing tests would hopefully prove that a given piece of software performed correctly under both normal and abnormal operating conditions. Because an in-circuit emulator has been used, the real silicon CPU will have been executing the real instructions with the real inputs active. Thus, if during the dynamic test an interrupt occurred and overwrote a value within the function being tested, the output result would be corrupted accordingly. This cannot of course be guaranteed, as it depends on what has been overwritten, but this basic test is certainly better than doing nothing at all.
With the appropriate trigger conditions set, the emulator could conceivably be told simply to log writes to memory locations made while an interrupt routine is being executed. This requires a sequence of triggers, so that results are logged only when:
- the software function being tested has been entered,
- the PC is in an interrupt routine at the same time, and
- a write occurs.
(This is where the sequence definitions become very useful on the T51 and T8.)
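The three-condition trigger sequence above can be modelled in software to show the logic, although on the emulator it would be implemented in the trigger hardware. The event names and record layout below are hypothetical and are not the T8 or T51 trigger syntax; nested interrupts are handled with a depth count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bus-level events as an emulator's trigger logic might see them. */
enum bus_event_kind { FUNC_ENTER, FUNC_LEAVE, ISR_ENTER, ISR_LEAVE, MEM_WRITE };

struct bus_event {
    enum bus_event_kind kind;
    unsigned address;   /* only meaningful for MEM_WRITE */
};

/* Logs a write only if the function under test has been entered (and not
   yet left), the program counter is inside an interrupt routine, and a
   write occurs. ISR nesting is tracked with a depth counter. Logged
   addresses go into 'logged'; returns how many were logged, capped at
   max_logged. */
static size_t log_isr_writes(const struct bus_event *ev, size_t n,
                             unsigned *logged, size_t max_logged)
{
    int in_function = 0;
    int isr_depth = 0;
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        switch (ev[i].kind) {
        case FUNC_ENTER: in_function = 1; break;
        case FUNC_LEAVE: in_function = 0; break;
        case ISR_ENTER:  isr_depth++; break;
        case ISR_LEAVE:  if (isr_depth > 0) isr_depth--; break;
        case MEM_WRITE:
            if (in_function && isr_depth > 0 && count < max_logged)
                logged[count++] = ev[i].address;
            break;
        }
    }
    return count;
}
```

Only writes made by an interrupt routine while the tested function is active are recorded, which keeps the log small enough to examine by hand.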
It is then possible, if a court case later results from an alleged software failure, for the software engineer who produced the code to prove that the software was tested to specification, with normal and abnormal inputs, on a stated date, and was found to be correct. Such evidence of testing would considerably strengthen the engineer's or the company's defence.
System-Wide Testing
The above are examples of dealing with specific sections of a software system, perhaps a single functional block or software function. Another aspect of testing could usefully be addressed: ensuring that the system performs in real time in the way that the engineer expected and the specification demands. At system level, particularly where more than one engineer is on a project, there are some quality hazards which are not generally recognised but should be tested for. The extent to which a complete software system has been tested can be gauged by the percentage coverage obtained, i.e. the proportion of the whole code actually executed during testing. Incomplete coverage is commonly symptomatic of either incomplete test data or software errors, so low coverage conveniently indicates that there is a problem somewhere in the whole design and test process.
The fundamental rationale behind coverage as a quality metric is that, provided the code was not seen to malfunction during the test session and all the code was executed, the software can be considered correct. Less than 100% coverage during testing leaves the possibility that under field conditions the untested code might be activated and some latent error become apparent.
Finally, the verifiable absence of particular problems could usefully be included in system documentation to prove again the overall integrity of the system.
Referring back to 00-55, there is a stated need for coverage analysers to be employed, but normally this will only be a simulation test phase performed by a dedicated coverage analyser. Hence the same arguments arise again over whether an embedded real-time system can be so tested. However, in-circuit tools are now available which allow program coverage to be measured under real, dynamic conditions. Any code remaining unexecuted after a test can thus be identified and the reasons for its apparent redundancy examined. In an ideal system, given every possible combination of input conditions, 100% of the code should be executed, and indeed anything less than 100% coverage must leave a question mark over the system's integrity or the thoroughness of the testing.
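A coverage figure and the list of unexecuted code can be derived from a simple map holding one flag per code address. This is a sketch under stated assumptions: the code area size is hypothetical, and the emulator is assumed to mark every byte of each instruction it sees executed, so that operand bytes are not reported as unexecuted.

```c
#include <assert.h>
#include <stdio.h>

#define CODE_SIZE 64u   /* hypothetical size of the code area, in bytes */

/* Coverage map: one flag per code address, set when the emulator sees
   that address executed during the test session. */
static unsigned char executed[CODE_SIZE];

/* Returns the percentage of code addresses executed, and prints each
   contiguous range of never-executed addresses (if 'out' is non-NULL) so
   that the reason for its apparent redundancy can be examined. */
static double coverage_report(FILE *out)
{
    unsigned hit = 0;
    for (unsigned a = 0; a < CODE_SIZE; a++) {
        if (executed[a]) {
            hit++;
            continue;
        }
        unsigned start = a;
        while (a + 1 < CODE_SIZE && !executed[a + 1])
            a++;                    /* skip to the end of this unexecuted run */
        if (out != NULL)
            fprintf(out, "not executed: %04X-%04X\n", start, a);
    }
    return 100.0 * hit / CODE_SIZE;
}
```

Reporting unexecuted code as address ranges rather than single addresses makes it easier to map each gap back to a source-level function or branch.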
Examples of errors which are more easily revealed by dynamic methods might be:
(i) Stack overload caused by interrupts occurring in critical or simply unexpected places: given that one or more interrupts are currently being serviced, if a further interrupt occurs there is a possibility that the stack will overflow, possibly crashing the system.
(ii) Use of uninitialised data. This often occurs on group projects where one engineer defines a variable, for example a static with an undefined initial value. Someone else writes a function which assumes this, but later the original programmer changes it without telling the second programmer, so that the software runs with a variable which may have an uncertain value on entry.
This hazard can be tested for statically, but there are instances where it arises only as a result of interrupts: for example, an interrupt running before the background program has had a chance to initialise the global variable used by the service routine.
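For item (i), one widely used dynamic check (not one described in the original) is stack painting: fill the stack area with a known pattern before the test run, then afterwards find how far the pattern has been overwritten. With an emulator, the stack memory could be read back after an extended run with interrupts active. The sketch assumes a stack growing downward, as on the 80186 and 68000; on the 8051 the stack grows upward, so the scan would run in the opposite direction. The fill value and array are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STACK_FILL 0xA5u   /* assumed fill pattern; any value rarely pushed will do */

/* Before the test run: paint the whole stack area with the fill pattern. */
static void paint_stack(unsigned char *stack, size_t size)
{
    memset(stack, STACK_FILL, size);
}

/* After the test run: for a stack growing downward from the top of the
   area, scan up from the lowest address to the first byte that no longer
   holds the pattern. Everything from there upward has been used at some
   point, giving the worst-case (high-water) depth in bytes. A pushed byte
   that happens to equal STACK_FILL can make the result slightly low. */
static size_t stack_high_water(const unsigned char *stack, size_t size)
{
    size_t untouched = 0;
    while (untouched < size && stack[untouched] == STACK_FILL)
        untouched++;
    return size - untouched;
}
```

If the high-water depth after a long run with all interrupts active comes close to the allocated stack size, the nested-interrupt hazard in item (i) deserves closer investigation.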
Integrity Of The Test Tools Themselves
The obvious question here is: "If the development system or emulator is to be used to prove that our software works correctly, how do we know that the emulator itself works correctly?" There is of course no more certainty that the emulator will work than there is, for instance, that Microsoft C will compile code of perfect integrity. Generally a measuring instrument, here the emulator, should be several orders of magnitude more reliable than the object being tested, i.e. the software function. This would probably be necessary if the procedures executed by the emulator's control software were of equal sophistication to those of the software function, but fortunately this is normally not the case. The emulator is most likely to be involved only in a simple detection and retrieval operation, where data is simply removed from the system and stored for later processing; no processing of the data actually occurs. It is chiefly for this reason that the emulator software need only be of equal reliability to the software being tested in its final state. The fact that the emulator is doing such a simple task means that the results are still likely to be valid.
In reality, of course, the object under test is going to be many times more complex in its operation than any part of the emulation system used to perform the test. The objection to an essentially software-based tool being used to prove other software is therefore not really valid.
A secondary point is that, unlike the project being tested, the development system's software will be in use in many places, so the chances of any intrinsic error being discovered are very high. The risk associated with the test tool's software is thus lowered.
Summary
The software engineering profession is probably the newest profession, but whilst it is attempting to regulate the conduct and practices of its members via the two standards mentioned, the limitations of current mathematics and of engineers' education prevent these from being properly implemented.
DefStan 00-55 does define long-term objectives for how software should be produced. Within the 00-55 standard there are also currently realisable objectives, such as in-target dynamic testing via emulation, which could be expanded to compensate for the inability to express a software system formally. With current resources, then, the intelligent use of in-circuit testing might be the best short-to-medium-term hope for improving software reliability.
First Published In Microsystem Design