Abstract
The objective of fault-tolerant computing systems is to provide error-free operation in the presence of faults. The system has to recover from the effects of a fault by employing recovery procedures such as program rollback, reload, and restart. However, these recovery procedures result in interruptions in the system's operation, thus reducing the availability of the system for user applications. Fault-tolerant systems for critical applications therefore include standby spares that are ready to replace active modules which fail to recover from the effects of a fault. A standby spare may also be used to replace a module suffering from frequent fault occurrences, which cause too many repetitions of the recovery process, in order to increase the availability of the system for user applications. In this case a module switching policy is needed that indicates, upon a fault occurrence, whether to retry a failing module or to switch it out and replace it with a spare, considering the remaining mission time and the probability of a system crash. A module switching policy for dynamic redundancy systems is presented in this paper, and the improvement in application-oriented availability due to the use of this policy is illustrated.
Original language | English |
---|---|
Pages (from-to) | 1052-1062 |
Number of pages | 11 |
Journal | IEEE Transactions on Computers |
Volume | C-36 |
Issue number | 9 |
State | Published - Sep 1987 |
Externally published | Yes |
Keywords
- Application-oriented availability
- deterioration models
- failure rate
- fault tolerance
- modular redundancy
- module switching policy
- recovery
- standby spare
ASJC Scopus subject areas
- Software
- Theoretical Computer Science
- Hardware and Architecture
- Computational Theory and Mathematics