Abstract: "A research team led by SRI International has completed a 1.5-year period of work in adaptive, distributed, fault-resistant systems. This research has been motivated by the increasingly complex and dynamic nature of working environments for modern systems, especially distributed, real-time systems. Operating conditions in such environments vary greatly in the types and distributions of faults and input data, in user requirements in changing service situations, and in possible losses of computing resources. The traditional approach -- applying given resources in a fixed system configuration to meet worst-case operating conditions -- is becoming less tenable. Adaptive systems can track changes in the environment by modifying the way computing resources are organized and utilized. The goal of the research was to establish a foundation for a general methodology of design for adaptive, distributed, real-time, fault-resistant systems. This report presents a general theory and architecture, a taxonomy of design approaches, and examples of concrete architecture and design techniques. A core approach is the use of a control-theory model for adaptive computer systems; key issues derived from the model are the need for accurate state evaluation and prediction and incremental control to assure adaptation stability. The study investigated general frameworks for specifying trade-offs among service attributes such as timeliness, accuracy and precision and examined how such trade-offs can be managed during adaptation. Several new issues and opportunities in fault-tolerant computing were uncovered, including the use of formal models for specifying and predicting adaptive fault-resistant systems, reflective architecture for recursive control of fault tolerance implementations, and multihypothesis fault diagnosis to reduce the ambiguity and diagnosis latency in real-time, distributed systems.
Several case studies are presented, including Adaptive, Distributed Recovery Blocks (ADRBs), a scheme for exchanging processing resources for recovery speed; Adaptive Distributed-Thread Integrity (ADTI), a scheme for dynamically selecting appropriate detection and recovery protocols for managing node and link failures in the Alpha programming model; and Adaptive Fault Tolerance for Hybrid Faults (AFTHF), an efficient scheme for tolerating faults with a wide range of complexity."