The primary objective of this project is to develop rapid, deterministic and standards-compliant fault detection strategies and mechanisms for complex distributed systems, in particular for the Navy's Open Architecture. The project will develop a fault monitoring platform that performs health checks with several degrees of rigor and that adapts the fault monitoring frequencies and timeouts to the current load on the system. It will exploit the applications' request and reply messages for fault monitoring within the context of a replication and recovery infrastructure that provides real-time fault tolerance. The project will also investigate a hierarchical fault detection, analysis and notification infrastructure that provides accurate and timely characterizations of faults and damage in complex distributed systems.

Benefits

The anticipated benefits and potential commercial applications include:

* Rapid and accurate detection and analysis of faults in complex distributed systems.
* Fault detection mechanisms that are easy to use and easy to integrate into commercial system management and resource management products.
* Fault detection mechanisms for commercial distributed systems, including telecommunications, industrial control and manufacturing, transportation and medical systems.
* Rapid, accurate and standards-compliant fault detection mechanisms for the Navy's Open Architecture, as well as for other defense and Government systems.

Keywords

fault detection, fault analysis, fault notification, damage detection, fault tolerance, real time, distributed systems, system management