The application of advances in Machine Learning (ML), and in particular Deep Learning (DL), to complex Radio Frequency (RF) applications provides a great opportunity to advance DoD system capabilities. However, the black-box nature of DL models requires corresponding advances in Explainable AI (XAI) to ensure that their outputs can be justified and understood by warfighters. In particular, XAI combined with automated Test and Evaluation (T&E) is needed to validate both the performance and the reliability of DoD RF systems, especially Signals Intelligence (SIGINT) and Electronic Intelligence (ELINT) collection sensors. Recent investments, including DARPA's Radio Frequency Machine Learning Systems (RFMLS) program and the Air Force ISR Modernization and Automation Development (IMAD) program, increase the utility of fielded systems. However, the Programs of Record targeted for transition require stringent validation and reliability assessment before a system is fielded. For systems with automated functions, these already lengthy verification and validation activities must be even more rigorous to ensure that the automated functions operate as intended, with no adverse consequences. Thus, new XAI methods are required for DoD-specific applications that statistically (1) quantify performance in an operationally relevant environment and (2) quantify reliability (and thus availability). However, the extensive and expensive field testing historically needed to gather all of the relevant statistics is unsustainable. To overcome these issues and provide the required statistical analysis of RF systems, the Machina Cognita Technologies (MCT) and Epsilon team proposes to develop the Statistical Characterization and Operation Readiness assessment of ELINT/SIGINT Deep learning (SCORED) system. SCORED will provide a Modular Open Systems Approach (MOSA) to the T&E of RF systems.
SCORED will combine the Test Automation Framework (TAF), an automated testing and scenario execution framework, with an AI/DL-powered RF analysis and recommendation engine. The system will statistically quantify the performance of the RF System Under Test (SUT) across a range of operational conditions and will quantify its reliability. The output of SCORED will be a clear, concise explanation of the SUT's capabilities, including both text-based summaries and visual representations of the statistical analysis. Users will then be able to drill down into these summaries to explore the underlying statistics and raw data that SCORED generated during the T&E process.