Probing errors in explainable AI: Adversarial attacks and inferential defenses

David Watson, KCL

Machine learning algorithms are increasingly deployed in high-stakes domains such as national defense and healthcare, even though many of the most widely used models operate as virtual black boxes. The emerging field of explainable artificial intelligence (XAI) seeks to develop tools that illuminate the inner workings of these models, yet critics contend that existing methods are conceptually underdeveloped, potentially misleading, and open to manipulation. In this talk, I adopt an error-statistical perspective, arguing that much current practice in XAI is methodologically suspect. Nonetheless, I defend the field against deflationary critics who claim that all post-hoc explanations are inherently futile. Drawing on Deborah Mayo’s theory of severe testing, I outline a typology of errors in XAI and propose constructive remedies grounded in classical statistical reasoning. Establishing a rigorous methodological foundation for XAI, I conclude, is both possible and necessary to make machine learning more intelligible and trustworthy.