AI is not a magic sauce
November 07, 2020
2 min read
I wrote this as part of an assignment for my Tech Law and Society module for my Master’s at the University of Cambridge. I’ll probably flesh this out later.
TL;DR We should be treating AI like any other software engineering component in our stack: with scepticism, not blind faith.
The latest machine learning algorithm seems to model your problem perfectly… until it doesn’t. What then? Perhaps for polling prediction the outcome doesn’t matter, but what if the stakes are higher? What if AI is used in healthcare, or for self-driving cars, where an error could be fatal? Deploying AI poses both socio-technical and security risks which we must handle.
AI used in academia is detached from practical engineering in one key aspect - the data. AI in the real world does not operate on a large, carefully-curated dataset. Real-world data is messy and contains biases. In 2016, Twitter users fed Microsoft’s AI chatbot Tay racist, misogynistic tweets and, lo and behold, Tay made racist and misogynistic remarks. Facial recognition datasets have under-represented minorities such as African Americans, leading to racial bias in the resulting models. Twitter recently fell foul of this with its intelligent thumbnail selection algorithm.
Using AI as a service? The dataset the AI model was pretrained on might not match your use case. The startup Nabla used OpenAI’s GPT-3 model to power a healthcare chatbot. GPT-3’s training set was a general internet language corpus that lacked the domain-specific knowledge and ethical safeguards needed for a healthcare application. The result? GPT-3 recommended that a test patient commit suicide. It also produced responses that were syntactically correct and sounded impressive, but were factually incorrect.
We need to stop treating AI as the magic sauce, and instead treat it as just another component in our engineering stack. Just as MongoDB is not the solution to all database problems, neural networks are not the fix for all prediction problems. Despite being one of the leading AI research labs, DeepMind chose existing formulae used by medical professionals over machine learning for its Streams app, focusing instead on the underlying issues in the delivery of patient care.
We already battle-test our engineering stack with extensive testing and static analysis to find security vulnerabilities. These security best practices should carry over to our deployment of AI, where the attack surface encompasses not only the code but also the data in the system. Sandbox the model and scan external data for biases as we would scan processes for malware. Use access control mechanisms to ensure models aren’t trained on sensitive data. Incorporate domain-specific error handling around AI model predictions, just as we handle exceptions in code.
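To make “domain-specific error handling around AI model predictions” concrete, here is a minimal sketch of how a prediction could be wrapped with input validation, output sanity checks, and a fallback to a human. The model API, feature names, ranges, and thresholds (predict_with_guards, PLAUSIBLE_AGE, CONFIDENCE_FLOOR, and so on) are hypothetical, for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cancer-risk model wrapped with domain-specific checks.
# The feature ranges and thresholds below are illustrative, not taken
# from any real system.

PLAUSIBLE_AGE = range(0, 121)   # reject obviously corrupt inputs
CONFIDENCE_FLOOR = 0.7          # below this, defer to a clinician


@dataclass
class Prediction:
    risk_score: float   # model output, expected in [0, 1]
    confidence: float


def predict_with_guards(model, patient_features: dict) -> Optional[Prediction]:
    """Treat the model like any other fallible component: validate its
    inputs, sanity-check its outputs, and fail safely instead of blindly
    trusting the prediction."""
    age = patient_features.get("age")
    if age is None or age not in PLAUSIBLE_AGE:
        raise ValueError(f"Implausible age {age!r}; refusing to predict")

    pred = model.predict(patient_features)   # assumed model interface

    # An out-of-range score indicates a bug or a distribution shift,
    # not a diagnosis -- surface it as an error rather than passing it on.
    if not 0.0 <= pred.risk_score <= 1.0:
        raise RuntimeError(f"Model returned invalid score {pred.risk_score}")

    # Low-confidence predictions are escalated to a human, the same way
    # we would fall back to a safe default on a failed external call.
    if pred.confidence < CONFIDENCE_FLOOR:
        return None  # caller routes the case to a clinician

    return pred
```

The point is not the specific checks, but that the model’s output is treated as untrusted input, just like the response from any external service.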
AI is not a panacea. It won’t magically fix the pollster’s predictions. AI might detect cancer better than humans… but it might also make mistakes. We have been here before, with the software revolution; we have built safety-critical systems. The AI revolution is coming, but we need to look past the hype and treat it with the same scepticism we apply to other software.