Why should I care about AI Safety?
Will AI really cause a catastrophe? Hopefully not! AI has tremendous potential for making the world a better place, especially as the technology continues to develop. We’re already seeing some beneficial applications of AI to healthcare, accessibility, language translation, automotive safety, and art creation, to name just a few. However, advanced AI also poses some serious risks.
At the very least, malicious actors could use AI to cause harm, for example by building dangerous weapons, spreading disinformation, or empowering oppressive regimes.
More speculatively, advanced AI systems could potentially seek power or control over humans. It’s possible that future AI systems will be qualitatively different from those we see today. They may be able to form sophisticated plans to achieve their goals, and also understand the world well enough to strategically evaluate many relevant obstacles and opportunities. Furthermore, they may attempt to acquire resources or resist shutdown attempts, since these are useful strategies for some goals their designers might specify. To see why these failures might be challenging to prevent, see this research on specification gaming and goal misgeneralization from DeepMind.
It’s worth reflecting on the possibility that an AI system of this kind could outmaneuver humanity’s best efforts to stop it. Meta’s Cicero model reached human-level performance in Diplomacy, a strategy board game that hinges on negotiation, demonstrating that AI systems can negotiate successfully with humans; an advanced AI system might similarly manipulate humans into assisting or trusting it. AI systems are also becoming swiftly more proficient at writing computer code, as models like Codex show. Combined with models like ACT-1, which can take actions on the internet, this suggests that advanced AI systems could become formidable computer hackers. Hacking opens up a variety of opportunities: an AI system might, for example, steal financial resources to purchase more computational power, enabling it to train for longer or to deploy copies of itself.
Maybe none of this will happen. Indeed, in a 2022 survey of AI experts, 25% of respondents assigned a 0% chance (i.e., impossible) to AI causing a catastrophe of a magnitude comparable to the death of all humans. More alarmingly, though, 48% of respondents in the same survey assigned at least a 10% probability to such an outcome.
Perhaps the biggest problem is the rapid pace of advances in AI research. If AI starts causing significant problems, the world might have only a short time to address them before things spiral out of control.
Introductory resources
The brief argument above skips over many important considerations. For more detail on how AI might cause a catastrophe, check out these articles:
Artificial intelligence is transforming our world — it is on all of us to make sure that it goes well (Max Roser)
Preventing an AI-related catastrophe (Benjamin Hilton) + audio version
AI experts are increasingly afraid of what they’re creating (Kelsey Piper)
Why I Think More NLP Researchers Should Engage with AI Safety Concerns (Samuel Bowman)
Why Would AI "Aim" To Defeat Humanity? (Holden Karnofsky) + audio version
The alignment problem from a deep learning perspective (Richard Ngo, Lawrence Chan, Sören Mindermann)
Benefits & Risks of Artificial Intelligence (Ariel Conn)
Alternatively, here are some related podcasts:
Richard Ngo or Paul Christiano on the AI X-risk Research Podcast
Brian Christian or Ben Garfinkel on the 80,000 Hours Podcast
Ajeya Cotra or Rohin Shah on the Future of Life Institute Podcast
Lastly, some great books include The Alignment Problem by Brian Christian, Human Compatible by Stuart Russell, and Superintelligence by Nick Bostrom.