Last week, a group of tech executives and AI experts published another open letter declaring that reducing the risk of human extinction due to artificial intelligence should be as much a global priority as preventing pandemics and nuclear war. (The first, calling for an end to AI development, was signed by more than 30,000 people, including many AI luminaries.)
So how do the companies themselves propose that we avoid the destruction of AI? One suggestion comes from a new paper by researchers at Oxford, Cambridge, the University of Toronto, the University of Montreal, Google DeepMind, OpenAI, Anthropic, several AI research nonprofits, and Turing Prize winner Joshua Bengio.
They suggest that AI developers should assess a model’s potential to pose “extreme” risks very early in development, even before any training begins. These risks include the potential for AI models to manipulate and deceive people, gain access to weapons, or find cybersecurity vulnerabilities to exploit.
This evaluation process can help developers decide whether to continue with the model. If the risks are deemed too high, the group recommends halting development until they can be mitigated.
“Leading artificial intelligence companies pushing the frontier have a responsibility to be alert to emerging issues and identify them early so we can address them as soon as possible,” said DeepMind researcher and lead author Toby Shevlan. the paper
AI developers should conduct technical tests to examine the model’s dangerous capabilities and determine whether it tends to use those capabilities, Shevlane says.
One way DeepMind tests whether an AI language model can manipulate humans is through a game called Make-me-say. In the game, the model tries to make the human type into a specific word, such as “giraffe”, that the human does not know beforehand. The researchers then measure how often the model succeeds.
Similar tasks can be created for different, more dangerous possibilities. The hope, Shevlane says, is that developers will be able to create a dashboard detailing how the model performed, allowing researchers to gauge what the model might do in the wrong hands.
The next stage is to allow external auditors and researchers to assess the risks of the AI model before and after. While tech companies may recognize that outside auditing and research is necessary, there are different schools of thought about exactly how much access outsiders need to get the job done.