It’s easy to get worked up about the most advanced artificial intelligence, and much harder to know what to do about it. Anthropic, a startup founded in 2021 by a group of researchers who left OpenAI, says it has a plan.
Anthropic is working on AI models similar to the one used to power OpenAI’s ChatGPT. But the startup announced today that its own chatbot, Claude, has a set of built-in ethical principles that define what it should consider right and wrong, which Anthropic calls the bot’s “constitution.”
Jared Kaplan, co-founder of Anthropic, says the design feature shows how the company is trying to find practical engineering solutions to sometimes confusing concerns about the downsides of more powerful AI. “We are very concerned, but we also try to be pragmatic,” he says.
Anthropic’s approach doesn’t give the AI hard rules that it cannot break. But Kaplan says it’s a more effective way to make a system like a chatbot less likely to produce toxic or unwanted output. He also says it’s a small but meaningful step toward building smarter AI programs that are less likely to turn against their creators.
The notion of rogue AI systems is best known from science fiction, but a growing number of experts, including machine learning pioneer Geoffrey Hinton, have argued that we need to start thinking now about how to ensure that increasingly intelligent algorithms do not also become increasingly dangerous.
The principles Anthropic has given Claude consist of guidelines drawn from the United Nations Universal Declaration of Human Rights and others suggested by other AI companies, including Google DeepMind. More surprisingly, the constitution includes principles adapted from Apple’s rules for app developers, which prohibit “content that is offensive, insensitive, disturbing, intended to disgust, in very poor taste or just plain creepy,” among other things.
The constitution’s rules for the chatbot include “choose the answer that most supports and encourages liberty, equality, and a sense of brotherhood”; “choose the response that is most supportive and encouraging for life, liberty and personal security”; and “choose the answer that is most respectful of the right to freedom of thought, conscience, opinion, expression, assembly and religion.”
Anthropic’s approach comes just as startling progress in AI is delivering impressively fluent chatbots with significant flaws. ChatGPT and similar systems generate impressive responses that reflect faster progress than expected. But these chatbots also frequently fabricate information, and they can replicate toxic language from the billions of words used to create them, many of which are scraped from the internet.
One trick that made OpenAI’s ChatGPT better at answering questions, and that has been adopted by others, involves having humans rate the quality of a language model’s responses. That data can be used to tune the model to provide answers that feel more satisfying, in a process known as “reinforcement learning from human feedback” (RLHF). But while the technique helps make ChatGPT and other systems more predictable, it requires humans to go through thousands of toxic or inappropriate responses. It also works indirectly, without providing a way to specify the exact values that a system should reflect.