How OpenAI is trying to make ChatGPT safer and less biased

It’s not just about pissing off journalists (some of whom really should know better than to anthropomorphize and promote a dumb chatbot’s ability to have feelings). “woke” bias.

All this anger is finally taking its toll. Bing’s trippy content is generated using an AI language technology called ChatGPT, developed by startup OpenAI, and last Friday OpenAI published a blog post aimed at clarifying how its chatbots should behave : It also published its guidelines on how ChatGPT should respond when prompted about the US’s “culture wars”. The rules include, for example, not being associated with political parties or judging one group as good or bad.

I spoke with Sandhini Agarwal and Lama Ahmad, two AI policy researchers at OpenAI, about how the company is making ChatGPT more secure and less nuts. The company declined to comment on its relationship with Microsoft, but they still had some interesting insights. This is what they said.

How to get better answers? One of the biggest open questions in AI language model research is how to stop models from “hallucinating,” which is a polite term for making things up. ChatGPT has been used by millions of people for months, but we haven’t seen the kind of fakes and hallucinations that Bing has caused.

That’s because OpenAI used a technique in ChatGPT called reinforcement learning from human feedback, which improves the model’s responses based on user feedback. The technique works by asking people to choose between different results before ranking them according to various criteria such as factuality and verisimilitude. Few experts think Microsoft may have skipped or rushed this stage to launch Bing, though the company has yet to confirm or deny that claim.

But that method isn’t perfect, according to Agarwal. People could be offered options that were all false, then choose the option that was the least likely, he says. In order to make ChatGPT more reliable, the company focused on cleaning up its database and removing instances where the model favored fakes.

Jailbreaking ChatGPT. Since the release of ChatGPT, people have been trying to “jailbreak” it, which means finding workarounds to get a model to break its own rules and create racist or conspiratorial material. This work has not gone unnoticed at OpenAI HQ. Agarwal says OpenAI went through its entire database and picked out the clues that led to unwanted content to refine the model and stop these generations from repeating.

OpenAI wants to hear. The company said it will start gathering more feedback from the public to shape its models. OpenAI researches using surveys or creates citizen gatherings to discuss which content should be banned outright, Lama Ahmad said. “In the context of art, for example, nudity might not be considered vulgar, but how do you think about it in the context of ChatGPT in the classroom,” she says.

Source link