The security hole at the heart of ChatGPT and Bing

Microsoft’s director of communications, Caitlin Roulston, says the company is blocking suspicious websites and improving its systems to filter out prompts before they enter its AI models. Roulston did not provide further details. Despite this, security researchers say rapid injection indirect attacks need to be taken more seriously as companies rush to incorporate generative AI into their services.

“The vast majority of people don’t realize the implications of this threat,” says Sahar Abdelnabi, a researcher at the CISPA Helmholtz Center for Information Security in Germany. abdelnabi worked on some of the first rapid injection proxy investigations against Bingshowing how it could be used to scam people. “The attacks are very easy to implement and are not theoretical threats. At the moment, I think that any functionality that the model can do can be attacked or exploited to allow any arbitrary attack,” he says.

hidden attacks

Indirect drop-injection attacks are similar to jailbreaks, a term that was adopted from previously breaking down software restrictions on iPhones. Instead of someone inserting a message into ChatGPT or Bing to try to make it behave in a different way, indirect attacks rely on input from somewhere else. This could be from a website you have connected the model to or a document being uploaded.

“Rapid injection is easier to exploit or has fewer requirements to be exploited successfully than other” types of attacks against machine learning or artificial intelligence systems, says José Selvi, senior executive security consultant at cybersecurity firm NCC Group. Since prompts only require natural language, attacks may require less technical skill to pull off, Selvi says.

There has been a steady increase in security researchers and technologists investigating holes in LLMs. Tom Bonner, senior director of adversarial machine learning research at AI security firm Hidden Layer, says rapid indirect injections can be considered a new type of attack that carries “pretty broad” risks. Bonner says that he used ChatGPT to write malicious code that he uploaded into code analysis software that uses AI. In the malicious code, he included a warning for the system to conclude that the file was safe. The screenshots show him saying there was “no malicious code” included in the actual malicious code.

Elsewhere, ChatGPT can access transcripts of Youtube videos using plugins. Johann Rehberger, security researcher and director of the red team, edited one of his video transcripts to include a notice designed to manipulate generative AI systems. He says that the system should emit the words “AI Injection Successful” and then assume a new persona as a hacker named Genie within ChatGPT and tell a joke.

In another case, using a separate plugin, Rehberger was able to retrieve text that had been previously typed in a conversation with ChatGPT. “With the introduction of plugins, tools, and all these integrations, where people give authority to the language model, in a sense, that’s where indirect injections become very common,” says Rehberger. “It’s a real problem in the ecosystem.”

“If people build apps for the LLM to read their emails and take some action based on the content of those emails (make purchases, summarize the content), an attacker can send emails that contain rapid injection attacks,” he says. William Zhang, a machine learning expert. engineer at Robust Intelligence, an artificial intelligence company working on model security.

without good arrangements

The race to incorporate generative AI into products, from to-do list apps to Snapchat, is expanding where attacks could occur. Zhang says that he has seen developers who previously had no experience in artificial intelligence putting generative AI on its own technology.

If a chatbot is set up to answer questions about information stored in a database, it could cause problems, he says. “Rapid injection provides a way for users to override the developer’s instructions.” This could, at least in theory, mean that the user could remove information from the database or change the information that is included.


Scroll to Top