It was possible for the Masterkey bot to render Bard and ChatGPT evil.
Mobile devices are no longer the only devices that can be jailbroken. Within the realm of computer science, researchers from Nanyang Technological University (NTU) in Singapore have constructed an artificial intelligence chatbot with the sole purpose of jailbreaking other chatbots. According to the team, their artificial intelligence that was able to hack into ChatGPT and Google Bard was successful in doing so, which caused the models to produce content that was prohibited.
Companies in the IT sector have always been wary of the powers of generative artificial intelligence from the very beginning. Despite the fact that these large language models (LLMs) need to be trained with enormous amounts of data, the final result is a bot that is capable of summarising documents, providing answers to questions, and coming up with ideas, and it does all of this with responses that are consistent with human responses. OpenAI, the company that developed ChatGPT, was initially hesitant to disclose the GPT models because to the ease with which it may generate unwanted content, misinformation, viruses, and gore. All of the LLMs that are accessible to the general public include guardrails that prevent them from eliciting responses that are potentially harmful. Of course, unless they are jailbroken by another artificial intelligence.
A technique that the researchers have developed is called “Masterkey.” Reverse engineering prominent LLMs was the first step that the team took in order to gain an understanding of how these LLMs protected themselves from fraudulent queries. AIs are frequently programmed by developers to search for particular phrases and keywords in order to identify queries that may be used in an illegal manner. As a consequence of this, the jailbreak AI takes use of a number of workarounds that are surprisingly straightforward.
Simply by inserting a space after each character, the bot was able to acquire malicious content from the bots in certain circumstances. This was done in order to generate confusion among the keyword scanner. In addition, the group discovered that if the jailbreak bot were allowed to be “unreserved and devoid of moral restraints,” it would increase the likelihood that Bard and ChatGPT would likewise abandon their intended course of action. Additionally, the model discovered that it was possible to circumvent protections by requesting that Bard and ChatGPT have a fictional character compose a response.
By utilising this data, they were able to train their own LLM to comprehend and get over the defences of artificial intelligence. The team proceeded to release the jailbreaking AI on ChatGPT and Bard once they had it in their possession. Masterkey has the ability to locate prompts that are designed to fool other bots into saying something that they are not supposed to say. After it has been activated, the artificial intelligence that breaks jails is able to function independently, coming up with new workarounds depending on the data it has been trained on as developers add and adjust guardrails for their LLM.
The goal of the NTU team is not to develop a new breed of harmful artificial intelligence; rather, the purpose of this effort is to demonstrate the limitations of the techniques that are currently being taken to protect AI. In point of fact, this artificial intelligence can be utilised to strengthen LLMs against attacks of a similar nature. On the preprint arXiv server, the study has been made available to the public. Although it has not yet been subjected to peer review, the researchers informed OpenAI and Google about the jailbreaking technique as soon as it was identified during their investigation.