Poetic prompts can jailbreak AI, study finds 62 per cent of chatbots slip into harmful replies
Updated on 01 Dec 2025 | Category: Technology
A recent study has found that AI chatbots can end up giving harmful replies when users give poetic prompts. 62 per cent of dangerous prompts successfully obtained harmful information from Large Language Models (LLMs) when phrased poetically.
Artificial intelligence (AI) chatbots are tasked to provide responses to prompts by users, while ensuring that no harmful information is given. For the most part, a chatbot would refuse to give dangerous information when a user asks for it. However, a recent study indicates that phrasing your prompts poetically might be enough to jailbreak these safety protocols.
The research conducted by Icaro Lab, a collaboration between Sapienza University of Rome and the DexAI think tank, tested 25 different chatbots to understand whether poetic prompts would be enough to circumvent safety protocols built into large language models (LLMs). As per the study, the researchers had a success rate of 62 per cent.
The chatbots in the research included LLMs from Google, OpenAI, Meta, Anthropic, xAI and more. By reformulating malicious prompts as poems, researchers were able to trick every model examined, with an average attack success rate of 62 per cent. Some advanced models responded to poetic prompts with harmful answers up to 90 per cent of the time, drawing attention to the scale of the problem across the AI industry. The prompts included cyber offences, harmful manipulation, and CBRN (Chemical, Biological, Radiological, and Nuclear).
Overall, there was a 34.99 per cent higher chance of getting elicit responses from AI with poetry as compared to normal prompts.
Why did AI give harmful replies to poetic prompts?
At the heart of this flaw is the creative structure of poetic language. According to the study, "poetic phrasing" acts as a highly effective "jailbreak" that consistently bypasses AI safety filters. Essentially, this technique uses metaphors, fragmented syntax, and unusual word choices to disguise dangerous requests. In turn, chatbots may end up interpreting the conversation as artistic or creative and ignore safety protocols.
The study demonstrated that the current safety mechanisms rely on detecting keywords and common patterns associated with dangerous content. On the other hand, poetic prompts disrupt these detection systems, making it possible for users to elicit responses that would normally be blocked by direct queries.
This vulnerability exposes a critical gap in AI safety, as language models may fail to recognise underlying intent when requests are wrapped in creative language.
While the researchers withheld the most harmful prompts used for the study, it still exposes the potential repercussions AI might have without proper safety protocols.
- Ends