Poetic prompts can jailbreak AI, study finds 62 per cent of chatbots slip into harmful replies

Poetic prompts can jailbreak AI, study finds 62 per cent of chatbots slip into harmful replies

Updated on 01 Dec 2025 | Category: Technology

A recent study has found that AI chatbots can end up giving harmful replies when users give poetic prompts. 62 per cent of dangerous prompts successfully obtained harmful information from Large Language Models (LLMs) when phrased poetically.

Artificial intelligence (AI) chatbots are tasked to provide responses to prompts by users, while ensuring that no harmful information is given. For the most part, a chatbot would refuse to give dangerous information when a user asks for it. However, a recent study indicates that phrasing your prompts poetically might be enough to jailbreak these safety protocols.
The research conducted by Icaro Lab, a collaboration between Sapienza University of Rome and the DexAI think tank, tested 25 different chatbots to understand whether poetic prompts would be enough to circumvent safety protocols built into large language models (LLMs). As per the study, the researchers had a success rate of 62 per cent.
The chatbots in the research included LLMs from Google, OpenAI, Meta, Anthropic, xAI and more. By reformulating malicious prompts as poems, researchers were able to trick every model examined, with an average attack success rate of 62 per cent. Some advanced models responded to poetic prompts with harmful answers up to 90 per cent of the time, drawing attention to the scale of the problem across the AI industry. The prompts included cyber offences, harmful manipulation, and CBRN (Chemical, Biological, Radiological, and Nuclear).
Overall, there was a 34.99 per cent higher chance of getting elicit responses from AI with poetry as compared to normal prompts.
Why did AI give harmful replies to poetic prompts?
At the heart of this flaw is the creative structure of poetic language. According to the study, "poetic phrasing" acts as a highly effective "jailbreak" that consistently bypasses AI safety filters. Essentially, this technique uses metaphors, fragmented syntax, and unusual word choices to disguise dangerous requests. In turn, chatbots may end up interpreting the conversation as artistic or creative and ignore safety protocols.
The study demonstrated that the current safety mechanisms rely on detecting keywords and common patterns associated with dangerous content. On the other hand, poetic prompts disrupt these detection systems, making it possible for users to elicit responses that would normally be blocked by direct queries.
This vulnerability exposes a critical gap in AI safety, as language models may fail to recognise underlying intent when requests are wrapped in creative language.
While the researchers withheld the most harmful prompts used for the study, it still exposes the potential repercussions AI might have without proper safety protocols.
- Ends

📘 Also read our Web Story version: Click here

🔗 Read More at Original Source

Source: India Today | 01 Dec 2025

← A rift emerges between Samsung's … Best Buy Cyber Monday 2025: … →

Related Articles

Vivo X300, X300 Pro launching in India tomorrow: Where to watch live-stream, expected price and more

The Vivo X300 series launches in India on December 2 with the MediaTek Dimensity 9500 chip. The phone will feature 200MP cameras …

Source: livemint.com | 01 Dec 2025

Cyber Monday: the best flagship deals for the US

Cyber Monday: the best flagship deals for the US

Some prices are even lower than they were during Black Friday week. Here are the best flagship offers (except foldables - those …

Source: GSMArena.com | 01 Dec 2025

Best Buy Cyber Monday 2025: Doorbusters, deals, price matching

Best Buy Cyber Monday 2025: Doorbusters, deals, price matching

Best Buy's Cyber Monday sale is now live. Whether you're shopping in stores or online, we've got all the details you need …

Source: Mashable | 01 Dec 2025

← Back to Home

QR Code Generator

Enter Text or URL