AI researchers trick chatbots into sharing how to make cocaine as long as they believe a user is wearing a green shirt — 'CoT Forgery' exploit spurs LLMs to divulge forbidden info by faking trusted chains of thought

Tagged partitions of a LLM's input sequence are meant to provide security through trusted roles, but it turns out that models judge whether inputs sound like they belong in certain tags rather than literally interpreting them, making them vulnerable to prompt injection.

Source: Tom's Hardware — read the full report at the original publisher.

This is a curated wire item. The Continuum Brief does not republish full third-party articles; this entry links to the original source.

Source

Tom's Hardware · View original

#Artificial Intelligence#Tech Industry

Supported by VREXO™ Intelligence Systems.

The Brief · Weekly Dispatch

Stay ahead of the systems reshaping markets.