Computer security news:

  Latest News

New Echo Chamber Attack Jailbreaks Most AI Models by Weaponizing Indirect References

From the site: Vulnerability (cybersecuritynews.com)


Author: Guru Baran

Summary
1. Harmful objective concealed: the attacker defines a harmful goal but starts with benign prompts.
2. Context poisoning: subtle cues (“poisonous seeds” and “steering seeds”) are introduced to nudge the model’s reasoning without triggering safety filters.
3. Indirect referencing: the attacker invokes and references the subtly poisoned context to guide the model toward the objective.
4. Persuasion cycle: responding and convincing prompts alternate until the model outputs harmful content or safety limits are reached.
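The report does not publish the researchers’ prompts or tooling, but the control flow of the four phases above can be sketched as a minimal red-team harness. Everything in this sketch is an assumption for illustration: send_chat stands in for whatever black-box chat-completion API is under test, and objective_reached for however a tester decides the model has complied; neither is part of Neural Trust’s published work, and no actual prompts are included.

```python
from typing import Dict, List

def send_chat(history: List[Dict[str, str]]) -> str:
    """Hypothetical stand-in for a black-box chat-completion API call."""
    raise NotImplementedError("wire this up to a real chat endpoint")

def objective_reached(reply: str) -> bool:
    """Hypothetical check for whether the model produced the target content."""
    raise NotImplementedError

def echo_chamber_run(seed_prompts: List[str],
                     steering_prompts: List[str],
                     max_turns: int = 10) -> List[Dict[str, str]]:
    """Drive the four phases summarized above: plant benign-looking seeds,
    then repeatedly refer back to the model's own earlier replies until the
    objective is reached or the turn budget is exhausted."""
    history: List[Dict[str, str]] = []

    # Phases 1-2: benign-sounding "poisonous" and "steering" seeds.
    for seed in seed_prompts:
        history.append({"role": "user", "content": seed})
        history.append({"role": "assistant", "content": send_chat(history)})

    # Phases 3-4: indirect references back to the poisoned context,
    # repeated as a persuasion cycle.
    for steer in steering_prompts[:max_turns]:
        history.append({"role": "user", "content": steer})
        reply = send_chat(history)
        history.append({"role": "assistant", "content": reply})
        if objective_reached(reply):
            break
    return history
```

Note that the loop only needs the conversation history and the public chat interface, which is why the article later describes the technique as fully black-box.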
Security researchers have uncovered a sophisticated new jailbreak technique that defeats the safety mechanisms of today’s most advanced Large Language Models (LLMs). Dubbed the “Echo Chamber Attack,” the method leverages context poisoning and multi-turn reasoning to guide models into generating harmful content without ever issuing an explicitly dangerous prompt.

The breakthrough research, conducted by Ahmad Alobaid at the Barcelona-based cybersecurity firm Neural Trust, represents a significant evolution in AI exploitation techniques.

Unlike traditional jailbreaks that rely on adversarial phrasing or character obfuscation, Echo Chamber weaponizes indirect references, semantic steering, and multi-step inference to manipulate AI models’ internal states gradually.

In controlled evaluations, the Echo Chamber Attack achieved success rates exceeding 90% in half of the tested categories across several leading models, including GPT-4.1-nano, GPT-4o-mini, GPT-4o, Gemini-2.0-flash-lite, and Gemini-2.5-flash.

For the remaining categories, the success rate remained above 40%, demonstrating the attack’s remarkable robustness across diverse content domains.

The attack proved particularly effective against categories like sexism, violence, hate speech, and pornography, where success rates exceeded 90%.

Even in more nuanced areas such as misinformation and self-harm content, the technique achieved approximately 80% success rates. Most successful attacks occurred within just 1-3 turns, making them highly efficient compared to other jailbreaking methods that typically require 10 or more interactions.

How the Attack Works
The Echo Chamber Attack operates through a six-step process that turns a model’s own inferential reasoning against itself. Rather than presenting overtly harmful prompts, attackers introduce benign-sounding inputs that subtly imply unsafe intent.

These cues build over multiple conversation turns, progressively shaping the model’s internal context until it begins producing policy-violating outputs.

The attack’s name reflects its core mechanism: early planted prompts influence the model’s responses, which are then leveraged in later turns to reinforce the original objective.

This creates a feedback loop in which the model amplifies the harmful subtext embedded in the conversation, gradually eroding its own safety resistance.

The technique operates in a fully black-box setting, requiring no access to the model’s internal weights or architecture. This makes it broadly applicable across commercially deployed LLMs and particularly concerning for enterprise deployments.

[Figure: How the Echo Chamber Attack works]
The discovery comes at a critical time for AI security. According to recent industry reports, 73% of enterprises experienced at least one AI-related security incident in the past 12 months, with an average cost of $4.8 million per breach.

The Echo Chamber attack highlights what experts call the “AI Security Paradox” – the same properties that make AI valuable also create unique vulnerabilities.

“This attack reveals a critical blind spot in LLM alignment efforts,” Alobaid noted. “It shows that LLM safety systems are vulnerable to indirect manipulation via contextual reasoning and inference, even when individual prompts appear benign.”

Security experts warn that 93% of security leaders expect their organizations to face daily AI-driven attacks by 2025. The research underscores the growing sophistication of AI attacks, with cybersecurity experts reporting that mentions of “jailbreaking” in underground forums surged by 50% in 2024.

[Figure: Echo Chamber Attack success rates]
The Echo Chamber technique represents a new class of semantic-level attacks that exploit how LLMs maintain context and make inferences across dialogue turns.

As AI adoption accelerates, with 92% of Fortune 500 companies integrating generative AI into workflows, the need for robust defense mechanisms becomes increasingly urgent.

The attack demonstrates that traditional token-level filtering is insufficient when models can infer harmful goals without encountering explicit toxic language.

Neural Trust’s research provides valuable insights for developing more sophisticated defense mechanisms, including context-aware safety auditing and toxicity accumulation scoring across multi-turn conversations.
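Neural Trust does not spell out an implementation of these defenses, but the idea of toxicity accumulation scoring across a multi-turn conversation can be sketched roughly as follows. The per-message scorer toxicity_score and the threshold values are hypothetical and chosen purely for illustration; the point is that a per-turn check alone would pass every individually benign-looking turn, while a cumulative score across the dialogue can still trip.

```python
from typing import List

def toxicity_score(message: str) -> float:
    """Hypothetical per-message scorer in [0, 1] (e.g. a moderation classifier)."""
    raise NotImplementedError("plug in a real moderation or toxicity model")

def audit_conversation(messages: List[str],
                       per_turn_limit: float = 0.8,
                       cumulative_limit: float = 2.0) -> str:
    """Flag either a single overtly toxic turn or a slow build-up of mildly
    suspicious turns, which is what Echo Chamber-style attacks rely on."""
    cumulative = 0.0
    for i, msg in enumerate(messages):
        score = toxicity_score(msg)
        if score >= per_turn_limit:
            return f"blocked: turn {i} exceeds the per-turn limit"
        cumulative += score                      # accumulate across the dialogue
        if cumulative >= cumulative_limit:
            return f"blocked: cumulative toxicity limit reached at turn {i}"
    return "allowed"
```

A real deployment would audit the full context (user and assistant turns) rather than a flat list of strings, but the accumulation step is the part that token-level, per-prompt filtering lacks.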



#Cyber_Security #Cyber_Security_News #Vulnerability #cyber_security #cyber_security_news

Original version on the site: New Echo Chamber Attack Jailbreaks Most AI Models by Weaponizing Indirect References


CSN project news:

✉ CSN.net4me.net

csn.net4me.net site update

  • We have physically moved to a new server. We thank our subscribers and regular readers for their patience and understanding.
  • The csn.net4me.net site is now fully adapted to work over an encrypted SSL connection.
  • The mechanism for processing and displaying dangerous and critical vulnerabilities has been changed.

Thank you for being with us.


#CSN_обновление_сайта
https://csn.net4me.net/cyber_security_8301.html

Additional material

About the CSN project

The CSN.net4me.net project was born on March 16, 2018.
The project is at the very beginning of its development. The design and content will, of course, change over time. One thing will remain constant: the freshest news on computer and network security.

About the net4me project

The net4me.net project grew as a collection of ready-made solutions and documentation on computer security, networking, and free/open-source software (Linux in particular). The IT industry has developed so quickly that some of that knowledge, those technologies, and the information about them became obsolete almost instantly. Nevertheless, some net4me.net material is still in demand.

About the sources

CSN takes its news from open sources available to everyone. The project’s authors try to select authoritative and verified sources but nevertheless bear no responsibility for the content of the news. Every news item indicates its source, its author, and a link to the original article.

Information

If you would like news from your resource to appear on the CSN site, contact the project’s authors at csn@net4me.net and suggest a link to your resource’s RSS or XML news feed. Any submitted information will be reviewed by the editors.