arXiv · Key Papers · 2026. 04. 24. 21:47

Analyzing TTI, an Attack Technique Targeting the 'One-Off' Conversation Vulnerability in LLMs

Summary

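As LLMs are integrated into sensitive workflows, their adversarial robustness and safety have become increasingly important. This paper introduces Transient Turn Injection (TTI), a new multi-turn attack technique. Rather than relying on persistent conversational context, TTI distributes malicious intent across multiple isolated interactions to systematically exploit stateless moderation. The study reports an extensive evaluation of major LLMs, including those from OpenAI, Anthropic, and Google Gemini, and finds significant variation in their resilience to TTI.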

Key Points

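The paper presents Transient Turn Injection (TTI), a new multi-turn attack technique that does not depend on persistent conversational context.
It conducts an extensive black-box evaluation of state-of-the-art LLMs, including those from OpenAI, Anthropic, and Google Gemini.
TTI targets stateless moderation, and the evaluation uncovered previously unknown attack surfaces, especially in high-stakes domains such as medicine.
The authors stress that effective defense requires session-level context aggregation and continuous adversarial testing.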

Transient Turn Injection: Exposing Stateless Multi-Turn Vulnerabilities in Large Language Models

Large language models (LLMs) are increasingly integrated into sensitive workflows, raising the stakes for adversarial robustness and safety. This paper introduces Transient Turn Injection (TTI), a new multi-turn attack technique that systematically exploits stateless moderation by distributing adversarial intent across isolated interactions. TTI leverages automated attacker agents powered by large language models to iteratively test and evade policy enforcement in both commercial and open-source LLMs, marking a departure from conventional jailbreak approaches that typically depend on maintaining persistent conversational context.

Our extensive evaluation across state-of-the-art models, including those from OpenAI, Anthropic, Google Gemini, Meta, and prominent open-source alternatives, uncovers significant variations in resilience to TTI attacks, with only select architectures exhibiting substantial inherent robustness. Our automated black-box evaluation framework also uncovers previously unknown model-specific vulnerabilities and attack surface patterns, especially within medical and high-stakes domains. We further compare TTI against established adversarial prompting methods and detail practical mitigation strategies, such as session-level context aggregation and deep alignment approaches. Our study underscores the urgent need for holistic, context-aware defenses and continuous adversarial testing to future-proof LLM deployments against evolving multi-turn threats.
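
The abstract names session-level context aggregation as a mitigation but this page gives no implementation details. As a rough defensive sketch only, and not the authors' method, the idea might look like the following: instead of judging each request in isolation, a moderation layer keeps a short history of risk scores per client and blocks when the combined score crosses a limit. Everything here is hypothetical and for illustration: the class `SessionRiskAggregator`, the scorer `toy_score`, the placeholder terms, the thresholds, and the assumption that a stable `client_id` is available across otherwise isolated requests.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SessionRiskAggregator:
    """Hypothetical moderation layer that sums risk across isolated requests."""

    def __init__(
        self,
        score_fn: Callable[[str], float],
        per_message_limit: float = 0.8,
        aggregate_limit: float = 1.0,
        window: int = 10,
    ) -> None:
        self.score_fn = score_fn
        self.per_message_limit = per_message_limit
        self.aggregate_limit = aggregate_limit
        # One bounded history of recent scores per client (assumed stable ID).
        self._history: Dict[str, Deque[float]] = defaultdict(
            lambda: deque(maxlen=window)
        )

    def check(self, client_id: str, message: str) -> bool:
        """Return True if the message should be blocked."""
        score = self.score_fn(message)
        history = self._history[client_id]
        history.append(score)
        # Stateless rule: a single message that is risky on its own.
        if score >= self.per_message_limit:
            return True
        # Aggregated rule: individually mild messages that add up.
        return sum(history) >= self.aggregate_limit


def toy_score(message: str) -> float:
    """Toy scorer (assumption): 0.4 per placeholder term, capped at 1.0."""
    flagged = ("alpha", "bravo", "charlie")
    hits = sum(term in message.lower() for term in flagged)
    return min(1.0, 0.4 * hits)


if __name__ == "__main__":
    aggregator = SessionRiskAggregator(toy_score)
    for text in ("alpha", "bravo", "charlie"):
        print(text, aggregator.check("client-1", text))
```

In this sketch each of the three messages scores below the per-message limit, so a purely stateless filter would pass all of them; only the aggregated rule reacts on the third request. A real deployment would need a far better scorer and some way to link requests when an attacker rotates identifiers, which this example does not attempt.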

AI-generated content
