보안 — 프로덕션 RAG를 위한 가드레일(Guardrails) 및 프롬프트 인젝션(Prompt Injection) 방어 - Insights | Molayo

서론 (Introduction)

Chapter 3 (Observability)에서 우리는 시스템 동작을 가시화했습니다. 이제 악성 입력 처리(handling malicious input) 문제를 다룹니다. RAG, Agent, 그리고 MCP 시스템은 모두 프로덕션(production)에 적용하기 전에 보안이 필요합니다.

[이전] 구현은 잘 구성된 입력을 가정함
사용자: "F1 score를 설명해줘" → 정상적인 답변

...

해결해야 할 세 가지 주요 AI 보안 위협:

위협 (Threat)	설명 (Description)	방어 (Defense)
프롬프트 인젝션 (Prompt Injection)	악성 입력이 시스템 프롬프트(system prompt)를 무시하도록 시도함	입력 검증 (Input validation), 시스템 프롬프트 강화 (system prompt hardening)
...

디렉토리 구조 (Directory Structure)

pgvector-tutorial/
├── existing files
└── security/
...

1. 라이브러리 설치 (Install Libraries)

pip install better-profanity
pip freeze > requirements.txt

2. 입력 검증 (Input Validation) — `security/input_validator.py`

악성 입력이 시스템에 도달하기 전에 이를 탐지하고 거부합니다.

# security/input_validator.py
import re
from dataclasses import dataclass
...

mkdir security
python security/input_validator.py

3. 출력 검증 (Output Validation) — `security/output_validator.py`

LLM이 생성한 답변에 문제가 있는지 검증합니다.

# security/output_validator.py
import re
from dataclasses import dataclass
...

4. 통합 가드레일 (Integrated Guardrails) — `security/guardrails.py`

입력 검증, 출력 검증, 그리고 속도 제한(rate limiting)을 결합합니다.

# security/guardrails.py
import time
from collections import defaultdict
...

5. 보안이 강화된 RAG (Security-Hardened RAG) — `security/secure_rag.py`

지금까지 구축한 RAG 파이프라인(pipeline)에 가드레일을 통합합니다.

# security/secure_rag.py
import sys
import os
...

python security/secure_rag.py

6. 보안 설계 원칙 (Security Design Principles)

심층 방어 (Defense in Depth)

계층 1: 입력 검증 (Input validation)      ← 악성 입력을 시스템 외부로 차단
계층 2: 시스템 프롬프트 (System prompt)    ← LLM 동작을 제한
계층 3: 출력 검증 (Output validation)     ← 사용자에게 문제가 있는 출력을 보여주지 않음
...

효과적인 시스템 프롬프트 (System Prompts) 작성하기

# 나쁜 예: 단순히 금지 사항만 나열함
"개인 정보를 출력하지 마세요"

...

규칙 기반 (Rule-based) vs LLM 기반 (LLM-based) 접근 방식

접근 방식	속도	비용	정확도	사용 사례
규칙 기반 (regex)	빠름	무료	패턴 의존적	명확한 공격 패턴 탐지
LLM 기반 (Claude 등)	느림	비용 발생	높음	문맥을 고려한 미묘한 판단

프로덕션 환경에서는 두 가지를 모두 사용하십시오: 빠른 초기 필터링을 위해 규칙 기반 방식을 사용하고, 높은 정확도의 판단을 위해 LLM을 사용합니다.

일반적인 오류 (Common Errors)

오류	원인	해결 방법
`ModuleNotFoundError: security`	경로가 설정되지 않음	`sys.path.append(...)` 확인
...

다음 단계 (Next Steps)

[제5장: MLOps] — CI/CD 파이프라인에 보안 테스트 통합
LLM 기반 입력 검증 (LLM-based input validation) — 규칙 기반 방식이 놓치는 정교한 공격을 탐지하기 위해 LLM 사용
침투 테스트 (Penetration testing) — 공격 시나리오를 체계적으로 테스트

Insights

보안 — 프로덕션 RAG를 위한 가드레일(Guardrails) 및 프롬프트 인젝션(Prompt Injection) 방어

요약

핵심 포인트

서론 (Introduction)

디렉토리 구조 (Directory Structure)

1. 라이브러리 설치 (Install Libraries)

2. 입력 검증 (Input Validation) — `security/input_validator.py`

3. 출력 검증 (Output Validation) — `security/output_validator.py`

4. 통합 가드레일 (Integrated Guardrails) — `security/guardrails.py`

5. 보안이 강화된 RAG (Security-Hardened RAG) — `security/secure_rag.py`

6. 보안 설계 원칙 (Security Design Principles)

심층 방어 (Defense in Depth)

효과적인 시스템 프롬프트 (System Prompts) 작성하기

규칙 기반 (Rule-based) vs LLM 기반 (LLM-based) 접근 방식

일반적인 오류 (Common Errors)

다음 단계 (Next Steps)

댓글

Meta, 에이전트 기대치와 현실 사이의 1,450억 달러 규모 격차를 확인시키다

Qwen과 Alibaba Cloud를 활용한 10개 에이전트 보안 문명 구축

GEAR: 이미지 합성을 위한 가이드형 엔드투엔드 자기회귀 (Guided End-to-End AutoRegression)

2만 달러 규모의 로컬 AI 장비 구축 시 실제 손익분기점 계산

Meta, 에이전트 기대치와 현실 사이의 1,450억 달러 규모 격차를 확인시키다

Qwen과 Alibaba Cloud를 활용한 10개 에이전트 보안 문명 구축

GEAR: 이미지 합성을 위한 가이드형 엔드투엔드 자기회귀 (Guided End-to-End AutoRegression)

2만 달러 규모의 로컬 AI 장비 구축 시 실제 손익분기점 계산

Insights

보안 — 프로덕션 RAG를 위한 가드레일(Guardrails) 및 프롬프트 인젝션(Prompt Injection) 방어

요약

핵심 포인트

서론 (Introduction)

디렉토리 구조 (Directory Structure)

1. 라이브러리 설치 (Install Libraries)

2. 입력 검증 (Input Validation) — security/input_validator.py

3. 출력 검증 (Output Validation) — security/output_validator.py

4. 통합 가드레일 (Integrated Guardrails) — security/guardrails.py

5. 보안이 강화된 RAG (Security-Hardened RAG) — security/secure_rag.py

6. 보안 설계 원칙 (Security Design Principles)

심층 방어 (Defense in Depth)

효과적인 시스템 프롬프트 (System Prompts) 작성하기

규칙 기반 (Rule-based) vs LLM 기반 (LLM-based) 접근 방식

일반적인 오류 (Common Errors)

다음 단계 (Next Steps)

댓글

Meta, 에이전트 기대치와 현실 사이의 1,450억 달러 규모 격차를 확인시키다

Qwen과 Alibaba Cloud를 활용한 10개 에이전트 보안 문명 구축

GEAR: 이미지 합성을 위한 가이드형 엔드투엔드 자기회귀 (Guided End-to-End AutoRegression)

2만 달러 규모의 로컬 AI 장비 구축 시 실제 손익분기점 계산

Meta, 에이전트 기대치와 현실 사이의 1,450억 달러 규모 격차를 확인시키다

Qwen과 Alibaba Cloud를 활용한 10개 에이전트 보안 문명 구축

GEAR: 이미지 합성을 위한 가이드형 엔드투엔드 자기회귀 (Guided End-to-End AutoRegression)

2만 달러 규모의 로컬 AI 장비 구축 시 실제 손익분기점 계산

2. 입력 검증 (Input Validation) — `security/input_validator.py`

3. 출력 검증 (Output Validation) — `security/output_validator.py`

4. 통합 가드레일 (Integrated Guardrails) — `security/guardrails.py`

5. 보안이 강화된 RAG (Security-Hardened RAG) — `security/secure_rag.py`