The Complete Claude Code Harness Engineering Guide (5 Layers, 8 Deep-Dives)

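Summary

This guide offers a comprehensive look at harness engineering, the factor that most determines how well an AI agent performs. The harness covers every structural element outside the model: memory, tool use, permission management, hooks, and observability. Improving that structure can substantially improve agent performance. The guide lays out a reading path and practical implementation advice for the four layers most developers overlook (Tools, Permissions, Hooks, and Observability), and stresses that hooks are not mere guidance but a hard enforcement mechanism, closer to law than to advice.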

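Key Points

An AI agent's performance is determined less by the model itself than by the structure around it, the harness. (Agent = Model + Harness)
The harness consists of five core layers: memory, tools, permissions, hooks, and observability.
Memory should be more than a static instruction file: it should be an evolving record of state (a failure log) that is read at session start.
Hooks work like law, forcibly controlling the agent's behavior. The PreToolUse hook in particular is the strongest mechanism, able to block a dangerous operation before it runs.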

Harness engineering means everything about an AI agent other than the model: memory, tools, permissions, hooks, and observability. LangChain improved its benchmark score by 13.7 points by changing only the harness. This guide is a curated reading path organized by layer, with a deep-dive post for every part of the Claude Code harness.

Layer 1 only (what most developers have) → advice the model can ignore
All 5 layers (Memory → Tools → Permissions → Hooks → Observability) → enforcement

LangChain went from 52.8% to 66.5% on Terminal Bench 2.0 by changing only the harness. Same model. 13.7 points of pure structural gain (LangChain Blog, Feb 2026).

Most Claude Code users stop at Layer 1. This guide is the reading path to the other four layers. If you want the theory of harness engineering, read the pillar post. If you want the architecture deep-dive, read the 5 layers post. This post is different: it is a navigation hub organized by layer, with one deep-dive per topic, that you can revisit as your harness grows.

What is Claude Code harness engineering?

Harness engineering is the discipline of building everything around an AI agent (constraints, tools, feedback loops, observability) so that it is reliable in production. In Claude Code, the harness has five layers: Memory (CLAUDE.md), Tools (MCP), Permissions (settings.json), Hooks (PreToolUse/PostToolUse), and Observability (session logs).
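Two of the five layers, Permissions and Hooks, are plain configuration. A minimal sketch of how they might be scaffolded follows, assuming a project-level `.claude/settings.json`. The specific allow/deny rules and the `guard-sql.sh` hook path are illustrative placeholders rather than anything taken from the original posts, and the function name `scaffold_settings` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: scaffold a project-level .claude/settings.json covering
# Layer 3 (permissions) and Layer 4 (hooks). The rules and the hook path
# are illustrative assumptions, not the original post's configuration.

scaffold_settings() {
  local project_dir="$1"
  local settings="$project_dir/.claude/settings.json"

  # Never clobber a harness that already exists.
  if [[ -e "$settings" ]]; then
    echo "refusing to overwrite $settings" >&2
    return 1
  fi

  mkdir -p "$project_dir/.claude"
  # Quoted heredoc delimiter: $CLAUDE_PROJECT_DIR is written literally.
  cat > "$settings" <<'JSON'
{
  "permissions": {
    "allow": ["Bash(npm run test:*)"],
    "deny": ["Read(./.env)", "Bash(curl:*)"]
  },
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/guard-sql.sh"
          }
        ]
      }
    ]
  }
}
JSON
}
```

Calling `scaffold_settings .` from a repository root would write the file once and refuse to touch it afterwards, so a harness that is already under version control is never overwritten.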

The formula: Agent = Model + Harness (Martin Fowler, Apr 2026). The model is a commodity. Every team on Sonnet 4.6 or Opus 4.7 gets the same raw capability. Your harness is what differentiates your team's output.

What are the 5 layers of the Claude Code harness?

| Layer | Purpose | Claude Code File |
|---|---|---|
| 1. Memory | What the agent knows | CLAUDE.md, MEMORY.md |
| 2. Tools | What it can reach | settings.json (MCP) |
| 3. Permissions | What it is allowed to do | settings.json allow/deny |
| 4. Hooks | What is enforced at runtime | PreToolUse/PostToolUse |
| 5. Observability | What you can see afterwards | Session logs, cost tracking |
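The Hooks layer listed above is the one that enforces rather than advises. As a sketch of what a PreToolUse guard could look like: Claude Code is assumed here to pass the hook payload as JSON on stdin, and exit code 2 is the signal that blocks the tool call. `guard_decision` and `DEPLOY_ENV` are hypothetical names, and matching against the raw JSON text is a deliberately coarse shortcut that avoids depending on a JSON parser.

```shell
#!/usr/bin/env bash
# Sketch of a PreToolUse guard. Assumption: the hook payload arrives on
# stdin as JSON, with the Bash command somewhere under tool_input.
# DEPLOY_ENV is a hypothetical variable you would export yourself.

# Return 2 (block) when the payload mentions DROP TABLE and the
# environment is production; return 0 (allow) otherwise.
guard_decision() {
  local payload="$1" env_name="$2"
  if [[ "$env_name" == "production" ]] &&
     grep -qiE 'drop[[:space:]]+table' <<<"$payload"; then
    # stderr carries the reason for the block.
    echo "BLOCKED: destructive SQL in production" >&2
    return 2
  fi
  return 0
}

# As a standalone hook script you would finish with something like:
#   guard_decision "$(cat)" "${DEPLOY_ENV:-}"; exit $?
```

Keeping the decision in a function that takes the payload as an argument makes the guard easy to exercise from a test script before it is ever wired into a live session.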

Layer 1: What does the agent know before you type?

The memory layer is every file Claude Code reads before your first keystroke. CLAUDE.md holds project rules. MEMORY.md holds evolving state. Most developers ship only CLAUDE.md and treat it as a wish list.

"Your AI Agent Forgets Everything. Here's the Fix.": MEMORY.md is a 200-line index that Claude reads at session start. Setup takes 5 minutes. Read this first if you re-explain the same architecture decisions every week.

"Your CLAUDE.md Is an Instruction File. It Should Be a Failure Log.": Mitchell Hashimoto's AGENTS.md in Ghostty has zero aspirational lines. Every entry traces to a real agent mistake. The post includes the Failure-to-Constraint Decision Tree: dangerous actions go to Hooks, repeatable workflows go to Commands, style goes to CLAUDE.md.

Layer 4: What can the agent NOT do?

Hooks are the enforcement layer. Memory is advice. Hooks are law. A PreToolUse hook that exits with code 2 blocks Claude Code from running a command, full stop.

    # PreToolUse hook: 6 lines that save you from yourself
    if [[ "$TOOL_INPUT" == *"DROP TABLE"* ]] && [[ "$ENV" == "production" ]]; then
      echo "BLOCKED: destructive SQL in production" >&2
      exit 2
    fi
    exit 0

"Which Claude Code Hook Do You Need? A Decision Guide": the 4 handler types (Deny, Log, Transform, Enrich), when to reach for PreToolUse vs PostToolUse, and which 3 hooks every production setup should have.

A PreToolUse hook exiting with code 2 is the only mechanism in Claude Code that unconditionally blocks a tool call. Instructions in CLAUDE.md can still be overridden by context or model reasoning. Hooks cannot be bypassed.

Layer 5: How do you know what your agent actually did?

Observability turns "my agent did something weird" into a reproducible bug report. One of LangChain's three harness improvements was a verification middleware that made the agent check its own work before marking a task complete.

"Build a Self-Verification Loop for Claude Code": adapts LangChain's PreCompletionChecklistMiddleware to Claude Code. Boris Cherny (creator of Claude Code) calls verification "probably the most important thing" for quality.

LangChain's three improvements mapped to layers: context injection (Layer 1), self-verification loops (Layer 5), and compute allocation (Layer 5). No single improvement explained the full +13.7 point gain. It took all three working together.

Why does this actually work?

Three independent data points prove constraints beat capability:

LangChain: +13.7 on Terminal Bench 2.0 with harness changes only
OpenAI Codex: ~1 million lines of production code, zero human-written lines over five months, all inside heavily constrained harness environments
Mitchell Hashimoto's Ghostty: every AGENTS.md line is a prevented failure

"The Constraint Paradox: Less AI Freedom, Better Code": breaks down all three data points with benchmark tables and the counterintuitive finding that running at maximum reasoning budget scored worse (53.9%) than high (63.6%). Read this when someone says "we just need a smarter model."

Why does this matter for your career?

84% of developers use AI tools. Only 29% trust the output. That 55-point gap is the senior engineer's new job. One harness committed to version control multiplies across your whole team. Writing a great CLAUDE.md for 10 developers pays off more than writing 10,000 lines of code yourself.

"Senior Engineers Don't Write Code. They Build Harnesses.": the career case, with a harness review checklist for your next PR and the 4-era evolution of where senior engineers add value.

Where should you start reading?

Three paths based on where you are today:

New to harness engineering. Start with the pillar post for the definition, then the 5 layers post for the architecture. Come back here for your next deep-dive.
You have a CLAUDE.md and want more rigor. Read the memory fix post first to add MEMORY.md, then the failure-log pattern to rewrite your existing CLAUDE.md. Those two posts cover all of Layer 1.
Your agent has scared you at least once. Skip to the hook decision guide and ship one PreToolUse guard before your next session. Then read the constraint paradox for why this actually works.

FAQ

What is Claude Code harness engineering?
Harness engineering for Claude Code is configuring five layers around the model (Memory, Tools, Permissions, Hooks, Observability) to make the agent reliable in production. The model is commodity. The harness is your differentiator.

Do I need all 5 layers to start?
No. Start with Memory (CLAUDE.md + MEMORY.md) and Hooks (one PreToolUse guard). Those two cover the most common failure modes. Add the rest as your team scales or when a specific incident motivates it.

How is harness engineering different from prompt engineering?
Prompt engineering shapes what the agent tries. Context engineering shapes what the agent knows.
Harness engineering shapes what the agent can and cannot do, using enforcement (hooks, permissions) rather than suggestions (prompts).

Does this only apply to Claude Code?
The principles apply to any AI coding agent. The implementation details (CLAUDE.md, PreToolUse hooks, MCP config) are Claude Code-specific. Claude Code offers the most programmable harness surface in the market today.

Try it now: Pick one path above, open the first linked post, copy one code block into your .claude/ folder, and run one Claude Code session with the change applied. The compound benefit starts on session #2.

Which layer should you add first? Leave a comment.

Originally published at ShipWithAI. I write about Claude Code workflows, AI-assisted development, and shipping software faster with structured AI.

AI-generated content
