
© 2026 Molayo

r/ClaudeAI Analysis · 2026. 05. 06. 19:59

Claude Code agents always work from stale context. I built a way to keep their memory fresh ahead of every edit, with rewind and replay.

Summary

This technical post points out a fundamental problem in long coding sessions: AI agents work from stale context. The author built a tool called Memtrace to solve it. Rather than simply enlarging the context window, it focuses on tracking changes and accurately remembering the state of the code over time. Memtrace creates an incremental snapshot on every edit so the agent always sees the latest state, and stores the codebase bi-temporally, providing a "rewind and replay" capability that shows which change happened when. It also combines Tree-sitter with hybrid retrieval (BM25 + HNSW), with no LLM inference, for fast and accurate structural and semantic code understanding.

Key Points

  • AI coding agents tend to work inaccurately in long sessions because they rely on stale context.
  • Memtrace creates an incremental snapshot for every edit, guaranteeing the agent always references the latest state of the codebase.
  • The codebase is stored bi-temporally, so when a bug appears the agent can "rewind and replay" the state the code was in.
  • For structural understanding, it uses Tree-sitter plus hybrid retrieval (BM25 + HNSW) without LLM inference, improving efficiency and accuracy.

Every long Claude Code session has the same hidden failure mode: the agent is always working from stale context.

It re-reads the same 12 files across three sessions to "remind itself" of an interface you already showed it. It refactors getUserById without checking who calls it. It edits a config with no memory of why the previous version was that way. It's not the context window. The window is fine. There's no persistent, time-aware representation of your codebase for the agent to re-query. So it guesses. And you pay tokens for every re-read.

I built Memtrace to fix exactly this.

Two things it does that no other memory tool does:

(1) Always-fresh state. Every edit you make triggers a 42ms incremental snapshot of the changes applied by the coding agent. The agent's memory is never one-session-old. After a refactor it knows the blast radius before you do: every caller, every test, every consumer of the function you just touched. Your agent stops asking "what does getUserById return?" 30 seconds after seeing it.
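The always-fresh mechanism can be sketched as content-hash diffing: only files whose hash changed since the last snapshot get re-indexed. This is an illustrative minimal sketch, not Memtrace's actual implementation; all names here are hypothetical.

```python
import hashlib

def snapshot(files: dict, prev: dict):
    """Incremental snapshot sketch: `files` maps path -> content,
    `prev` maps path -> content hash from the last snapshot.
    Returns (new hash index, list of paths needing re-indexing)."""
    index, dirty = {}, []
    for path, content in files.items():
        digest = hashlib.sha256(content.encode()).hexdigest()
        index[path] = digest
        if prev.get(path) != digest:
            dirty.append(path)  # changed or new since last snapshot
    return index, dirty

# First snapshot indexes everything; the second only sees the edit.
idx1, dirty1 = snapshot({"a.py": "x = 1", "b.py": "y = 2"}, {})
idx2, dirty2 = snapshot({"a.py": "x = 1", "b.py": "y = 3"}, idx1)
```

The point is that the per-edit cost scales with what changed, not with the size of the codebase, which is what makes snapshotting on every edit affordable.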

(2) Rewind and replay. This is the part nobody else has. Your codebase is stored bi-temporally so every change becomes a recallable episode. When the agent debugs a regression, it can replay how the broken function got to its current state.

  • What worked before.
  • What changed when.
  • Which commit introduced the bug.

Not just "guess from the current state": replay.
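The episode idea above can be sketched as an append-only edit log that the agent can replay per function. The structure is illustrative only; Memtrace's real bi-temporal storage is richer than this.

```python
# Each edit becomes a recallable episode in an append-only log.
episodes = []

def record(func_name, new_body, note):
    """Append one edit episode for a function."""
    episodes.append({"func": func_name, "body": new_body, "note": note})

def replay(func_name):
    """Return the full edit history of one function, oldest first."""
    return [e for e in episodes if e["func"] == func_name]

record("getUserById", "return db.get(id)", "initial version")
record("getUserById", "return cache.get(id) or db.get(id)", "add cache")
record("getUserById", "return cache.get(id)", "bug: dropped db fallback")

history = replay("getUserById")
```

Replaying the history makes the regression visible as a diff between episodes (the db fallback disappears in the last one) instead of something to infer from the current state alone.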

My architectural bet that makes both possible: zero LLM inference during indexing. Tree-sitter parses your code into an AST, and the AST IS the structural representation. You don't pay an LLM to re-derive what your compiler already knows.
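To show the "parser already knows the structure" idea, here is the same extraction done with Python's stdlib `ast` module as a stand-in for Tree-sitter: function definitions and call edges fall straight out of the parse tree, with no model inference anywhere.

```python
import ast

# Stand-in for Tree-sitter: the parser's AST *is* the structural
# representation. No LLM is needed to recover definitions and calls.
source = """
def get_user_by_id(uid):
    return db.fetch(uid)

def handler(req):
    return get_user_by_id(req.uid)
"""

tree = ast.parse(source)

# Every function definition in the file.
functions = [n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]

# Every direct call to a named function (attribute calls like db.fetch
# are skipped in this minimal sketch).
calls = [n.func.id for n in ast.walk(tree)
         if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)]
```

Tree-sitter does the same job incrementally and across languages, which is what makes per-edit re-parsing cheap.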

Retrieval is hybrid. Tantivy BM25 for lexical recall (the "find getUserById" query). Jina-code 768-dim embeddings indexed in HNSW for semantic recall (the "find anything that authenticates a user" query). Two ranked lists, fused with Reciprocal Rank Fusion at k=60. One signal alone misses, together they hit. The embedding model matters here: Jina-code is trained on code, not generic prose, so the semantic side actually understands "this is an auth handler" instead of pattern-matching on the word "auth."
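The fusion step is plain Reciprocal Rank Fusion and can be sketched directly. This is the generic algorithm with the stated k = 60, not Memtrace's actual code; the file names are made up.

```python
def rrf(ranked_lists, k=60):
    """Reciprocal Rank Fusion: each document scores sum(1 / (k + rank))
    over every list that ranked it; higher is better."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["auth.rs", "user.rs", "db.rs"]          # lexical ranking
semantic = ["login.rs", "auth.rs", "user.rs"]   # embedding ranking
fused = rrf([bm25, semantic])
```

Documents ranked by both signals (here `auth.rs` and `user.rs`) accumulate score from each list, so they float above documents that only one retriever found. That is the "one signal alone misses, together they hit" effect.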

The bi-temporal layer is what makes rewind possible. Every node and edge carries valid_time AND transaction_time, so "what did this function look like Monday" is a real query, not a git-blame heuristic. It's also what gives the agent the blast radius before a refactor: typed edges (CALLS, IMPORTS, IMPLEMENTS, EXTENDS, CONTAINS, TYPE_REFERENCES, INSTANTIATES) traversed in graph time, not text time.
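The blast-radius traversal can be sketched as a breadth-first walk over reversed CALLS edges: everything that transitively calls the function you are about to touch. This uses only CALLS for brevity; the edge data is hypothetical.

```python
from collections import deque

# caller -> callees, i.e. the CALLS edges of a tiny code graph.
calls = {
    "handler": ["getUserById"],
    "test_handler": ["handler"],
    "getUserById": ["dbFetch"],
}

def blast_radius(target):
    """All transitive callers of `target`, found by BFS over the
    reversed CALLS adjacency (callee -> callers)."""
    rev = {}
    for caller, callees in calls.items():
        for callee in callees:
            rev.setdefault(callee, []).append(caller)
    seen, queue = set(), deque([target])
    while queue:
        node = queue.popleft()
        for caller in rev.get(node, []):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen

impacted = blast_radius("getUserById")
```

This is "graph time, not text time": the answer comes from typed edges, not from grepping for the function name across files.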

Speed only matters because freshness has to be cheap. If snapshotting after every edit were expensive, you couldn't afford to do it at all. So the indexing path is bottlenecked by I/O, not LLM tokens.

I built it using Claude Code. Mid-build, Claude Code lost the plot on Memtrace's own architecture and started contradicting decisions from 50 turns earlier. It re-read the same files. It forgot which retrieval weights I'd already tuned. I was experiencing the exact pain I was building Memtrace to solve, while building Memtrace.

When the beta binary was ready, I pointed it at Memtrace's own codebase. The session-loss stopped. The blind refactor suggestions stopped.

It's free, but fair warning: the binary currently requires an approval key.

Not gatekeeping. Not marketing. The indexer keeps tripping on patterns I didn't anticipate: mixed pnpm/npm lockfiles, Rust proc-macros, Python TYPE_CHECKING blocks. Every one of these came from real beta users in the last two weeks, not from my test corpus. When that happens I want to ship you a fix in 24 hours, not lose you to a flaky first impression. So I'm pacing approvals to my own feedback bandwidth, not your patience. I'd rather have 500 users for whom this is magic than 50,000 for whom it's broken.

I'm trying to keep approval under 24h, but capping at 50 per week right now. The benchmark harness is fully open and runnable without the key.

AI-Generated Content

This content is an AI-generated summary, translation, and analysis of the original post on r/ClaudeAI. Copyright remains with the original author; please refer to the original post for the exact content.
