ContextLens — LLM 프롬프트 내부를 위한 py-spy/pprof

요약

ContextLens는 LLM 에이전트의 컨텍스트 윈도우를 진단하는 프로파일러입니다. 멀티 턴 루프에서 발생하는 중복 토큰과 비용 낭비를 분석하여 시각화된 보고서와 해결책을 제공합니다.

핵심 포인트

컨텍스트를 시스템 프롬프트, 도구 결과 등 영역별로 분해 분석
SHA-256 해싱을 통해 반복 청구되는 중복 블록 추적
5가지 탐지기로 비용 낭비 요소를 달러 단위로 산출
D3 트리맵을 활용한 상호작용형 HTML 보고서 생성
OpenAI 및 Anthropic API와 실시간 연동 지원

멀티 턴 (multi-turn) 에이전트 루프에서는 매 API 호출 시 전체 컨텍스트 (context)가 다시 전송됩니다. 3번째 턴에서 추가된 도구 결과 (tool result)는 4, 5, 6, 7번째 턴... 이후로 영원히 다시 비용이 청구됩니다. 그중 대부분은 다시 읽히지 않습니다.

표준 관측성 (observability) 도구들은 전체 토큰 수를 알려줍니다. 하지만 그 안에 무엇이 들어있는지 또는 _얼마나 많은 부분이 낭비되고 있는지_는 절대 알려주지 않습니다.

그것이 바로 ContextLens가 해결하고자 하는 문제입니다.

기능

ContextLens는 LLM 에이전트 컨텍스트 윈도우 (context window)를 위한 진단 프로파일러 (diagnostic profiler)입니다. 이 도구는 다음을 수행합니다:

컨텍스트 윈도우를 다음과 같은 영역으로 분해합니다: 시스템 프롬프트 (system prompt), 도구 스키마 (tool schemas), 도구 결과 (tool results), 검색된 청크 (retrieved chunks), 사용자 메시지 (user messages), 어시스턴트 메시지 (assistant messages)
SHA-256 콘텐츠 해싱 (content hashing)을 사용하여 어떤 블록이 턴을 거치며 다시 비용이 청구되는지 추적합니다.
5가지 낭비 탐지기 (waste detectors)를 실행하고 발견 사항을 달러 비용 기준으로 순위를 매깁니다.
각 발견 사항에 대해 구체적인 한 줄 해결책을 출력합니다.
독립 실행형 HTML 파일로 상호작용 가능한 D3 트리맵 (treemap) 보고서를 렌더링합니다.

API 키가 필요하지 않습니다. 저장된 트레이스 (traces)를 사용하여 오프라인에서 작동합니다.

5가지 탐지기

탐지기	발견 내용
Duplicate (중복)	여러 턴에 걸쳐 동일한 블록이 토씨 하나 틀리지 않고 다시 전송됨
...

---내장 데모 실행 (30턴 에이전트 루프를 시뮬레이션하며, API 키가 필요하지 않습니다):

python -c "import contextlens; contextlens.demo()"

또는

python examples/demo.py
실시간 캡처 — Anthropic

import anthropic
import contextlens as cl

client = anthropic.Anthropic()

with cl.capture_anthropic(client, model="claude-3-5-sonnet-20241022") as collector:
for turn in range(20):
client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=1024,
system="You are a helpful assistant.",
messages=build_messages(turn),
)

report = cl.analyze_trace(collector.build_trace())
print(f"Recoverable waste: {report.recoverable_tokens:,} tokens (${report.recoverable_cost_usd:.4f})")
실시간 캡처 — OpenAI

import openai
import contextlens as cl

client = openai.OpenAI()

with cl.capture_openai(client, model="gpt-4o") as collector:
for turn in range(20):
client.chat.completions.create(model="gpt-4o", messages=build_messages(turn))

report = cl.analyze_trace(collector.build_trace())
Analyze a saved trace

report = cl.analyze_file("trace.json")
html = cl.render_html_report(report)
open("report.html", "w").write(html)
Example terminal output

Context Composition by Region

Region Tokens Cost (USD) Share
assistant_message 11,490 $0.0345 ###....... 25.5%
tool_result 10,333 $0.0310 ##........ 22.9%
tool_schema 9,450 $0.0284 ##........ 21.0%
retrieved_content 5,805 $0.0174 #......... 12.9%
user_message 4,740 $0.0142 #......... 10.5%
system 3,240 $0.0097 #......... 7.2%
TOTAL 45,058 $0.1352

Re-billing: 43,185 tokens (95.8%) re-billing waste -> $0.1296 recoverable

Top Waste Findings

Type Sev. Wasted Tokens Cost Fix

1 duplicate medium 7,084 $0.0213 Cache or externalize...
2 redundant_ret medium 5,805 $0.0174 Use a re-ranker...
3 unused_schema low 3,150 $0.0095 Remove send_email...
Try the live demo
No install, no API key: https://huggingface.co/spaces/Harshal0610/contextlens

Links
GitHub: https://github.com/HarshalSant/contextlens
Install: pip install contextlens-profiler
License: MIT
Feedback welcome — especially from anyone running multi-turn agent loops at scale. What waste patterns do you run into most?

Quickstart

bash
pip install contextlens-profiler
...

AI 자동 생성 콘텐츠

원문 바로가기

ContextLens — LLM 프롬프트 내부를 위한 py-spy/pprof

요약

핵심 포인트

기능

5가지 탐지기

또는

Type Sev. Wasted Tokens Cost Fix

Quickstart

댓글