
© 2026 Molayo

HN Key Summary · 2026-04-24 14:42

Ragas: An Open-Source Toolkit for Evaluating LLM Applications

Summary

Ragas is a comprehensive open-source library designed to let you evaluate and optimize applications built on large language models (LLMs) objectively and systematically. Moving beyond subjective, time-consuming assessment, it provides precise, data-driven validation workflows based on both LLM-based and traditional metrics. It also offers automatic test dataset generation and seamless integration with popular frameworks such as LangChain, helping developers continually improve their applications using real production data.

Key Points

  • Ragas specializes in evaluating RAG system performance objectively and precisely, using both LLM-based and traditional metrics.
  • Even when no test dataset is available, it can automatically generate comprehensive, production-aligned test datasets.
  • It integrates seamlessly with popular LLM frameworks such as LangChain, and the `ragas quickstart` command makes it easy to start an evaluation project.
  • Custom metrics such as Aspect Critique (built on `DiscreteMetric`) can precisely validate specific aspects of an output, e.g. summary accuracy.

Supercharge Your LLM Application Evaluations 🚀

Objective metrics, intelligent test generation, and data-driven insights for LLM apps

Ragas is your ultimate toolkit for evaluating and optimizing Large Language Model (LLM) applications. Say goodbye to time-consuming, subjective assessments and hello to data-driven, efficient evaluation workflows. Don't have a test dataset ready? We also do production-aligned test set generation.

  • 🎯 Objective Metrics: Evaluate your LLM applications with precision using both LLM-based and traditional metrics.
  • 🧪 Test Data Generation: Automatically create comprehensive test datasets covering a wide range of scenarios.
  • 🔗 Seamless Integrations: Works flawlessly with popular LLM frameworks like LangChain and major observability tools.
  • 📊 Build feedback loops: Leverage production data to continually improve your LLM applications.

Installation:

PyPI:

pip install ragas

Alternatively, from source:

pip install git+https://github.com/vibrantlabsai/ragas

The fastest way to get started is to use the ragas quickstart command:

# List available templates
ragas quickstart
# Create a RAG evaluation project
ragas quickstart rag_eval
# Specify where you want to create it.
ragas quickstart rag_eval -o ./my-project

Available templates:

  • rag_eval - Evaluate RAG systems

Coming Soon:

  • agent_evals - Evaluate AI agents
  • benchmark_llm - Benchmark and compare LLMs
  • prompt_evals - Evaluate prompt variations
  • workflow_eval - Evaluate complex workflows

Ragas comes with pre-built metrics for common evaluation tasks. For example, Aspect Critique evaluates any aspect of your output using DiscreteMetric:

import asyncio
from openai import AsyncOpenAI
from ragas.metrics import DiscreteMetric
from ragas.llms import llm_factory
# Setup your LLM
client = AsyncOpenAI()
llm = llm_factory("gpt-4o", client=client)
# Create a custom aspect evaluator
metric = DiscreteMetric(
    name="summary_accuracy",
    allowed_values=["accurate", "inaccurate"],
    prompt="""Evaluate if the summary is accurate and captures key information.
Response: {response}
Answer with only 'accurate' or 'inaccurate'."""
)
# Score your application's output
async def main():
    score = await metric.ascore(
        llm=llm,
        response="The summary of the text is..."
    )
    print(f"Score: {score.value}") # 'accurate' or 'inaccurate'
    print(f"Reason: {score.reason}")
if __name__ == "__main__":
    asyncio.run(main())

Note: Make sure your OPENAI_API_KEY environment variable is set.
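To illustrate the contract the snippet above relies on (a discrete result with a `value` from `allowed_values` plus a `reason`), here is a minimal sketch that can run without an API key. The stub metric and its rule-based check are hypothetical stand-ins, not part of the ragas API; a real `DiscreteMetric` prompts an LLM instead:

```python
import asyncio
from dataclasses import dataclass

# Hypothetical stand-in for the result object returned by ascore():
# it exposes the same .value / .reason fields used in the example above.
@dataclass
class MetricResult:
    value: str
    reason: str

class StubDiscreteMetric:
    """Toy metric mimicking DiscreteMetric's ascore() interface (illustrative only)."""

    def __init__(self, name: str, allowed_values: list[str]):
        self.name = name
        self.allowed_values = allowed_values

    async def ascore(self, response: str) -> MetricResult:
        # A real metric would send a prompt to an LLM; here we apply a trivial rule.
        value = "accurate" if "key information" in response else "inaccurate"
        assert value in self.allowed_values
        return MetricResult(value=value, reason=f"Rule-based check on: {response!r}")

async def main() -> None:
    metric = StubDiscreteMetric("summary_accuracy", ["accurate", "inaccurate"])
    score = await metric.ascore("The summary captures the key information.")
    print(f"Score: {score.value}")   # accurate
    print(f"Reason: {score.reason}")

if __name__ == "__main__":
    asyncio.run(main())
```

The point is the shape of the interaction: you await `ascore(...)` and branch on a small, closed set of values rather than parsing free-form LLM text yourself.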

Find the complete Quickstart Guide

In the past two years, we have seen and helped improve many AI applications using evals. If you want help improving and scaling up your AI application with evals:

🔗 Book a slot or drop us a line: founders@vibrantlabs.com.

If you want to get more involved with Ragas, check out our Discord server. It's a fun community where we geek out about LLMs, retrieval, production issues, and more.


Developers: those who build with ragas.
(You have `import ragas` somewhere in your project)

Contributors: those who make ragas better.
(You make PRs to this repo)

We welcome contributions from the community! Whether it's bug fixes, feature additions, or documentation improvements, your input is valuable.

  • Fork the repository
  • Create your feature branch (`git checkout -b feature/AmazingFeature`)
  • Commit your changes (`git commit -m 'Add some AmazingFeature'`)
  • Push to the branch (`git push origin feature/AmazingFeature`)
  • Open a Pull Request

At Ragas, we believe in transparency. We collect minimal, anonymized usage data to improve our product and guide our development efforts.
✅ No personal or company-identifying information
✅ Open-source data collection code
✅ Publicly available aggregated data

To opt out, set the `RAGAS_DO_NOT_TRACK` environment variable to `true`.
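For example, in a POSIX shell you can set the opt-out for the current session before running your evaluation scripts:

```shell
# Disable Ragas usage-data collection for this shell session
export RAGAS_DO_NOT_TRACK=true
```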

AI-Generated Content

This content was automatically summarized, translated, and analyzed by AI from the original HN AI Engineering post. Copyright remains with the original author; please consult the original for the exact details.
