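Trellis, an AI-Powered Workflow Automation Platform for Unstructured Data, Launches
Summary
Trellis, an AI ETL solution that converts unstructured data (PDFs, emails, voice calls, and so on) into structured SQL format, a long-standing problem in data engineering, has been released. Trellis extracts the required data from complex documents or text according to a schema the user defines in natural language and automatically converts it into database tables. It is especially valuable in industries that depend heavily on unstructured data, such as finance and law, where it relieves the bottleneck of manual data entry and analysis, and it applies LLM-based map-reduce and vision models to complex technical ...
Key points
- Trellis is an AI-powered ETL solution that converts unstructured data such as phone call transcripts, PDF documents, and chat logs into structured SQL format according to a user-defined schema.
- Its core techniques are LLM-based map-reduce for processing long documents and vision models for accurately extracting tables and layout information.
- It presents industry-specific use cases, such as structuring the PDF and email data needed to improve credit risk models in financial services, or speeding up onboarding for customer support teams.
- Users can try the solution directly on the demo site (demo.runtrellis.com) without signing up, and documentation for API integration is also available.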
Launch HN: Trellis (YC W24) – AI-powered workflows for unstructured data
Hey HN, we're Jacky and Mac from Trellis (https://runtrellis.com/). We’re building AI-powered ETL for unstructured data. Trellis transforms phone calls, PDFs, and chats into structured SQL format based on any schema you define in natural language. This helps data and ops teams automate manual data entry and run SQL queries on messy data.
There’s a demo video at https://www.youtube.com/watch?v=ib3mRh2tnSo and a sandbox to try out (no sign-in required!) at https://demo.runtrellis.com/. One historical archive of unstructured data we thought would be interesting to run Trellis on is the old Enron emails, which famously took months to review. We’ve created a showcase demo here: https://demo.runtrellis.com/showcase/enron-email-analysis, with some documentation here: https://docs.runtrellis.com/docs/example-email-analytics.
Why we built this:
At the Stanford AI lab where we met, we collaborated with many F500 data teams (including Amazon, Meta, and Standard Chartered), and repeatedly saw the same problem: 80% of enterprise data is unstructured, and traditional platforms can’t handle it. For example, a major commercial bank I work with couldn’t improve credit risk models because critical data was stuck in PDFs and emails.
We realized that our research from the AI lab could be turned into a solution with an abstraction layer that works as well for financial underwriting as it does for analysis of call center transcripts: an AI-powered ETL that takes in any unstructured data source and turns it into a schematically correct table.
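To make the idea of "a schema in, a queryable table out" concrete, here is a minimal sketch in Python. It is not Trellis's API or implementation: the `SCHEMA` layout, the `extract_fields` stub (a pair of regular expressions standing in for a model call), and the `extracted` table name are all assumptions made for illustration.

```python
import re
import sqlite3

# Hypothetical schema: column name -> (SQL type, natural-language description).
# A real system would pass the descriptions to a model; here they are documentation only.
SCHEMA = {
    "sender": ("TEXT", "email address of the person who sent the message"),
    "amount_usd": ("REAL", "dollar amount mentioned in the message, if any"),
}

def extract_fields(text: str) -> dict:
    """Stand-in for a model call: pull the schema fields out of raw text with regexes."""
    sender = re.search(r"From:\s*(\S+@\S+)", text)
    amount = re.search(r"\$([\d,]+(?:\.\d+)?)", text)
    return {
        "sender": sender.group(1) if sender else None,
        "amount_usd": float(amount.group(1).replace(",", "")) if amount else None,
    }

def load(docs: list[str]) -> sqlite3.Connection:
    """Create a table shaped like SCHEMA and insert one row per document."""
    conn = sqlite3.connect(":memory:")
    cols = ", ".join(f"{name} {sql_type}" for name, (sql_type, _) in SCHEMA.items())
    conn.execute(f"CREATE TABLE extracted ({cols})")
    placeholders = ", ".join("?" for _ in SCHEMA)
    for doc in docs:
        row = extract_fields(doc)
        conn.execute(
            f"INSERT INTO extracted VALUES ({placeholders})",
            [row[name] for name in SCHEMA],
        )
    return conn

docs = [
    "From: alice@example.com\nPlease approve the $1,250.50 invoice.",
    "From: bob@example.com\nLunch on Friday?",
]
conn = load(docs)
# Once the text is in a table, ordinary SQL works on it.
total = conn.execute("SELECT SUM(amount_usd) FROM extracted").fetchone()[0]
missing = conn.execute(
    "SELECT COUNT(*) FROM extracted WHERE amount_usd IS NULL"
).fetchone()[0]
```

Fields the extractor cannot find are stored as NULL rather than guessed, so the table stays valid against the schema even when a document is silent on a column.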
Some interesting technical challenges we had to tackle along the way:
(1) Supporting complex documents out of the box: We use LLM-based map-reduce to handle long documents and vision models for table and layout extraction.
(2) Model Routing: We select the best model for each transformation to optimize cost and speed. For instance, in data extraction tasks, we could leverage simpler fine-tuned models that are specialized in returning structured JSON for financial tables.
(3) Data Validation and Schema Guarantees: We ensure accuracy with reference links and anomaly detection.
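The map-reduce pattern for long documents can be sketched roughly as follows. This is an illustration under stated assumptions, not Trellis's pipeline: `map_chunk` is a stub that parses `key=value` tokens where a real system would call a model on each chunk, and the majority-vote merge with `source_chunks` is one simple, hypothetical way to combine partial results while keeping a pointer back to where each value came from.

```python
from collections import Counter

def split_into_chunks(text: str, max_words: int) -> list[str]:
    """Split a long document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def map_chunk(chunk: str) -> dict:
    """Map step (stub for a per-chunk model call): return the fields found in this chunk."""
    fields = {}
    for token in chunk.split():
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key] = value
    return fields

def reduce_partials(partials: list[dict]) -> dict:
    """Reduce step: per field, keep the most common value and the chunks that support it."""
    merged = {}
    all_fields = {key for partial in partials for key in partial}
    for field in all_fields:
        votes = Counter(p[field] for p in partials if field in p)
        value, _ = votes.most_common(1)[0]
        sources = [i for i, p in enumerate(partials) if p.get(field) == value]
        merged[field] = {"value": value, "source_chunks": sources}
    return merged

document = (
    "Prospectus issuer=Acme filler filler "
    "coupon=5.25 filler filler issuer=Acme rating=BBB"
)
chunks = split_into_chunks(document, max_words=4)
partials = [map_chunk(chunk) for chunk in chunks]  # map: independent per chunk
result = reduce_partials(partials)                 # reduce: merge into one record
```

Because each chunk is processed independently, the map step can run in parallel and no single call has to fit the whole document; the cost is that the reduce step must reconcile fields that appear in more than one chunk.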
After launching Trellis, we’ve seen diverse use cases, especially in legacy industries where PDFs are treated as APIs. For example, financial services companies need to process complex documents like bonds and credit ratings into a structured format, speed up underwriting, and enable pass-through loan processing. Customer support and back-office operations need to accelerate onboarding by mapping documents across different schemas and ERP systems, and ensure support agents follow SOPs (security questions, compliance disclosures, etc.). And many companies today want data preprocessing in ETL pipelines and data ingestion for RAG.
We’d love your feedback! Try it out at https://demo.runtrellis.com/. To save and track your large data transformations, you can visit our dashboard and create an account at https://dashboard.runtrellis.com/. If you’re interested in integrating with our APIs, our quick start docs are here: https://docs.runtrellis.com/docs/getting-started. If you have any specific use cases in mind, we’d be happy to do a custom integration and onboarding—anything for HN. :)
Excited to hear about your experience wrangling with unstructured data in the past, workflows you want to automate, and what data integration you would like to see.
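AI-generated content
This content is an automatic AI summary, translation, and analysis of the original post from HN AI Engineering. Copyright belongs to the original author; please refer to the original post for the exact content.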