
© 2026 Molayo

OpenAI Headlines · 2026. 04. 24. 12:09

Scaling strategies for agentic systems in enterprise environments (the Netomi case)

Summary

This post distills how Netomi scaled agent-based AI systems into enterprise environments for Fortune 500 customers such as United Airlines and DraftKings. To stay reliable across complex, highly variable real-world workflows (e.g., integrations with booking engines, CRMs, and payment systems), the platform uses a dual-model strategy pairing GPT-4.1 and GPT-5.2. In particular, Netomi designed a concurrency architecture to guarantee real-time responsiveness under large-scale traffic, sustaining sub-three-second responses even during surges above 40,000 concurrent requests per second.

Key points

  • GPT-4.1's low latency and predictable tool calling secure reliability for real-time workflows, while GPT-5.2 performs deep multi-step planning.
  • Enterprise-grade agentic systems must go beyond simple API connections and handle complex, incomplete data spanning booking engines, CRMs, payments, and other systems.
  • In environments like DraftKings, the platform maintained sub-three-second responses and 98% intent classification accuracy despite traffic surges above 40,000 concurrent customer requests per second.
  • To keep the AI system trustworthy, a governance layer is embedded in the runtime, handling schema validation, policy enforcement, and PII protection.

Netomi’s lessons for scaling agentic systems into the enterprise

Enterprises expect AI agents to handle messy workflows reliably, honor policies by default, operate under heavy load, and show their work.

Netomi builds systems that meet that high bar, serving Fortune 500 customers like United Airlines and DraftKings. Their platform pairs GPT‑4.1 for low-latency, reliable tool use with GPT‑5.2 for deeper, multi-step planning, running both inside a governed execution layer designed to keep model-driven actions predictable under real production conditions.

Running agentic systems at this scale has given Netomi a blueprint for what makes these deployments work inside the enterprise.

“Our goal was to orchestrate the many systems a human agent would normally juggle and do it safely at machine speed.”

A single enterprise request rarely maps to a single API. Real workflows span booking engines, loyalty databases, CRM systems, policy logic, payments, and knowledge sources. The data is often incomplete, conflicting, or time-sensitive. Systems that depend on brittle flows collapse under this variability.

Netomi designed its Agentic OS so OpenAI models sit at the center of a governed orchestration pipeline built for this level of ambiguity. The platform uses GPT‑4.1 for fast, reliable reasoning and tool-calling—critical for real-time workflows—and GPT‑5.2 when multi-step planning or deeper reasoning is required.

To ensure consistent agent behavior across long, complex tasks, Netomi follows the agentic prompting patterns recommended by OpenAI:

  • Persistence reminders to help GPT‑5.2 carry reasoning across long, multi-step workflows
  • Explicit tool-use expectations, suppressing hallucinated answers by steering GPT‑4.1 to call tools for authoritative information during transactional operations
  • Structured planning, which leverages GPT‑5.2’s deeper reasoning to outline and execute multi-step tasks
  • Agent-driven rich media decisions, relying on GPT‑5.2 to detect and signal when a tool call should return images, videos, forms, or other rich, multimodal elements
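
A hedged sketch of how such patterns might be composed into one system message; the pattern names, wording, and the `media_hint` field are illustrative stand-ins, not Netomi's actual prompts:

```python
# Illustrative prompting patterns, composed into a single system message.
PERSISTENCE = (
    "You are handling a multi-step workflow. Keep going until the task "
    "is fully resolved; do not hand back to the user mid-plan."
)
TOOL_EXPECTATIONS = (
    "For any factual or transactional data (fares, balances, order "
    "status), you MUST call the matching tool. Never answer from memory."
)
STRUCTURED_PLANNING = (
    "Before acting, write a numbered plan of the steps and tools you "
    "will use, then execute the plan step by step."
)
RICH_MEDIA = (
    "When a tool result is better shown as an image, video, or form, "
    "set the `media_hint` field instead of describing it in text."
)

def build_system_prompt(*patterns: str) -> str:
    """Join the selected prompting patterns into one system message."""
    return "\n\n".join(patterns)

prompt = build_system_prompt(
    PERSISTENCE, TOOL_EXPECTATIONS, STRUCTURED_PLANNING, RICH_MEDIA
)
print(prompt.count("\n\n"))  # prints 3: three separators between four patterns
```

In practice, a prompt like this would be sent as the system message of a chat request, with per-deployment patterns toggled on or off.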

Together, these patterns help the model reliably map unstructured requests to multi-step workflows and maintain state across discontinuous interactions.

Few industries expose the need for multi-step reasoning as clearly as airlines, where one interaction routinely spans multiple systems and policy layers. A single question may require checking fare rules, recalculating loyalty benefits, initiating ticket changes, and coordinating with flight operations.

“In airlines, context changes by the minute. AI has to reason about the scene the customer is in—not just execute a siloed task,” said Mehta. “That’s why situational awareness matters way more than just workflows, and why a context-led ensemble architecture is essential.”

With GPT‑4.1 and GPT‑5.2, Netomi can keep extending these patterns into richer multi-step automations—using the models not just to answer questions, but to plan tasks, sequence actions, and coordinate the backend systems a major airline depends on.

In high-pressure moments—rebooking during a storm, resolving a billing issue, or handling sudden spikes in demand—users will abandon any system that hesitates. Latency defines trust.

Most AI systems fail because they execute tasks sequentially: classify → retrieve → validate → call tools → generate output. Netomi instead designed for concurrency, taking advantage of GPT‑4.1's low-latency streaming and stable tool calling.
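
The difference can be sketched with Python's asyncio; the stage functions and timings below are dummies standing in for real model and retrieval calls, not Netomi's actual pipeline:

```python
import asyncio
import time

# Dummy pipeline stages; the sleeps stand in for real model/network calls.
async def classify(msg: str) -> str:
    await asyncio.sleep(0.3)
    return "rebook_flight"

async def retrieve(msg: str) -> dict:
    await asyncio.sleep(0.4)
    return {"policy": "fare rules"}

async def validate(msg: str) -> bool:
    await asyncio.sleep(0.2)
    return True

async def sequential(msg: str):
    # Each stage waits for the previous one: latency is the SUM (~0.9s).
    return await classify(msg), await retrieve(msg), await validate(msg)

async def concurrent(msg: str):
    # Independent stages run at once: latency is the MAX (~0.4s).
    return await asyncio.gather(classify(msg), retrieve(msg), validate(msg))

def timed(coro) -> float:
    """Run a coroutine to completion and return its wall-clock latency."""
    start = time.perf_counter()
    asyncio.run(coro)
    return time.perf_counter() - start

seq_latency = timed(sequential("rebook me after the storm"))
conc_latency = timed(concurrent("rebook me after the storm"))
print(f"sequential ~{seq_latency:.2f}s, concurrent ~{conc_latency:.2f}s")
```

The payoff is that end-to-end latency tracks the slowest independent stage rather than the sum of all stages, which is what keeps the total system under a fixed response budget.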

GPT‑4.1 provides fast time-to-first-token and predictable tool-calling behavior, which make this architecture viable at scale, while GPT‑5.2 provides deeper multi-step reasoning paths when needed. Netomi’s concurrency framework ensures the total system, not just the model, stays under critical latency thresholds.

These concurrency demands aren’t unique to airlines. Any system exposed to sudden, extreme traffic surges needs the same architectural discipline. DraftKings, for instance, regularly stress-tests this model, with traffic during major sporting events spiking above 40,000 concurrent customer requests per second.

During such events, Netomi has sustained sub-three-second responses with 98% intent classification accuracy, even as workflows touch accounts, payments, knowledge lookups, and regulatory checks.

“AI is central and critical to how we support customers in the moments that matter most,” said Paul Liberman, Co-Founder and President of Operations at DraftKings. “Netomi’s platform helps us handle massive spikes in activity with agility and precision.”

At scale, Netomi’s concurrency model depends on the fast, predictable tool-calling of GPT‑4.1, which keeps multi-step workflows responsive under extreme load.

Enterprise AI must be trustworthy by design, with governance woven directly into the runtime—not added as an external layer.

When intent confidence drops below threshold, or when a request cannot be classified with high certainty, Netomi’s governance mechanisms kick in to determine how the request is handled, ensuring the system backs off from free-form generation in favor of controlled execution paths.
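
A minimal sketch of that kind of confidence gate, assuming a classifier that returns an intent plus a confidence score; the 0.85 threshold and path names are illustrative, not Netomi's actual values:

```python
# Hypothetical confidence gate: below threshold, back off from free-form
# generation toward progressively more controlled execution paths.
CONFIDENCE_THRESHOLD = 0.85

def route(intent: str, confidence: float) -> str:
    """Pick an execution path based on classifier confidence."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return f"workflow:{intent}"     # controlled, tool-backed workflow
    if confidence >= 0.5:
        return "clarify_with_user"      # ask instead of guessing
    return "handoff_to_human"           # deterministic safe fallback

print(route("rebook_flight", 0.93))  # workflow:rebook_flight
print(route("rebook_flight", 0.62))  # clarify_with_user
print(route("unknown", 0.21))        # handoff_to_human
```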

At a technical level, the governance layer handles:

  • Schema validation, which validates every tool call against expected arguments and OpenAPI contracts before execution
  • Policy enforcement that applies topic filters, brand restrictions, and compliance checks inline during reasoning and tool use
  • PII protection to detect and mask sensitive data as part of pre-processing and response handling
  • Deterministic fallback, routing back to known-safe behaviors when intent, data, or tool calls are ambiguous
  • Runtime observability, exposing token traces, reasoning steps, and tool-chain logs for real-time inspection and debugging
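
As one illustration, the schema-validation step can be sketched as a pre-execution check on tool-call arguments; the `refund_payment` spec and hand-rolled checker below are hypothetical stand-ins for a real OpenAPI contract validator:

```python
# Hypothetical tool specs: required argument names and their types.
TOOL_SPECS = {
    "refund_payment": {
        "required": {"order_id": str, "amount_cents": int},
    },
}

def validate_tool_call(name: str, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the call may execute."""
    spec = TOOL_SPECS.get(name)
    if spec is None:
        return [f"unknown tool: {name}"]
    problems = []
    for field, expected_type in spec["required"].items():
        if field not in args:
            problems.append(f"missing argument: {field}")
        elif not isinstance(args[field], expected_type):
            problems.append(f"{field} must be {expected_type.__name__}")
    return problems

print(validate_tool_call("refund_payment",
                         {"order_id": "A1", "amount_cents": 500}))  # []
print(validate_tool_call("refund_payment",
                         {"order_id": "A1"}))  # ['missing argument: amount_cents']
```

In a runtime like the one described above, a non-empty problem list would block execution and trigger the deterministic fallback path rather than letting a malformed call through.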

In highly regulated domains like dental insurance, this kind of governance is non-negotiable. A Netomi customer in the insurance industry processes close to two million provider requests each year across all 50 states, including eligibility checks, benefits lookups, and claim status inquiries where a single incorrect response can create downstream regulatory or service risk.

During open enrollment, when scrutiny and volume peaked, the company needed AI that enforced policy as part of the runtime itself. Netomi’s architecture met that requirement.

“We built the system so that if the agent ever reaches uncertainty, it knows exactly how to back off safely,” said Mehta. “The governance is not bolted on—it’s part of the runtime.”

Netomi’s path shows what it takes to earn enterprise trust: build for complexity, parallelize to meet latency demands, and bake governance into every workflow. OpenAI models form the reasoning backbone, while Netomi’s systems engineering ensures that intelligence is operationally safe, auditable, and ready for Fortune 500 environments.

These principles helped Netomi scale across some of the world’s most demanding industries—and offer a blueprint for any startup looking to turn agentic AI into production-grade infrastructure.

Deploying agentic systems inside Fortune 500 environments demands speed, accuracy, and built-in governance. Netomi’s architecture delivers all three, sustaining performance even during extreme traffic surges and complex, multi-step workflows.

  • Delivered sub-three-second responses during high-traffic events
  • Maintained 98% intent classification accuracy at scale
  • Handled traffic spikes exceeding 40,000 concurrent customer requests per second
  • Embedded governance directly into the runtime, with deterministic fallback and policy enforcement

AI-generated content

This content is an automated AI summary, translation, and analysis of the original OpenAI Blog post. Copyright belongs to the original author; please refer to the original for the authoritative text.
