본문으로 건너뛰기

© 2026 Molayo

OpenAI중요헤드라인2026. 04. 24. 14:01

GPT-5.1 기반 Tolan: 대화형 음성 AI의 새로운 기준 제시

요약

Tolan은 GPT-5.1 모델을 활용하여 낮은 지연 시간(low latency), 정확한 문맥 관리, 그리고 일관된 페르소나를 갖춘 '음성 우선(voice-first)' AI 컴패니언을 개발했습니다. 기존 챗봇이 단순 질의응답에 머물렀다면, Tolan은 장시간의 비선형적 대화에 초점을 맞춥니다. 핵심 아키텍처는 매 턴마다 문맥 창(context window)을 처음부터 재구성하고, 고속 벡터 데이터베이스를 이용해 감정 '바이브'까지 기억하는 메모리 시스템을 구축했습니다. 이를 통해 사용자 이탈률 감소 및 일관성 유지에 성공하며 음

핵심 포인트

  • GPT-5.1의 출시로 모델의 조향성(steerability)과 지연 시간(latency)이 크게 개선되어 자연스러운 대화 흐름을 구현할 수 있었습니다.
  • Tolan은 매 턴마다 문맥 창을 처음부터 재구성하는 아키텍처를 채택하여, 갑작스러운 주제 전환에도 일관성을 유지합니다.
  • 단순한 사실 기록을 넘어 감정적 '바이브(vibe)' 신호까지 저장하고 검색하는 고속 메모리 시스템을 구축했습니다 (Turbopuffer 사용).
  • 페르소나와 톤 변화를 실시간으로 모니터링하며, 사용자 경험 개선에 따라 기억 회상 누락률이 30% 감소했습니다.
  • Tolan은 이미 월간 활성 사용자(MAU) 20만 명을 돌파했으며, 음성-시각-문맥을 통합하는 멀티모달 시스템으로 확장할 계획입니다.

How Tolan builds voice-first AI with GPT‑5.1

With GPT‑5.1, Tolan built a voice app optimized for low latency, accurate context, and stable personalities as conversations evolve.

Tolan is a voice-first AI companion where people talk with a personalized, animated character that learns from conversations over time.

Built by Portola, a veteran team with a prior exit, the app is designed for ongoing, open-ended dialogue rather than quick prompts and replies. “We saw the rise of ChatGPT and knew voice was the next frontier,” says Quinten Farmer, co-founder and CEO of Portola. “But voice is harder. You’re not just responding to typed prompts; you’re holding a live, meandering conversation.”

Voice AI raises the bar on latency and context management, but it also enables more open-ended, exploratory interactions than text.

With foundation models becoming faster, cheaper, and more capable, the team focused their efforts on two key levers: memory and character design. Portola built a character-driven universe, shaped by award-winning animators and a science fiction writer, using a real-time context management system to keep personality and memory consistent as conversations unfold.

The release of the GPT‑5.1 models marked a turning point, delivering major gains in steerability and latency that brought those pieces together, unlocking a more responsive and engaging voice experience.

“GPT-5.1 gave us the steerability to finally express the characters we had in mind. It wasn’t just smarter—it was more faithful to the tone and personality we wanted to create.”

Tolan’s Architecture: Built for Voice

Tolan’s architecture is shaped by the demands of voice. Voice users expect instant, natural responses, even when conversations shift midstream. Tolan had to respond quickly, track changing topics, and maintain a consistent personality without lag or tone drift.

To feel natural, conversations required near-instant latency. Introducing OpenAI GPT‑5.1 and the Responses API cut speech initiation time by over 0.7 seconds—enough to noticeably improve conversational flow.

Equally critical was how the system handled context. Unlike many agents that cache prompts across multiple turns, Tolan rebuilds its context window from scratch each turn. Each context reconstruction pulls in a summary of recent messages, a persona card, vector-retrieved memories, tone guidance, and real-time app signals. This architecture allows Tolan to adapt in real time to abrupt topic shifts, an essential requirement for natural voice-based interaction.

“We realized quickly that cached prompts just didn’t cut it,” says Quinten. “Users change subjects all the time. To feel seamless, the system had to adapt midstream.”

This real-time reconstruction approach is both technically intensive and foundational to Tolan’s success.

Context handling is important, but it wasn’t enough to keep conversations feeling coherent over time. To support long, nonlinear conversations, Tolan built a memory system that retains not just facts and preferences, but also emotional “vibe” signals—clues that help steer how a Tolan should respond.

Memories are embedded using the OpenAI text-embedding-3-large model and stored in Turbopuffer, a high-speed vector database that enables sub-50ms lookup times. This speed is essential for real-time voice interactions. Each turn, Tolan uses the user’s latest message and system-synthesized questions (e.g., “Who is the user married to?”) to trigger memory recall. To keep memory quality high, Tolan runs a nightly compression job that removes low-value or redundant entries (e.g. “the user drank coffee today”) and resolves contradictions.

Personality is just as carefully managed. Each Tolan is seeded with a distinct character scaffold, authored by the team’s in-house science fiction writer and refined by a behavioral researcher. These seeds give Tolans consistency, but also flexibility to adapt over time, evolving alongside the user.

A parallel system monitors the emotional tenor of the conversation and dynamically adjusts the Tolan’s delivery. This allows a Tolan to shift seamlessly from playful to grounded depending on user cues, without losing its core personality.

The transition to GPT‑5.1 was a turning point. Suddenly, layered prompt instructions—tone scaffolds, memory injections, character traits—were followed more faithfully. Prompts that once required workarounds began behaving as intended.

“For the first time, our internal experts felt like the model was really listening,” says Quinten. “Instructions stayed intact across long conversations, persona traits were respected, and we saw far less drift.”

Those changes added up to a more consistent and believable personality, which in turn created a more engaging user experience. The Tolan team saw clear, measurable gains: memory recall misses dropped by 30% (based on in-product frustration signals), and next-day user retention rose more than 20% after GPT‑5.1–powered personas went live.

Guiding Principles for Voice AI

As Tolan evolved, a few principles emerged that now guide how the team builds and evolves its voice architecture:

  • Design for conversational volatility: Voice conversations shift mid-sentence. Systems need to pivot just as quickly to feel natural.
  • Treat latency as part of the product experience: Sub-second responsiveness shapes whether a voice agent feels conversational or mechanical.
  • Build memory as a retrieval system, not a transcript: High-quality compression and fast vector search deliver more consistent personality than oversized context windows.
  • Rebuild context every turn: Don’t fight drift with bigger prompts. Regenerating context each turn keeps agents grounded as conversations meander.

Together, these lessons form the foundation for Tolan’s next phase of innovation and set the direction for where voice AI is headed.

Since launching in February 2025, Tolan has grown to more than 200,000 monthly active users. Its 4.8-star rating and more than 100,000 App Store reviews highlight how well the system maintains consistency across long, shifting conversations. One reviewer noted, “They remember things we talked about two days ago and they bring it back into the conversation that we’re having today.”

These signals map directly to the underlying architecture: low-latency model calls, turn-by-turn context reconstruction, and modular memory and persona systems. Together, they allow Tolan to track topic changes, preserve tone, and keep responses grounded without relying on large, fragile prompts.

Looking ahead, Tolan plans to deepen its investments in steerability and memory refinement, focusing its efforts on tighter compression, improved retrieval logic, and expanded persona tuning. The long-term goal is to expand what a voice interface can be: not just responsive, but context-aware and conversationally dynamic.

“The next frontier,” says Quinten, “is building voice agents that aren’t just responsive, but truly multimodal, able to integrate voice, vision, and context into a single, steerable system.”

AI 자동 생성 콘텐츠

본 콘텐츠는 OpenAI Blog의 원문을 AI가 자동으로 요약·번역·분석한 것입니다. 원 저작권은 원저작자에게 있으며, 정확한 내용은 반드시 원문을 확인해 주세요.

원문 바로가기
5

댓글

0