

OpenAI | Headline | 2026. 04. 24. 22:50

How Codex Helped Ship the Sora Android App in 28 Days

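Summary

OpenAI has shared how it took the Android version of Sora from prototype to global launch in just 28 days. The key was using an AI agent called Codex much like an experienced senior engineer. Rather than writing code directly, the development team directed and reviewed Codex’s work while focusing on higher-order tasks such as architecture design, user experience (UX), and systemic changes. The approach suggests an efficient way to overcome the communication overhead and bottlenecks that arise in traditional projects.

Key points

  • The initial version of the Sora Android app was built with Codex in 28 days, demonstrating that AI agents can dramatically increase development speed.
  • Codex excelled at understanding large codebases, writing unit tests, and applying feedback, but “experiential judgment” such as architecture design and user experience (UX) required human involvement.
  • The successful collaboration model worked by giving Codex clear goals, constraints, and guidance on project-wide patterns.
  • The development team focused on key long-term trade-off decisions such as system architecture, modularization, and dependency injection, reviewing the AI’s output and setting direction.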

How We Used Codex to Ship Sora for Android in 28 Days

In November, we launched the Sora Android app to the world, giving anyone with an Android device the ability to turn a short prompt into a vivid video. On launch day, the app reached #1 in the Play Store. Android users generated more than a million videos in the first 24 hours.

Behind the launch is a story: the initial version of Sora’s production Android app was built in 28 days, thanks to the same agent that’s available to any team or developer: Codex.

From October 8 to November 5, 2025, a lean engineering team, working alongside Codex and consuming roughly 5 billion tokens, shipped Sora for Android from prototype to global launch. Despite its scale, the app has a crash-free rate of 99.9 percent and an architecture we’re proud of. If you’re wondering whether we used a secret model, we used an early version of the GPT‑5.1‑Codex model – the same version that any developer or business can use today via CLI, IDE extension, or web app.

When Sora launched on iOS, usage exploded. People immediately began generating a stream of videos. On Android, by contrast, we had only a small internal prototype and a mounting number of pre-registered users on Google Play.

A common response to a high-stakes, time-pressured launch is to pile on resources and add process. A production app of this scope and quality would typically involve many engineers working for months, slowed down by coordination.

American computer architect Fred Brooks famously warned that “adding more people to a late software project makes it later.” In other words, when trying to ship a complex project quickly, adding more engineers can often reduce efficiency by adding to communication overhead, task fragmentation, and integration costs. We leaned into this insight instead of ignoring it: we assembled a strong team of four engineers – all equipped with Codex to drastically increase each engineer’s impact.
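
The communication overhead Brooks describes is often illustrated by counting pairwise communication channels. The formula below is the standard count for a team of n people who may each need to coordinate with every other member; the team sizes other than four are illustrative and are not taken from the Sora project.

```latex
C(n) = \binom{n}{2} = \frac{n(n-1)}{2}, \qquad C(4) = 6, \quad C(10) = 45, \quad C(20) = 190
```

Under this count, growing a team from 4 to 20 people multiplies headcount by five but the number of potential channels by more than thirty, which is one way to read the case for keeping the team small.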

Working this way, we shipped an internal build of Sora for Android to employees in 18 days and launched publicly 10 days later. We maintained a high bar on Android engineering practices, invested in maintainability, and held the app to the same reliability bar we would expect from a more traditional project. (We also continue to use Codex extensively today to evolve and bring new features to the app.)

To make sense of how we worked with Codex, it helps to know where it shines and where it needs direction. Treating it like a newly hired senior engineer was a good approach. Codex’s ability meant we could spend more time directing and reviewing code than writing it ourselves.

Where Codex needs guidance

  • Codex isn’t yet great at inferring what it hasn’t been told (e.g., your preferred architecture patterns, product strategy, real user behavior, and internal norms or shortcuts).
  • Similarly, Codex couldn’t see the app actually run: It couldn’t open Sora on a device, notice that a scroll felt off, or sense that a flow was confusing. Only our team could cover these experiential tasks.
  • Each instance requires onboarding. Sharing context with clear goals, constraints, and guidance on “how we do things” was essential to making Codex execute well.
  • In the same vein, Codex struggled with deep architectural judgment: Left on its own, it might introduce an extra view model where we really wanted to extend an existing one or push logic into the UI layer that clearly belonged in a repository. Its instinct is to get something working, not to prioritize long‑term cleanliness.
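
The layering concern in the last bullet can be sketched in a few lines. This is a language-neutral illustration written in Python rather than the app’s actual code: `Video`, `VideoRepository`, `FeedViewModel`, `render_feed`, and `fake_fetch` are all hypothetical names, and the “playable” rule is an invented stand-in for real business logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Video:
    id: str
    title: str
    duration_s: int


class VideoRepository:
    """Data layer: owns fetching and the rules about what counts as valid data."""

    def __init__(self, fetch_all):
        # fetch_all is a hypothetical callable standing in for a network client.
        self._fetch_all = fetch_all

    def playable_videos(self):
        # The "which videos are playable" rule lives here, not in the UI.
        return [v for v in self._fetch_all() if v.duration_s > 0]


class FeedViewModel:
    """Presentation layer: turns domain objects into display state."""

    def __init__(self, repository):
        self._repository = repository

    def titles(self):
        return [v.title for v in self._repository.playable_videos()]


def render_feed(view_model):
    """UI layer: formats state only and contains no business rules."""
    return "\n".join(f"- {title}" for title in view_model.titles())


def fake_fetch():
    # Illustrative data; the second entry is filtered out by the repository.
    return [Video("a", "Sunset", 8), Video("b", "Broken upload", 0)]


feed = render_feed(FeedViewModel(VideoRepository(fake_fetch)))
print(feed)
```

A shortcut version would filter on `duration_s` inside `render_feed`. It would produce the same output today, which is why this kind of drift tends to be caught in review rather than by tests.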

Where Codex excels

  • Reading and understanding large codebases rapidly: Codex knows essentially all major programming languages, which makes it easier to leverage the same concepts across many platforms without complex abstractions.
  • Testing coverage: Codex is (uniquely) enthusiastic about writing unit tests to cover a broad variety of cases. Not every test was deep, but having breadth of coverage was helpful in preventing regressions.
  • Applying feedback: In a similar vein, Codex is good at reacting to feedback. When CI failed, we could paste log output into a prompt and ask Codex to propose fixes.
  • Massively parallel, disposable execution: Most won’t push the limits of the number of sessions they could actually run at any one time. It’s highly feasible to test multiple ideas in parallel and view code as disposable.
  • Offering new perspective: In design discussions, we used Codex as a generative tool to explore potential failure points and discover new ways to solve a problem. For example, while we designed video player memory optimizations, Codex sifted through multiple SDKs to propose approaches we wouldn’t have had time to parse. The insights from Codex’s research proved invaluable in minimizing memory footprint in the final app.
  • Enabling higher‑leverage work: In practice, we ended up spending more time reviewing and directing code than writing it ourselves. That said, Codex is very good at code review, too, often catching bugs before they’re merged, improving reliability.
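
The “paste log output into a prompt” step from the feedback bullet might look like the sketch below. It only assembles text and does not call Codex; the function name, the line limit, the prompt wording, and the sample log (including the test name `testPlayback`) are assumptions made for illustration.

```python
def build_ci_fix_prompt(log_text, max_lines=40):
    """Wrap the tail of a CI log in a short request for a fix.

    Only the last max_lines lines are kept, on the assumption that the
    failure usually appears near the end of the log.
    """
    lines = log_text.strip().splitlines()
    tail = lines[-max_lines:]
    return (
        f"CI failed. Here are the last {len(tail)} lines of the log:\n\n"
        + "\n".join(tail)
        + "\n\nPropose a minimal fix and explain the root cause."
    )


# Hypothetical log: 99 passing steps followed by one failure.
sample_log = "\n".join(f"step {i}: ok" for i in range(1, 100))
sample_log += "\nstep 100: FAILED testPlayback"

prompt = build_ci_fix_prompt(sample_log, max_lines=5)
print(prompt)
```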

Once we acknowledged these characteristics, our working model became more straightforward. We leaned on Codex to do a huge amount of heavy lifting inside well‑understood patterns and well‑bounded scopes, while our team focused on architecture, user experience, systemic changes, and final quality.

Even the best new, senior hire doesn’t have the right vantage point for making long-term trade-offs right away. To leverage Codex and ensure its work was robust and maintainable, it was key that we oversaw the app’s systems design and key trade-offs ourselves. These included shaping the app’s architecture, modularization, dependency injection, and navigation; we also implemented authentication and base networking flows.

From this foundation, we wrote a few representative features end‑to‑end. We used the rules we wanted the entire codebase to follow and documented project‑wide patterns as we went. By pointing Codex to representative features, it was able to work more independently within our standards. For a project that we estimate was 85% written by Codex, a carefully planned foundation avoided costly backtracking and refactoring. It was one of the most important decisions we made.

The idea was not to make “something that works” as quickly as possible, but rather to make “something that gets how we want things to work.” There are many “correct” ways to write code. We didn’t need to tell Codex exactly what to do; we needed to show Codex what’s “correct” on our team. Once we had established our starting point and how we liked to build, Codex was ready to start.

To see what would happen, we did try prompting: “Build the Sora Android app based on the iOS code. Go,” but quickly aborted that path. While what Codex created technically worked, the product experience was sub-par. And without a clear understanding of endpoints, data, and user flows, Codex’s single-shot code was unreliable. (Even without using an agent, it’s risky to merge thousands of lines of code.)

We hypothesized Codex would thrive in a sandbox of well-written examples, and we were right. Asking Codex to “build this settings screen” with almost no context was unreliable. Asking Codex to “build this settings screen using the same architecture and patterns as this other screen you just saw” worked far better. Humans made the structural decisions and set the invariants; Codex then filled in large amounts of code inside that structure.
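
The difference between the two settings-screen requests can be made concrete with a small prompt builder. This is a sketch under assumptions: the helper name, the file paths, and the rule text are hypothetical, and the function only formats a string rather than invoking any Codex interface.

```python
def build_feature_prompt(task, exemplar_paths=(), rules=()):
    """Compose a task prompt, optionally anchored to existing examples."""
    parts = [f"Task: {task}"]
    if exemplar_paths:
        parts.append("Follow the same architecture and patterns as these files:")
        parts.extend(f"- {path}" for path in exemplar_paths)
    if rules:
        parts.append("Project rules:")
        parts.extend(f"- {rule}" for rule in rules)
    return "\n".join(parts)


# The low-context request that proved unreliable.
bare_prompt = build_feature_prompt("Build the settings screen.")

# The same request, grounded in a representative feature (paths are made up).
guided_prompt = build_feature_prompt(
    "Build the settings screen.",
    exemplar_paths=[
        "feature/profile/ProfileScreen.kt",
        "feature/profile/ProfileViewModel.kt",
    ],
    rules=["No business logic in the UI layer."],
)
print(guided_prompt)
```

The task line is identical in both prompts; only the second one tells the agent what “correct” looks like on the team.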

Our next step in maximizing Codex’s potential was figuring out how to enable Codex to work for long periods of time (recently, more than 24 hours), unsupervised.

Early on in using Codex, we jumped to prompts like, “Here is the feature. Here are some files. Please build it.” That sometimes worked, but mostly produced code that technically compiled, while straying from our architecture and goals.

So we changed the workflow. For any non‑trivial change, we first asked Codex to help us understand how the system and code work. For instance, we’d ask it to read a set of related files and summarize how that feature works, such as how data flows from the API through the repository layer, the view model, and into the UI. Then we would correct or refine its understanding, pointing out, say, that a particular abstraction really belongs in a different layer or that a given class exists only for offline mode and should not be extended.

Similarly to how you might engage a new, highly capable teammate, we worked with Codex to create a solid implementation plan. That plan often looked like a miniature design document directing which files should change, what new states should be introduced, and how logic should flow. Only then did we ask Codex to start applying the plan, one step at a time. One helpful tip: for very long tasks, where we hit the limit of our context window, we’d ask Codex to save its plan to a file, allowing us to apply the same direction across instances.
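
Saving a plan to a file so that a later session can resume it could be done along these lines. The article does not say what format the team’s plan files used; the JSON layout, the function names, and the example steps here are all assumptions.

```python
import json
import tempfile
from pathlib import Path


def save_plan(path, goal, steps):
    """Write a plan as JSON so a fresh session can pick it up."""
    plan = {
        "goal": goal,
        "steps": [{"description": s, "done": False} for s in steps],
    }
    Path(path).write_text(json.dumps(plan, indent=2), encoding="utf-8")


def load_plan(path):
    return json.loads(Path(path).read_text(encoding="utf-8"))


def complete_step(path, index):
    """Mark one step as done and persist the change."""
    plan = load_plan(path)
    plan["steps"][index]["done"] = True
    Path(path).write_text(json.dumps(plan, indent=2), encoding="utf-8")


def next_step(plan):
    """Return the first unfinished step, or None when the plan is complete."""
    for step in plan["steps"]:
        if not step["done"]:
            return step["description"]
    return None


with tempfile.TemporaryDirectory() as tmp:
    plan_path = Path(tmp) / "plan.json"
    save_plan(plan_path, "Add a retry button to the error state", [
        "Add a Retry state to the view model",
        "Expose retry() from the repository",
        "Wire the button in the UI",
    ])
    complete_step(plan_path, 0)      # first session finishes step one
    resumed = load_plan(plan_path)   # a later session reloads the plan
    upcoming = next_step(resumed)

print(upcoming)
```

Because the plan lives outside any one session, a reviewer can also read it directly and compare the resulting diff against it.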

This extra planning loop turned out to be worth the time. It allowed us to let Codex run “unsupervised” for long stretches, because we knew its plans. It made code review easier, because we could check the implementation against our plan rather than reading a diff without context. And when something went wrong, we could debug the plan first and the code second.

The dynamic felt similar to the way a good design document gives a tech lead confidence in a project. We weren’t just generating code: we were producing code that supported a shared roadmap.

At the peak of the project, we were often running multiple Codex sessions in parallel. One was working on playback, another on search, another on error handling, and sometimes another on tests or refactors. It felt less like using a tool and more like managing a team.

Each session would periodically report back to us with progress. One might say, “I’m done planning out this module; here’s what I propose,” while another would offer a large diff for a new feature. Each required attention, feedback, and review. It was uncannily similar to being a tech lead with several new engineers, all making progress, all needing guidance.

The result was a collaborative flow. Codex’s raw coding capability freed us from a lot of manual typing. We had more time to think about architecture, read pull requests carefully, and test out the app.

At the same time, that extra speed meant we always had something waiting in our review queue. Codex didn’t get blocked by context switching, but we did. Our bottleneck in development shifted from writing code to making decisions, giving feedback, and integrating changes.

This is where Brooks’s insights land in a new way. You can’t simply add Codex sessions and expect linear speedups any more than you can keep adding engineers to a project and expect the schedule to shrink linearly. Each additional “pair of hands,” even virtual ones, adds coordination overhead. We had become conductors of an orchestra rather than simply faster solo players.
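
A toy model shows why adding sessions stops helping once review becomes the bottleneck. The numbers are invented for illustration and are not measurements from the project; the model also ignores the extra coordination cost per session, so it is if anything optimistic.

```python
def merged_per_day(sessions, changes_per_session, review_capacity):
    """Changes merged per day when every change needs human review."""
    produced = sessions * changes_per_session
    return min(produced, review_capacity)


def backlog_growth_per_day(sessions, changes_per_session, review_capacity):
    """Unreviewed changes added to the queue each day."""
    produced = sessions * changes_per_session
    return max(0, produced - review_capacity)


# Assumed figures: each session yields 3 changes a day, and the team
# can carefully review 12 changes a day in total.
session_counts = (1, 2, 4, 8, 16)
throughput = [merged_per_day(n, 3, 12) for n in session_counts]
backlog = [backlog_growth_per_day(n, 3, 12) for n in session_counts]

print(throughput)
print(backlog)
```

In this sketch throughput rises with the first few sessions and then flattens at the review capacity, while the queue of unreviewed work keeps growing.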

We started our project with a huge stepping stone: Sora had already shipped on iOS. We frequently pointed Codex at the iOS and backend codebases to help it understand key requirements and constraints. Throughout the project we joked that we had reinvented the idea of a cross‑platform framework. Forget React Native or Flutter; the future of cross‑platform is just Codex.

Beneath the quip are two principles:

  • Logic is portable. Whether the code is written in Swift or Kotlin, the underlying application logic – data models, network calls, validation rules, business logic – is the same. Codex is very good at reading a Swift implementation and producing an equivalent in Kotlin that preserves semantics.
  • Concrete examples provide powerful cont

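AI-generated content

This content was automatically summarized, translated, and analyzed by AI from the original post on the OpenAI Blog. Copyright remains with the original author; please refer to the original post for the exact content.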
