How to Add AI Video Generation to a SaaS App

Adding video generation to an app is not like adding image generation. The API call returns immediately, but the video is not ready yet. You get back a task ID and have to keep asking "Is it done?" until the task completes.

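Most developers run into this the first time they call a video API: they expect a response body containing a video URL and get a task ID instead. This guide walks through the full flow: submitting a task, polling for the result, handling failures, and storing the output before its URL expires.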

What you'll build

A backend service that takes a text prompt or an image, submits a video generation task, polls until it completes, and returns the final video URL. You will use four models (Veo 3 Fast, Sora 2, Kling Video, and Runway) through a single API key.

Prerequisites:

Python 3.8+ or Node.js 18+
A CometAPI key
Basic familiarity with REST APIs

Why video generation is different

With image generation, you send a request and get the image back in the same response. Video generation uses an async task queue instead:

Submit: send the generation request → receive a task_id
Poll: check the status endpoint every few seconds
When the status reaches a terminal state, receive the video URL
Download and store: download the video and save it, because the URL is temporary

If you treat video generation like image generation and wait for the first response to contain the video, your requests will time out every time.
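The poll step has the same shape for every provider, so it can be sketched once as a generic loop. This is a minimal sketch, not CometAPI's own client: `wait_for_task`, `fetch_status`, and the injectable `sleep` parameter are names introduced here for illustration, and "in_progress" is a placeholder for whatever non-terminal status strings the API returns.

```python
import time
from typing import Callable, Dict, Iterable


def wait_for_task(
    fetch_status: Callable[[], Dict],
    success_states: Iterable[str] = ("succeeded",),
    failure_states: Iterable[str] = ("failed", "cancelled"),
    interval: float = 10.0,
    max_wait: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch_status() until it reports a terminal state; return the final payload."""
    success, failure = set(success_states), set(failure_states)
    waited = 0.0
    while waited < max_wait:
        result = fetch_status()
        status = result.get("status")
        if status in success:
            return result
        if status in failure:
            raise RuntimeError(f"Task ended in state {status!r}")
        # Any other status is treated as "still running".
        sleep(interval)
        waited += interval
    raise TimeoutError(f"Task did not finish within {max_wait} seconds")


# Stand-in for the status endpoint: two non-terminal polls, then success.
_responses = iter([
    {"status": "queued"},
    {"status": "in_progress"},  # placeholder status name
    {"status": "succeeded", "output": ["https://example.com/video.mp4"]},
])
final = wait_for_task(lambda: next(_responses), interval=0, sleep=lambda s: None)
print(final["output"][0])
```

Passing `sleep` as a parameter keeps the loop testable without real delays. The provider-specific pollers later in this guide follow the same structure with their own status strings and response shapes.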

In a production web service, this polling loop should run in a background worker (Celery, Bull, or a similar tool), not in the request handler. The examples below use synchronous polling, which is fine for scripts and prototypes but not for serving concurrent users.

Choosing a model

Model	Provider	Max duration	Price (via CometAPI)	Best for
Veo 3 Fast	Google	8 seconds	$0.05/second	Fast prototyping, social clips
...
Source: CometAPI model page, May 2026. Note: "Sora 2" is CometAPI's model identifier. See that model's page for details on the underlying model.

Veo 3 Fast supports both text-to-video and image-to-video. It has the lowest per-second cost, which makes it a good starting point.
Sora 2 generates audio natively along with the video, including dialogue, ambient sound, and sound effects, with no separate TTS (text-to-speech) step.
Kling Video offers negative_prompt, cfg_scale, camera movement settings, and a pro mode. It gives the most control of the four models.
Runway supports only image-to-video through CometAPI. You provide a still image and a description of the motion, and it generates the animation.
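Per-second pricing makes cost easy to estimate before you commit to a model. The sketch below uses the Veo 3 Fast figures from the table ($0.05 per second, 8-second maximum); the 1,000-clips-per-month volume is a made-up number for illustration, and `estimate_cost` is a helper introduced here.

```python
from decimal import Decimal


def estimate_cost(duration_seconds: int, price_per_second: str) -> Decimal:
    """Return the cost of one clip; Decimal avoids float rounding in money math."""
    return Decimal(duration_seconds) * Decimal(price_per_second)


# Veo 3 Fast figures from the table above.
cost_per_clip = estimate_cost(8, "0.05")
print(cost_per_clip)  # 0.40

# Hypothetical volume: 1,000 maximum-length clips per month.
monthly_cost = cost_per_clip * 1000
print(monthly_cost)  # 400.00
```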

Submit a Veo task

Veo uses multipart/form-data. To send it correctly with Python's requests library, pass the fields through files=. Passing a dict through data= sends the body as application/x-www-form-urlencoded, which is not the same as multipart/form-data.

import requests
import os
from dotenv import load_dotenv

load_dotenv()

def submit_veo_task(prompt: str, size: str = "16x9") -> str:
    """Submit a Veo 3 Fast text-to-video task. Returns the task_id."""
    api_key = os.getenv("COMETAPI_KEY")
    if not api_key:
        raise ValueError("The COMETAPI_KEY environment variable is not set")
    
    response = requests.post(
        "https://api.cometapi.com/v1/videos",
        headers={"Authorization": f"Bearer {api_key}"},
        files={
            "prompt": (None, prompt),
            "model": (None, "veo3-fast"),
            "size": (None, size)
        },
        timeout=30
    )
    response.raise_for_status()
    return response.json()["id"]

task_id = submit_veo_task("A paper kite drifting above a wheat field on a windy afternoon")
print(f"Task submitted: {task_id}")
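To see the difference between files= and data= without sending anything, you can build both requests and inspect the prepared headers. This is a small check, not part of the integration itself; the field values are arbitrary, and `Request(...).prepare()` only constructs the request locally.

```python
import requests

URL = "https://api.cometapi.com/v1/videos"
fields = {"prompt": "a paper kite", "model": "veo3-fast"}

# files= with (None, value) tuples: multipart/form-data parts without filenames.
multipart = requests.Request(
    "POST", URL, files={name: (None, value) for name, value in fields.items()}
).prepare()

# data= with a plain dict: application/x-www-form-urlencoded.
urlencoded = requests.Request("POST", URL, data=fields).prepare()

# The multipart header also carries a randomly generated boundary string.
print(multipart.headers["Content-Type"])
print(urlencoded.headers["Content-Type"])
```

Do not set the Content-Type header by hand for the multipart case: requests generates the boundary and writes it into the header for you.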

Poll for the result

import time


def poll_veo_task(task_id: str, interval: int = 10, max_wait: int = 600) -> str:
    """Poll a Veo task until it finishes. Returns the video URL."""
    api_key = os.getenv("COMETAPI_KEY")
    if not api_key:
        raise ValueError("The COMETAPI_KEY environment variable is not set")
    headers = {"Authorization": f"Bearer {api_key}"}
    url = f"https://api.cometapi.com/v1/videos/{task_id}"
    elapsed = 0
    while elapsed < max_wait:
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()
        result = response.json()
        status = result.get("status")
        if status == "succeeded":
            return result["output"][0]
        elif status in ("failed", "cancelled"):
            raise RuntimeError(
                f"Task {task_id} ended with status '{status}': "
                f"{result.get('error', 'no error details returned')}"
            )
        # Any other status means the task is still running; wait and retry.
        time.sleep(interval)
        elapsed += interval
    raise TimeoutError(f"Task {task_id} did not complete within {max_wait} seconds")


video_url = poll_veo_task(task_id)
print(f"Video ready: {video_url}")

Use Kling Video for more control

Kling uses a different endpoint structure and takes JSON. Note that Kling's terminal success status is "succeed", not "succeeded"; the code below matches the string the API actually returns:

def submit_kling_task(prompt: str, duration: str = "5", mode: str = "std") -> str:
    """Submit a Kling text-to-video task. Returns the task_id."""
    api_key = os.getenv("COMETAPI_KEY")
    if not api_key:
        raise ValueError("The COMETAPI_KEY environment variable is not set")
    response = requests.post(
        "https://api.cometapi.com/kling/v1/videos/text2video",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={
            "model_name": "kling-v1-6",
            "prompt": prompt,
            "negative_prompt": "blurry, low quality, watermark",
            "cfg_scale": 0.5,
            "mode": mode,         # "std" or "pro"
            "aspect_ratio": "16:9",
            "duration": duration  # "5" or "10"
        },
        timeout=30
    )
    response.raise_for_status()
    return response.json()["data"]["task_id"]


def poll_kling_task(task_id: str, interval: int = 10, max_wait: int = 600) -> str:
    """Poll a Kling task until it finishes. Returns the video URL."""
    api_key = os.getenv("COMETAPI_KEY")
    if not api_key:
        raise ValueError("The COMETAPI_KEY environment variable is not set")
    headers = {"Authorization": f"Bearer {api_key}"}
    url = f"https://api.cometapi.com/kling/v1/videos/text2video/{task_id}"
    elapsed = 0
    while elapsed < max_wait:
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()
        result = response.json()
        status = result["data"]["task_status"]
        if status == "succeed":  # Kling uses "succeed", not "succeeded"
            return result["data"]["task_result"]["videos"][0]["url"]
        elif status == "failed":
            error_detail = result.get("data", {}).get("task_result", "no details")
            raise RuntimeError(
                f"Kling task {task_id} failed: {error_detail}"
            )
        time.sleep(interval)
        elapsed += interval
    raise TimeoutError(f"Kling task {task_id} timed out after {max_wait} seconds")

Source: CometAPI Kling Video docs

Animate a still image with Runway

Runway is image-to-video only. It also requires an additional header (X-Runway-Version).

def submit_runway_task(image_url: str, motion_prompt: str, duration: int = 5) -> str:
    """Submit a Runway image-to-video task. Returns the task_id."""
    api_key = os.getenv("COMETAPI_KEY")
    if not api_key:
        raise ValueError("The COMETAPI_KEY environment variable is not set")
    response = requests.post(
        "https://api.cometapi.com/runwayml/v1/image_to_video",
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Runway-Version": "2024-11-06",
            "Content-Type": "application/json"
        },
        json={
            "model": "gen3a_turbo",
            "promptImage": image_url,  # must be a stable HTTPS URL
            "promptText": motion_prompt,
            "duration": duration,
            "ratio": "1280:720",
            "watermark": False
        },
        timeout=30
    )
    response.raise_for_status()
    return response.json()["id"]

def poll_runway_task(task_id: str, interval: int = 5, max_wait: int = 600) -> str:
    """Poll a Runway task. Returns the video URL once the task completes."""
    api_key = os.getenv("COMETAPI_KEY")
    if not api_key:
        raise ValueError("The COMETAPI_KEY environment variable is not set")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "X-Runway-Version": "2024-11-06"
    }
    url = f"https://api.cometapi.com/runwayml/v1/tasks/{task_id}"
    elapsed = 0
    while elapsed < max_wait:
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()
        result = response.json()
        status = result.get("status")
        if status == "task_not_exist":
            # CometAPI-specific: the task is still initializing. Retry after a few seconds.
            time.sleep(interval)
            elapsed += interval
            continue
        elif status == "succeeded":
            return result["output"][0]
        elif status in ("failed", "cancelled"):
            raise RuntimeError(f"Runway task {task_id} failed: {result.get('error', 'no details')}")
        time.sleep(interval)
        elapsed += interval
    raise TimeoutError(f"Runway task {task_id} timed out after {max_wait} seconds")

Source: CometAPI Runway docs
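The three pollers above differ mainly in their status strings, which is easy to get wrong when you add a provider. One way to contain that is to map raw statuses onto a shared set of states. The sketch below uses only the status strings that appear in this guide; the provider labels ("veo", "kling", "runway"), `TaskState`, and `normalize_status` are names made up for this example, and any status not listed is assumed to mean the task is still running.

```python
from enum import Enum


class TaskState(Enum):
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Terminal status strings per provider, as used by the pollers in this guide.
_SUCCESS = {
    "veo": {"succeeded"},
    "kling": {"succeed"},  # Kling omits the trailing "ed"
    "runway": {"succeeded"},
}
_FAILURE = {
    "veo": {"failed", "cancelled"},
    "kling": {"failed"},
    "runway": {"failed", "cancelled"},
}


def normalize_status(provider: str, raw_status: str) -> TaskState:
    """Map a provider-specific status string onto a shared TaskState."""
    if provider not in _SUCCESS:
        raise ValueError(f"Unknown provider: {provider}")
    if raw_status in _SUCCESS[provider]:
        return TaskState.SUCCEEDED
    if raw_status in _FAILURE[provider]:
        return TaskState.FAILED
    # Everything else, including Runway's "task_not_exist", counts as still running.
    return TaskState.PENDING


print(normalize_status("kling", "succeed"))         # TaskState.SUCCEEDED
print(normalize_status("veo", "succeed"))           # TaskState.PENDING
print(normalize_status("runway", "task_not_exist")) # TaskState.PENDING
```

The second call shows why the mapping is per provider: "succeed" is terminal for Kling but would be an unrecognized, non-terminal string for Veo.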

Store the video before the URL expires

The video URLs returned by generation APIs are temporary. Download the file immediately and store it somewhere you control:

import requests
import pathlib

def download_video(url: str, output_path: str) -> None:
    """Download a video from a URL to a local file using streaming."""
    out = pathlib.Path(output_path)
    if out.parent != pathlib.Path("."):
        out.parent.mkdir(parents=True, exist_ok=True)
    with requests.get(url, stream=True, timeout=60) as r:
        r.raise_for_status()
        with open(out, "wb") as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    print(f"Saved to {output_path}")

# Full flow
task_id = submit_veo_task("A timelapse of clouds moving over a city skyline")
video_url = poll_veo_task(task_id)
download_video(video_url, "output/city_timelapse.mp4")

In production, replace the local file write with an upload to S3, Cloudflare R2, or whichever storage you use. The streaming pattern stays the same: pipe the bytes through directly instead of loading the whole video into memory.
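The piping idea can be shown with plain file-like objects, independent of any storage SDK. In this sketch `copy_stream` is a helper introduced here, and the in-memory buffers stand in for the real endpoints: with requests, the source would likely be `response.raw` from a `stream=True` request, and the destination would be whatever writable object or upload call your storage client provides.

```python
import io
from typing import BinaryIO


def copy_stream(source: BinaryIO, destination: BinaryIO, chunk_size: int = 8192) -> int:
    """Copy source to destination one chunk at a time; return the bytes written."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty bytes signals end of stream
            break
        destination.write(chunk)
        total += len(chunk)
    return total


# In-memory stand-ins for the download stream and the storage target.
fake_download = io.BytesIO(b"\x00" * 20000)
fake_storage = io.BytesIO()
written = copy_stream(fake_download, fake_storage)
print(written)  # 20000
```

Only one chunk is held in memory at a time, so peak memory stays near `chunk_size` regardless of the video's size.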

Handle failures

Symptom	Likely cause	Fix
Task stuck in queued state for more than 10 minutes	Server load or model unavailable	Retry with a different model
...
Source: CometAPI video generation docs
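"Retry with a different model" can be automated as an ordered fallback. The sketch below assumes each backend is a callable that takes a prompt and either returns a video URL or raises RuntimeError or TimeoutError, as the pollers in this guide do. `generate_with_fallback` and the fake backends are illustrative names, not part of any API.

```python
from typing import Callable, List, Tuple

Backend = Tuple[str, Callable[[str], str]]


def generate_with_fallback(prompt: str, backends: List[Backend]) -> Tuple[str, str]:
    """Try each backend in order; return (backend_name, video_url) from the first success."""
    errors = []
    for name, generate in backends:
        try:
            return name, generate(prompt)
        except (RuntimeError, TimeoutError) as exc:
            # Record the failure and move on to the next model.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All backends failed: " + "; ".join(errors))


# Fake backends for demonstration: the first times out, the second succeeds.
def stuck_backend(prompt: str) -> str:
    raise TimeoutError("did not complete within 600 seconds")


def working_backend(prompt: str) -> str:
    return "https://example.com/fallback.mp4"


used, url = generate_with_fallback(
    "clouds over a skyline",
    [("primary", stuck_backend), ("secondary", working_backend)],
)
print(used, url)  # secondary https://example.com/fallback.mp4
```

With the functions from this guide, a real backend could be `lambda p: poll_veo_task(submit_veo_task(p))` followed by the Kling equivalent. Keep in mind that each attempt may be billed, so a shorter `max_wait` on the first model trades cost for latency.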

Node.js version

Node.js 18 and later include fetch and FormData natively. This example covers all four models: