Stream-R1: Reward-guided distillation for streaming video generation

Instead of treating every pixel equally, Stream-R1 uses a single reward model to redistribute weight according to Inter-Reliability across rollouts and Intra-Perplexity across space-time. It outperforms its Wan2.1 teacher at 23.1 FPS, with zero …

[Image: https://pbs.twimg.com/media/HHtG_vIXUAEj9wV?format=jpg&name=small]
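The post gives no formulas for either signal. As a rough illustration only, here is one way reward-guided per-position reweighting of a distillation loss could look. This is not Stream-R1's method, and every detail is an assumption. Rewards are per-position scores from K rollouts, flattened over space-time. `inter_reliability` scores cross-rollout agreement as 1 / (1 + variance). The "intra-perplexity" stand-in is the perplexity of a softmax over the mean reward map, and it controls how far the weights move away from uniform. All function names and the squared-error loss are hypothetical.

```python
import math
from statistics import pvariance


def softmax(xs, temperature=1.0):
    """Numerically stable softmax over a flat list of scores."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def perplexity(dist):
    """exp(entropy): 1 for a one-hot distribution, len(dist) for a uniform one."""
    entropy = -sum(q * math.log(q) for q in dist if q > 0)
    return math.exp(entropy)


def inter_reliability(rewards):
    """Hypothetical cross-rollout agreement score per space-time position.

    rewards[k][p] is the reward-model score of rollout k at flattened position p.
    Positions where rollouts disagree (high variance) get lower reliability.
    """
    num_pos = len(rewards[0])
    return [1.0 / (1.0 + pvariance([r[p] for r in rewards])) for p in range(num_pos)]


def position_weights(rewards, temperature=1.0):
    """Hypothetical per-position weights, normalized so their mean is 1."""
    num_rollouts, num_pos = len(rewards), len(rewards[0])
    rel = inter_reliability(rewards)
    mean_reward = [sum(r[p] for r in rewards) / num_rollouts for p in range(num_pos)]
    q = softmax(mean_reward, temperature)
    # Intra-perplexity stand-in (assumption): a low-perplexity, peaked reward map
    # pulls weights toward q; a flat map (perplexity == num_pos) keeps them uniform.
    ppl = perplexity(q)
    alpha = 1.0 - (ppl - 1.0) / max(num_pos - 1, 1)
    alpha = min(1.0, max(0.0, alpha))
    raw = [rel[p] * ((1.0 - alpha) / num_pos + alpha * q[p]) for p in range(num_pos)]
    total = sum(raw)
    return [num_pos * w / total for w in raw]


def weighted_distill_loss(student, teacher, weights):
    """Per-position squared error between student and teacher, reweighted."""
    return sum(w * (s - t) ** 2 for w, s, t in zip(weights, student, teacher)) / len(student)


if __name__ == "__main__":
    # Two rollouts over four positions; they agree only on positions 0 and 1.
    rewards = [[0.9, 0.8, 0.1, 0.5], [0.9, 0.8, 0.7, -0.5]]
    w = position_weights(rewards)
    print([round(x, 3) for x in w])
    print(weighted_distill_loss([0.2, 0.4, 0.6, 0.8], [0.0, 0.5, 0.5, 1.0], w))
```

The weights are normalized to a mean of 1, so the reweighted loss stays on the same scale as uniform per-pixel weighting. Positions where rollouts disagree are down-weighted. A flat reward map falls back to reliability-only weighting.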