arXiv paper · 2026. 05. 06. 16:48

HeadsUp: High-Quality 3D Gaussian Head Reconstruction from Large-Scale Multi-View Captures

Summary

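HeadsUp is a scalable feed-forward method for reconstructing high-quality 3D Gaussian human heads from large-scale multi-camera setups. It uses an efficient encoder-decoder architecture that compresses the input images into a compact latent representation and decodes that representation into a set of UV-parameterized 3D Gaussians anchored to a neutral head template. HeadsUp achieves state-of-the-art reconstruction quality on a large-scale dataset and generalizes to novel identities without test-time optimization.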

Key Points

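Presents HeadsUp, a scalable 3D head reconstruction method suited to large-scale multi-view capture setups.
Uses a UV-parameterized 3D Gaussian representation whose Gaussian count is decoupled from the number and resolution of input views, which enables training with many high-resolution views.
Trained and evaluated on an internal dataset of more than 10,000 subjects, where it achieves state-of-the-art reconstruction quality.
Generalizes to novel identities without test-time optimization (TTO); the latent space also supports generating novel 3D identities and animating heads with expression blendshapes.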

We propose HeadsUp, a scalable feed-forward method for reconstructing high-quality 3D Gaussian heads from large-scale multi-camera setups. Our method employs an efficient encoder-decoder architecture that compresses input views into a compact latent representation.

This latent representation is then decoded into a set of UV-parameterized 3D Gaussians anchored to a neutral head template. This UV representation decouples the number of 3D Gaussians from the number and resolution of input images, enabling training with many high-resolution input views.
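
The decoupling described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names `view_features`, `encode`, `decode`, and `make_template`, the constants `UV_RES` and `LATENT_DIM`, the use of mean pooling to fuse views, and the flat-plane template. The only point it demonstrates is that when Gaussians are emitted per UV texel, their count depends on the UV grid size and not on how many views are supplied or how large those views are.

```python
import random

UV_RES = 8        # hypothetical UV map resolution (8 x 8 texels)
LATENT_DIM = 16   # hypothetical size of the compact latent code


def view_features(pixels):
    """Reduce a flat pixel list of any length to LATENT_DIM bucket averages."""
    buckets = [[] for _ in range(LATENT_DIM)]
    for i, p in enumerate(pixels):
        buckets[i * LATENT_DIM // len(pixels)].append(p)
    return [sum(b) / len(b) if b else 0.0 for b in buckets]


def encode(views):
    """Toy encoder: mean-pool per-view features into one fixed-size latent.

    Mean pooling is a stand-in; the source does not say how views are fused.
    """
    feats = [view_features(v) for v in views]
    return [sum(f[i] for f in feats) / len(feats) for i in range(LATENT_DIM)]


def make_template():
    """Hypothetical neutral template: a flat plane sampled on the UV grid."""
    return {(u, v): (u / UV_RES, v / UV_RES, 0.0)
            for u in range(UV_RES) for v in range(UV_RES)}


def decode(latent, template):
    """Toy decoder: emit one Gaussian per UV texel, anchored to the template."""
    offset = sum(latent) / len(latent)  # placeholder for a learned network
    return [{"uv": uv, "position": (x, y, z + offset)}
            for uv, (x, y, z) in template.items()]


def reconstruct(views):
    return decode(encode(views), make_template())


# 4 low-resolution views versus 32 high-resolution views
few_small = [[random.random() for _ in range(64)] for _ in range(4)]
many_large = [[random.random() for _ in range(4096)] for _ in range(32)]
print(len(reconstruct(few_small)), len(reconstruct(many_large)))  # 64 64
```

In this sketch, adding views or raising their resolution changes only the cost of `encode`; the decoder output stays at `UV_RES * UV_RES` Gaussians.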

We train and evaluate our model on an internal dataset with more than 10,000 subjects, which is an order of magnitude larger than existing multi-view human head datasets. HeadsUp achieves state-of-the-art reconstruction quality and generalizes to novel identities without test-time optimization.

We extensively analyze the scaling behavior of our model across identities, views, and model capacity, revealing practical insights for quality-compute trade-offs.

Finally, we highlight the strength of our latent space by showcasing two downstream applications: generating novel 3D identities and animating the 3D heads with expression blendshapes.
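
For readers unfamiliar with expression blendshapes, the sketch below shows the standard linear blendshape formula (neutral position plus a weighted sum of per-expression offsets) applied to a list of 3D points standing in for Gaussian centers. The abstract does not describe how HeadsUp drives its Gaussians, so this is a generic illustration rather than the paper's procedure; the function `apply_blendshapes` and the expression names `jaw_open` and `smile` are hypothetical.

```python
def apply_blendshapes(neutral, deltas, weights):
    """Displace neutral positions by a weighted sum of per-expression offsets.

    neutral: list of (x, y, z) points, e.g. Gaussian centers on a neutral head
    deltas:  dict mapping expression name -> list of (dx, dy, dz), one per point
    weights: dict mapping expression name -> scalar activation
    """
    animated = []
    for i, (x, y, z) in enumerate(neutral):
        dx = sum(w * deltas[name][i][0] for name, w in weights.items())
        dy = sum(w * deltas[name][i][1] for name, w in weights.items())
        dz = sum(w * deltas[name][i][2] for name, w in weights.items())
        animated.append((x + dx, y + dy, z + dz))
    return animated


# Two points and two hypothetical expressions
neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
deltas = {
    "jaw_open": [(0.0, -0.5, 0.0), (0.0, -0.5, 0.0)],
    "smile": [(-0.25, 0.25, 0.0), (0.25, 0.25, 0.0)],
}
posed = apply_blendshapes(neutral, deltas, {"jaw_open": 1.0, "smile": 0.5})
print(posed)
```

With all weights set to zero the function returns the neutral positions unchanged, which is the usual sanity check for a blendshape rig.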

AI-generated content
