OVHcloud Joins Hugging Face as an Official Inference Provider: Accelerating European AI Model Deployment
Summary
With OVHcloud now supported as an official Inference Provider on the Hugging Face Hub, developers can use a wider range of open-weight models through a single, unified interface. The service runs in European data centers, guaranteeing data sovereignty and low latency. Competitive pricing starting at €0.04 per million tokens and first-token response times under 200ms make it well suited to production workloads. It is easily accessible through the JS and Python client SDKs, and per-user provider preferences can be configured in account settings.
Key Points
- OVHcloud is now officially supported as a Hugging Face Hub Inference Provider, improving access to popular open models such as gpt-oss, Qwen3, and DeepSeek R1.
- It runs as a serverless service in European data centers, guaranteeing data sovereignty and delivering low latency to European users.
- Pricing is competitive, using a pay-per-token model that starts at €0.04 per million tokens.
- The infrastructure is optimized for production use, with first-token response times under 200ms.
OVHcloud on Hugging Face Inference Providers 🔥
We're thrilled to share that OVHcloud is now a supported Inference Provider on the Hugging Face Hub! OVHcloud joins our growing ecosystem, enhancing the breadth and capabilities of serverless inference directly on the Hub's model pages. Inference Providers are also seamlessly integrated into our client SDKs (for both JS and Python), making it super easy to use a wide variety of models with your preferred providers.
This launch makes it easier than ever to access popular open-weight models like gpt-oss, Qwen3, DeepSeek R1, and Llama — right from Hugging Face. You can browse OVHcloud's org on the Hub at https://huggingface.co/ovhcloud and try trending supported models at https://huggingface.co/models?inference_provider=ovhcloud&sort=trending.
OVHcloud AI Endpoints are a fully managed, serverless service that provides access to frontier AI models from leading research labs via simple API calls. The service offers competitive pay-per-token pricing starting at €0.04 per million tokens.
The service runs on secure infrastructure located in European data centers, ensuring data sovereignty and low latency for European users. The platform supports advanced features including structured outputs, function calling, and multimodal capabilities for both text and image processing.
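For example, function calling goes through the client's OpenAI-compatible chat interface. Here is a minimal sketch, assuming a tool-capable model served by OVHcloud; the get_weather tool definition is hypothetical:

from huggingface_hub import InferenceClient

# Uses your locally saved HF token; pass api_key=... explicitly if needed.
client = InferenceClient(provider="ovhcloud")

# A hypothetical weather tool, declared in the OpenAI-compatible schema.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "What's the weather in Paris right now?"}],
    tools=tools,
    tool_choice="auto",
)
print(response.choices[0].message.tool_calls)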
Built for production use, OVHcloud's inference infrastructure delivers sub-200ms response times for first tokens, making it ideal for interactive applications and agentic workflows. The service supports both text generation and embedding models. You can learn more about OVHcloud's platform and infrastructure at https://www.ovhcloud.com/en/public-cloud/ai-endpoints/catalog/.
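Embedding models are reachable through the same client. A minimal sketch, assuming an embedding model is available from this provider; the model ID below is a placeholder, so check the supported-model list for actual IDs:

from huggingface_hub import InferenceClient

client = InferenceClient(provider="ovhcloud")

# Placeholder model ID; pick an embedding model from the supported list.
embedding = client.feature_extraction(
    "Paris is the capital of France.",
    model="intfloat/multilingual-e5-large",
)
print(embedding.shape)  # numpy array containing the embedding values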
Read more about how to use OVHcloud as an Inference Provider on its dedicated documentation page.
See the list of supported models here.
In your user account settings, you are able to:
- Set your own API keys for the providers you've signed up with. If no custom key is set, your requests will be routed through HF.
- Order providers by preference. This applies to the widget and code snippets in the model pages.
As mentioned, there are two modes when calling Inference Providers:
- Custom key (calls go directly to the inference provider, using your own API key of the corresponding inference provider)
- Routed by HF (in that case, you don't need a token from the provider, and the charges are applied directly to your HF account rather than the provider's account)
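In code, the two modes differ only in which credential you pass to the client; a minimal sketch, with illustrative environment variable names:

import os
from huggingface_hub import InferenceClient

# Routed by HF: authenticate with your HF token; usage is billed to your HF account.
routed_client = InferenceClient(provider="ovhcloud", api_key=os.environ["HF_TOKEN"])

# Custom key: calls go straight to OVHcloud using your own provider key
# (the OVHCLOUD_API_KEY variable name is illustrative).
direct_client = InferenceClient(provider="ovhcloud", api_key=os.environ["OVHCLOUD_API_KEY"])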
Model pages showcase third-party inference providers (the ones that are compatible with the current model, sorted by user preference).
The following example shows how to use OpenAI's gpt-oss-120b using OVHcloud as the inference provider. You can use a Hugging Face token for automatic routing through Hugging Face, or your own OVHcloud AI Endpoints API key if you have one.
Note: this requires using a recent version of huggingface_hub (>= 1.1.5).
import os
from huggingface_hub import InferenceClient

# An HF token routes through Hugging Face; an OVHcloud key calls it directly.
client = InferenceClient(
    provider="ovhcloud",
    api_key=os.environ["HF_TOKEN"],
)
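From there, the client exposes an OpenAI-style chat interface. A minimal continuation of the snippet above (the prompt is illustrative):

completion = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(completion.choices[0].message.content)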