GitHub Summary, 2026-05-22 03:44

lucidrains/mixture-of-experts

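Summary

A PyTorch implementation of Sparsely Gated Mixture of Experts (MoE), which scales up a model's parameter count while holding computation constant. It supports both a single-machine MoE and a hierarchical MoE based on the GShard paper.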

Key points

Greatly expands model capacity while keeping compute cost roughly constant
Provides PyTorch implementations of MoE and hierarchical MoE
Supports user-defined expert networks
Installable with pip

A PyTorch implementation of Sparsely Gated Mixture of Experts, for massively increasing the capacity (parameter count) of a language model while keeping the computation constant.
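To make "more parameters, same computation" concrete, here is a minimal framework-free sketch of top-k sparse gating. It is not the library's code: the names `top_k_gate`, `moe_forward`, and `make_expert` are hypothetical, the experts are toy scaling functions, and the gating noise, load-balancing loss, and capacity limits used in real implementations are left out.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(logits, k=2):
    """Return {expert_index: weight} for the k highest-scoring experts."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalise over the selected experts only; all others get weight 0.
    weights = softmax([logits[i] for i in top])
    return dict(zip(top, weights))


def moe_forward(x, gate_weights, experts, k=2):
    """Route one token x (list of floats) through its k selected experts."""
    # One gating score per expert: a plain dot product with the token.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    routing = top_k_gate(logits, k)
    out = [0.0] * len(x)
    for idx, weight in routing.items():
        y = experts[idx](x)  # only the k selected experts are evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, routing


evaluated = []  # records which experts actually ran


def make_expert(scale, index):
    """Toy expert (hypothetical): scales its input and logs that it was called."""
    def expert(v):
        evaluated.append(index)
        return [scale * vi for vi in v]
    return expert


random.seed(0)
dim, num_experts = 4, 8
experts = [make_expert(i + 1, i) for i in range(num_experts)]
gate_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]

x = [0.5, -1.0, 2.0, 0.1]
out, routing = moe_forward(x, gate_weights, experts, k=2)
print("routing:", routing)
print("experts evaluated:", sorted(evaluated))
```

Eight experts hold parameters here, but each token only pays for two of them. Adding more experts grows capacity without changing the per-token cost, apart from the small gating computation.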

It is largely a line-by-line port of the TensorFlow implementation referenced in the original README, with a few enhancements.

Update: you should now use ST Mixture of Experts.

$ pip install mixture_of_experts

import torch
from torch import nn
from mixture_of_experts import MoE
...

The above should suffice for a single machine, but if you want a hierarchical mixture of experts (two levels), as used in the GShard paper, follow the instructions below.

import torch
from mixture_of_experts import HeirarchicalMoE
moe = HeirarchicalMoE(
...
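The idea behind two-level routing can be sketched without the library. In this illustration, an outer gate picks a group of experts, an inner gate picks an expert inside that group, and the final weight is the product of the two probabilities. The function `hierarchical_route`, the top-1 choice at each level, and the example logits are assumptions made for the sketch; they do not describe how `HeirarchicalMoE` is implemented.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(xs):
    """Index of the largest element."""
    return max(range(len(xs)), key=lambda i: xs[i])


def hierarchical_route(outer_logits, inner_logits):
    """Pick one group, then one expert within it (top-1 at both levels).

    outer_logits: one score per group.
    inner_logits: inner_logits[g] holds one score per expert in group g.
    Returns (group, expert, combined_weight).
    """
    outer_probs = softmax(outer_logits)
    g = argmax(outer_probs)
    # Only the chosen group's inner gate needs to be evaluated.
    inner_probs = softmax(inner_logits[g])
    e = argmax(inner_probs)
    return g, e, outer_probs[g] * inner_probs[e]


# Hypothetical scores for 3 groups of 2 experts each.
outer_logits = [0.1, 2.0, -1.0]
inner_logits = [[0.0, 1.0], [3.0, 0.5], [0.2, 0.1]]

group, expert, weight = hierarchical_route(outer_logits, inner_logits)
flat_index = group * len(inner_logits[group]) + expert
print(group, expert, flat_index, round(weight, 4))
```

With G groups of E experts, a router of this shape scores G groups plus E experts per token rather than all G × E experts at once. That is one reason a two-level layout is attractive when the expert count is large.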

1 billion parameters

import torch
from mixture_of_experts import HeirarchicalMoE
moe = HeirarchicalMoE(
...
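As a rough sanity check on what a billion parameters means for an expert layer, the arithmetic below counts weights under stated assumptions: each expert is a two-layer feedforward network without biases, the model width is 512, and the hidden width is four times that. These values are illustrative and are not taken from the truncated snippet above.

```python
import math


def ffn_params(dim, hidden):
    """Weight count of a two-layer feedforward expert (dim -> hidden -> dim), biases ignored."""
    return dim * hidden + hidden * dim


DIM = 512                 # hypothetical model width
HIDDEN = 4 * DIM          # hypothetical expert hidden width
TARGET = 1_000_000_000    # one billion parameters

per_expert = ffn_params(DIM, HIDDEN)
experts_needed = math.ceil(TARGET / per_expert)

# Smallest square grid (n x n experts, as in a two-level layout) that reaches the target.
grid_side = math.isqrt(experts_needed - 1) + 1
total = grid_side ** 2 * per_expert

# Expert weights actually used per token if two experts are selected.
active_top2 = 2 * per_expert

print(f"parameters per expert:   {per_expert:,}")      # 2,097,152
print(f"experts needed for 1B:   {experts_needed}")    # 477
print(f"smallest square grid:    {grid_side} x {grid_side}")  # 22 x 22
print(f"total expert parameters: {total:,}")           # 1,015,021,568
print(f"active per token, top-2: {active_top2:,}")     # 4,194,304
```

Under these assumptions the layer stores about a billion expert weights, while each token touches roughly four million of them.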

If you want more sophisticated networks for the experts, you can define your own and pass it to the MoE class as experts.

import torch
from torch import nn
from mixture_of_experts import MoE
...

@misc{shazeer2017outrageously,
title = {Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer},
author = {Noam Shazeer and Azalia Mirhoseini and Krzysztof Maziarz and Andy Davis and Quoc Le and Geoffrey Hinton and Jeff Dean},
...

@misc{lepikhin2020gshard,
title = {GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding},
author = {Dmitry Lepikhin and HyoukJoong Lee and Yuanzhong Xu and Dehao Chen and Orhan Firat and Yanping Huang and Maxim Krikun and Noam Shazeer and Zhifeng Chen},
...

AI-generated content
