Supra1.5 model family released!

SupraLabs has just released the Supra-1.5 lineup, including Base, Instruct, and GGUF! (Reasoning model coming soon)

Hello r/LocalLLaMA! Today we're releasing the full Supra-1.5-50M family: a new Base model with a context window 5x larger than the original Supra-50M, an Instruct fine-tune built on top of it, and GGUF quantized versions ready to run anywhere.

🤗 Supra-1.5-50M-Base-exp | 🤗 Supra-1.5-50M-Instruct-exp | 🤗 GGUF | Supra1.5 50M Instruct Demo

These are experimental releases, part of Project Chimera.

This model uses the Alpaca chat format!

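Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
[INSTRUCTION]

### Response:

With additional input:
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
[INSTRUCTION]

### Input:
[CONTEXT]

### Response: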

What changed from Supra-50M?
The biggest upgrade is context. Supra-1.5 extends the context window from 1,024 to 5,120 tokens using RoPE scaling, followed by continued pretraining on 3B tokens of mixed tool-calling data, ChatML conversations, factual text, and math data. It keeps the same architecture and the same tokenizer, but provides a much better base for SFT (supervised fine-tuning) and future RL (reinforcement learning) work.

Spec	Supra-50M	Supra-1.5-50M
Context length	1,024 tokens	5,120 tokens
Training data (CPT)	20B tokens (pretraining)	3T tokens (continued) (experimental 1T)
Data mix	Fineweb-Edu only	Tool calling, ChatML, factual, math
Instruct format	Alpaca	ChatML
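
The post does not say which RoPE scaling variant was used. As a rough illustration only, here is a minimal sketch of one common option, linear position interpolation, where each position is divided by the scale factor (5,120 / 1,024 = 5) so that the extended range maps back into the range seen during the original pretraining. `head_dim` and `base` are illustrative defaults, not values taken from Supra's config.

```python
import math

ORIGINAL_CONTEXT = 1024   # Supra-50M context length (from the post)
EXTENDED_CONTEXT = 5120   # Supra-1.5 context length (from the post)
SCALE = EXTENDED_CONTEXT / ORIGINAL_CONTEXT  # 5.0

def rope_angles(position, head_dim=64, base=10000.0, scale=1.0):
    """Rotation angles for one token position.

    head_dim and base are assumed values for illustration.
    With linear scaling, the position is divided by `scale`
    before the per-frequency angles are computed.
    """
    scaled_position = position / scale
    return [
        scaled_position * base ** (-2 * i / head_dim)
        for i in range(head_dim // 2)
    ]

def rotate_pair(x0, x1, angle):
    """Rotate one (even, odd) pair of query/key features by `angle`."""
    c, s = math.cos(angle), math.sin(angle)
    return x0 * c - x1 * s, x0 * s + x1 * c

# The last position of the extended window lands inside the original range.
last_angle = rope_angles(EXTENDED_CONTEXT - 1, scale=SCALE)[0]
print(last_angle < ORIGINAL_CONTEXT)
```

Other variants (NTK-aware scaling, YaRN) change the frequency base instead of the positions, so this sketch should not be read as a description of how Supra-1.5 was actually trained.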

Benchmarks (Instruct)
BLiMP scores a consistent 67.4 across evaluations. The model also shows an interesting split between raw and normalized accuracy: science and factual tasks do better under raw inference, while math and logic tasks benefit from normalized inference. Keep in mind that this is a 50M model when judging these numbers.
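
For readers unfamiliar with the raw versus normalized distinction, here is a minimal sketch, assuming it corresponds to the usual multiple-choice scoring where "raw" picks the answer with the highest total log-likelihood and "normalized" divides that score by the answer's byte length first. The numbers are hypothetical and are not Supra outputs.

```python
def pick_raw(loglikelihoods):
    """Index of the choice with the highest total log-likelihood."""
    return max(range(len(loglikelihoods)), key=lambda i: loglikelihoods[i])

def pick_normalized(loglikelihoods, choices):
    """Index of the choice with the highest log-likelihood per byte."""
    return max(
        range(len(choices)),
        key=lambda i: loglikelihoods[i] / len(choices[i].encode("utf-8")),
    )

# Hypothetical scores for illustration only.
choices = ["4", "the number four"]
loglikelihoods = [-3.0, -9.0]

print(pick_raw(loglikelihoods))                   # favors the short answer
print(pick_normalized(loglikelihoods, choices))   # favors the long answer
```

Raw scoring tends to favor short answers because every extra token lowers the total, which is one plausible reason the two metrics can rank task families differently.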

The model is already listed on AxiomicLabs' Open SLM Leaderboard.

Quick start
Base model:

import torch  # torch is its own package; it cannot be imported from transformers
from transformers import pipeline
print("[*] Loading Supra-1.5-50M Base...")
pipe = pipeline("text-generation", model="SupraLabs/Supra-1.5-50M-Base-exp", device_map="auto", torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32)
def generate_text(prompt, max_new_tokens=150):
    result = pipe(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.5,
        top_k=25,
        top_p=0.9,
        repetition_penalty=1.2,
        pad_token_id=pipe.tokenizer.pad_token_id,
        eos_token_id=pipe.tokenizer.eos_token_id
    )
    return result[0]['generated_text']
print(generate_text("The importance of education is"))

Instruct model:

import os, warnings
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
warnings.filterwarnings("ignore", category=UserWarning, module="transformers")
import torch
from transformers import pipeline, AutoTokenizer, logging
logging.set_verbosity_error()
MODEL_ID = "SupraLabs/Supra-1.5-50M-Instruct-exp"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, clean_up_tokenization_spaces=False)
pipe = pipeline("text-generation", model=MODEL_ID, tokenizer=tokenizer, device_map="auto", torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32)
def build_prompt(instruction, input_text=""):
    if input_text.strip():
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )

def generate(instruction, input_text=""):
    result = pipe(
        build_prompt(instruction, input_text),
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_k=50,
        top_p=0.9,
        repetition_penalty=1.15,
        pad_token_id=pipe.tokenizer.pad_token_id,
        eos_token_id=pipe.tokenizer.eos_token_id,
        return_full_text=False
    )
    return result[0]['generated_text'].strip()

while True:
    print("\nEnter an instruction (or 'exit' to quit):")
    user_input = input().strip()
    if user_input.lower() == "exit":
        break
    print("\nEnter additional context (optional, press Enter to skip):")
    context_input = input().strip()
    print(f"\nResponse:\n{generate(user_input, context_input)}\n")

GGUF quantization:

Bits	Quant	Size
1-bit	Q1_D	19.6 MB
1-bit	TQ1_0	25.1 MB
2-bit	Q2_K	28.8 MB
2-bit	TQ2_0	26.4 MB
3-bit	IQ3_S	31 MB
3-bit	Q3_K_S	31 MB
3-bit	IQ3_M	31.7 MB
3-bit	Q3_K_M	32.7 MB
3-bit	Q3_K_L	33.8 MB
4-bit	IQ4_XS	33.8 MB
4-bit	Q4_K_S	35.7 MB
4-bit	IQ4_NL	34.7 MB
4-bit	Q4_0	34.5 MB
4-bit	Q4_1	36.8 MB
4-bit	Q4_K_M (recommended)	37.4 MB
5-bit	Q5_K_S	39.5 MB
5-bit	Q5_0	39 MB
5-bit	Q5_1	41.2 MB
5-bit	Q5_K_M	41 MB
6-bit	Q6_K	45.8 MB
8-bit	Q8_0	56.2 MB
16-bit	BF16	105 MB
16-bit	F16	105 MB
32-bit	F32	208 MB
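
As a back-of-the-envelope check on the sizes in the table, the sketch below estimates the parameter count from the F32 file and then the average bits stored per weight for a few quants. It assumes 1 MB = 10^6 bytes and ignores GGUF metadata, so the results are approximate.

```python
F32_BYTES_PER_WEIGHT = 4

# File sizes in MB copied from the table (subset).
sizes_mb = {"Q4_K_M": 37.4, "Q8_0": 56.2, "F16": 105.0, "F32": 208.0}

def approx_param_count(f32_size_mb):
    """Estimate the parameter count from the F32 file size, ignoring metadata."""
    return f32_size_mb * 1_000_000 / F32_BYTES_PER_WEIGHT

def bits_per_weight(size_mb, n_params):
    """Average bits stored per parameter for a file of the given size."""
    return size_mb * 1_000_000 * 8 / n_params

n_params = approx_param_count(sizes_mb["F32"])  # roughly 52 million
for name, size in sizes_mb.items():
    print(f"{name}: {bits_per_weight(size, n_params):.2f} bits/weight")
```

Under these assumptions Q4_K_M works out to a little under 6 bits per weight rather than 4. One likely reason is that mixed-precision quants keep some tensors at higher precision, and in a model this small those tensors make up a large share of the file.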
 
Using GGUF with llama.cpp:

# Run directly (replace Q4_K_M with your preferred quant):
llama-cli -hf SupraLabs/Supra-1.5-50M-instruct-exp-gguf:Q4_K_M \
  --chat-template alpaca \
  -p "Write a short poem about open source AI." \
  -n 256

# Or run as a local OpenAI-compatible server:
llama-server -hf SupraLabs/Supra-1.5-50M-instruct-exp-gguf:Q4_K_M \
  --chat-template alpaca \
  -c 5120

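
Once the server is up, it can be queried like any OpenAI-style endpoint. The following is a minimal sketch using only the Python standard library. It assumes llama-server is listening on its default port (8080) on localhost; the helper names `build_request` and `ask` are made up for this example.

```python
import json
import urllib.request

# Assumption: llama-server is running locally on its default port, 8080.
SERVER_URL = "http://localhost:8080/v1/chat/completions"

def build_request(prompt, max_tokens=256, url=SERVER_URL):
    """Build an OpenAI-style chat completion request for llama-server."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(prompt):
    """Send the prompt to the server and return the reply text."""
    with urllib.request.urlopen(build_request(prompt)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]

# With llama-server running:
# print(ask("Write a short poem about open source AI."))
```

Adjust `SERVER_URL` if you started the server with a different host or port.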
What's next?
Supra-124M - Base, Chat, Reasoning (legacy family, in operation)
Supra-350M - Base, Chat, Reasoning, Coding (legacy family, in operation)
All weights are Apache 2.0. Feedback welcome!

Submitted by /u/Dangerous_Try3619