Reddit Summary | 2026-05-05 14:00

Intel B70: LLama.cpp SYCL vs LLama.cpp OpenVino vs LLM-Scaler

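Summary

This post compares LLM inference performance on an Intel GPU. It benchmarks DeepSeek-R1-Distill-Llama-8B with `llama.cpp`'s new OpenVino backend, the existing `llama.cpp` SYCL backend, and LLM-Scaler (Intel's vLLM fork), reporting throughput (t/s), ttfr, and related latency metrics. In prompt processing (pp2048), OpenVino clearly beats SYCL, the previous best case, but still trails LLM-Scaler, which the author attributes to hardware optimizations for GPTQ/Int4. In token generation (tg512), SYCL was the fastest of the three.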

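Key points

Comparison of LLM inference backends on an Intel GPU: OpenVino vs SYCL vs LLM-Scaler.
The OpenVino backend improves on SYCL in prompt processing, but LLM-Scaler posts the best prompt-processing numbers overall.
The author suspects that the quantization format (GPTQ/Int4) and the level of hardware optimization for it strongly affect the results.
In real-world use, prompt processing speed appears to be the most telling performance indicator on this card.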

Original: In case anyone is interested, I decided to test out LLama.cpp's new OpenVino backend to see how it compares on Intel GPUs. At first glance, it stomps all over the previous best case, SYCL, but lags behind LLM-Scaler (Intel's vLLM fork), likely just due to the hardware optimizations for GPTQ/Int4. Interestingly, tg512 was fastest on SYCL, but in the real world, prompt processing always seems to be the indicator on this card.

As usual with Intel, model selection is... poor. It took a while to even find a model that was in the validated OpenVino list that would not only run properly, but also have a counterpart that was "close enough" for LLM-Scaler.

Edit: Really Reddit? Can't edit a title? Haven't used this heap in so long, now I'm remembering why.

## Llama.cpp OpenVino
llama-benchy http://localhost:8000/v1 bartowski/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M


| model                                              |   test |              t/s |     peak t/s |      ttfr (ms) |   est_ppt (ms) |   e2e_ttft (ms) |
|:---------------------------------------------------|-------:|-----------------:|-------------:|---------------:|---------------:|----------------:|
| bartowski/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M | pp2048 | 3845.61 ± 524.73 |              | 659.99 ± 56.95 | 489.07 ± 56.95 |  739.42 ± 56.84 |
| bartowski/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M |  tg512 |     40.89 ± 0.55 | 44.33 ± 1.25 |                |                |                 |

## Llama.cpp SYCL
llama-benchy http://localhost:8000/v1 bartowski/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M


| model                                              |   test |            t/s |     peak t/s |       ttfr (ms) |    est_ppt (ms) |   e2e_ttft (ms) |
|:---------------------------------------------------|-------:|---------------:|-------------:|----------------:|----------------:|----------------:|
| bartowski/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M | pp2048 | 844.64 ± 19.25 |              | 2199.90 ± 23.63 | 2178.96 ± 23.63 | 2229.67 ± 24.84 |
| bartowski/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M |  tg512 |   73.87 ± 1.17 | 78.00 ± 2.16 |                 |                 |                 |
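
To put the two llama.cpp tables above side by side, here is a minimal sketch that turns the reported mean t/s values into speed ratios. The dictionary layout and the `ratio` helper are illustrative names, not part of llama-benchy, and the ± spreads are ignored, so treat the output as a rough comparison only.

```python
# Mean throughput (tokens/s) copied from the OpenVino and SYCL tables above.
# The +/- spreads are deliberately left out, so these are point estimates.
results = {
    "OpenVino": {"pp2048": 3845.61, "tg512": 40.89},
    "SYCL": {"pp2048": 844.64, "tg512": 73.87},
}


def ratio(test: str, a: str = "OpenVino", b: str = "SYCL") -> float:
    """Return how many times faster backend `a` is than backend `b` on `test`."""
    return results[a][test] / results[b][test]


for test in ("pp2048", "tg512"):
    print(f"{test}: OpenVino / SYCL = {ratio(test):.2f}x")
```

On these means, OpenVino comes out at roughly 4.5x SYCL for prompt processing and roughly 0.55x SYCL for token generation, which matches the split described in the post.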

## LLM-Scaler
llama-benchy http://localhost:8000/v1 jakiAJK/DeepSeek-R1-Distill-Llama-8B_GPTQ-int4 


| model                                          |   test |              t/s |     peak t/s |      ttfr (ms) |   est_ppt (ms) |   e2e_ttft (ms) |
|:-----------------------------------------------|-------:|-----------------:|-------------:|---------------:|---------------:|----------------:|
| jakiAJK/DeepSeek-R1-Distill-Llama-8B_GPTQ-int4 | pp2048 | 7875.52 ± 642.20 |              | 268.09 ± 20.50 | 240.11 ± 20.50 |  268.34 ± 20.45 |
| jakiAJK/DeepSeek-R1-Distill-Llama-8B_GPTQ-int4 |  tg512 |     52.75 ± 0.10 | 54.00 ± 0.00 |                |                |                 |
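
One way to see why prompt processing can dominate in practice is a rough end-to-end estimate built from the three tables above. The sketch below assumes a simple model that is not from the post: total time is the measured e2e_ttft for the 2048-token prompt plus the number of generated tokens divided by the tg512 rate. It ignores the ± spreads and any slowdown in generation at longer context, and `estimated_total_seconds` is a hypothetical helper name.

```python
# Means copied from the three tables above:
# e2e_ttft in ms (pp2048 row) and generation speed in tokens/s (tg512 row).
backends = {
    "llama.cpp OpenVino": {"e2e_ttft_ms": 739.42, "tg_tps": 40.89},
    "llama.cpp SYCL": {"e2e_ttft_ms": 2229.67, "tg_tps": 73.87},
    "LLM-Scaler": {"e2e_ttft_ms": 268.34, "tg_tps": 52.75},
}


def estimated_total_seconds(name: str, generated_tokens: int) -> float:
    """Assumed model: time to first token (2048-token prompt) plus steady decoding."""
    b = backends[name]
    return b["e2e_ttft_ms"] / 1000.0 + generated_tokens / b["tg_tps"]


for n in (64, 512):
    for name in backends:
        total = estimated_total_seconds(name, n)
        print(f"{n:>4} generated tokens, {name}: {total:.2f} s")
```

Under this assumed model the ranking depends on the ratio of prompt length to output length: with a 2048-token prompt and a short 64-token reply, LLM-Scaler finishes first and SYCL last, while with a 512-token reply SYCL's faster generation makes up for its slow prompt processing.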
