HuggingFace · Headlines · 2026. 05. 07. 02:34

🇵🇭 FilBench - Can LLMs Understand and Generate Tagalog, Filipino, and Cebuano?

Summary

FilBench is a comprehensive benchmark for evaluating how well large language models (LLMs) perform on Philippine languages such as Tagalog, Filipino, and Cebuano. It is organized into four main categories (Cultural Knowledge, Classical NLP, Reading Comprehension, and Generation) and measures LLM capability in each area. According to the evaluation results, Southeast Asia-specific (SEA-specific) models performed strongly but still fell short of recent closed-source models such as GPT-4o. Even so, FilBench shows that continued efforts to curate Filipino/SEA-specific data and fine-tune on it remain very important, and it points to research directions for the field.

Key points

  • FilBench is a comprehensive evaluation framework that measures LLM performance on Philippine languages such as Tagalog and Cebuano.
  • The evaluation is organized into four core categories: Cultural Knowledge, Classical NLP, Reading Comprehension, and Generation.
  • SEA-specific models performed strongly, but the best overall results still come from recent closed-source LLMs.
  • FilBench highlights that curating Filipino/SEA-specific data and fine-tuning base LLMs on it remains very important for continued performance gains.

So we developed a comprehensive evaluation suite to assess LLM capabilities in Tagalog, Filipino (standardized Tagalog), and Cebuano, covering fluency, linguistic and translation ability, and specific cultural knowledge.

We evaluated more than 20 state-of-the-art LLMs on FilBench, giving a comprehensive assessment of their performance on Philippine languages.

The FilBench evaluation suite comprises 12 tasks divided into four main categories: Cultural Knowledge, Classical NLP, Reading Comprehension, and Generation. For example, the Classical NLP category includes tasks such as sentiment analysis, while the Generation tasks cover different aspects of translation. To make sure these categories reflect the priorities and trends of NLP research and usage, we selected them based on a historical survey of NLP research on Philippine languages from 2006 to early 2024. (Most of these categories contain only non-translated content, to stay faithful to the natural use of Philippine languages.)

Cultural Knowledge: This category tests a language model's ability to recall factual and culturally specific information. For Cultural Knowledge, we selected a variety of examples that test an LLM's regional and factual knowledge (Global-MMLU), Filipino-centric values (KALAHI), and ability to disambiguate word sense (StingrayBench).

Classical NLP: This category covers a range of information extraction and linguistic tasks that were traditionally performed by specialized, purpose-trained models, such as named entity recognition, sentiment analysis, and text classification. In this category, we include instances from CebuaNER, TLUnified-NER, and Universal NER for named entity recognition, and subsets of SIB-200 and BalitaNLP for text classification and sentiment analysis.

Reading Comprehension: This category evaluates a language model's ability to understand and interpret Filipino text, focusing on tasks such as readability, comprehension, and natural language inference. In this category, we include instances from the Cebuano Readability Corpus, Belebele, and NewsPH NLI.

Generation: We dedicate a large portion of FilBench to testing an LLM's ability to translate text faithfully, either from English to Filipino or from Cebuano to English. We include a diverse set of test examples, ranging from documents (NTREX-128) and realistic texts from volunteers (Tatoeba) to domain-specific text (TICO-19).

Each category provides an aggregated metric. To produce a single representative score, we compute a weighted average based on the number of examples in each category, which we call the FilBench Score.
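The weighting described above can be sketched in a few lines of Python. This is a minimal illustration, not the FilBench implementation: the function name `filbench_score`, the category keys, and every number below are made up for the example, and it assumes each category contributes one aggregated score plus an example count.

```python
from typing import Dict, Tuple


def filbench_score(categories: Dict[str, Tuple[float, int]]) -> float:
    """Weighted average of per-category scores.

    `categories` maps a category name to (aggregated score, number of examples).
    Each category is weighted by its number of examples.
    """
    total_examples = sum(count for _, count in categories.values())
    if total_examples == 0:
        raise ValueError("at least one category must contain examples")
    weighted_sum = sum(score * count for score, count in categories.values())
    return weighted_sum / total_examples


# Illustrative values only; these are NOT real FilBench results or sizes.
example = {
    "cultural_knowledge": (70.0, 100),
    "classical_nlp": (80.0, 300),
    "reading_comprehension": (60.0, 200),
    "generation": (30.0, 400),
}
score = filbench_score(example)
print(score)
```

Because the weights are example counts, a large category pulls the overall score toward its own result more strongly than a small one does.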

To simplify usage and setup, we built FilBench on top of Lighteval, an all-in-one framework for LLM evaluation.
For language-specific evaluation, we first defined translation pairs from English to Tagalog (or Cebuano) for terms commonly used in evaluation, such as "yes" (oo), "no" (hindi), and "true" (totoo).
Then, we used the provided templates to implement custom tasks for the capabilities we care about.
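As a rough picture of what such translation pairs enable, the sketch below maps English answer labels to their Tagalog equivalents before building a prompt. It is plain Python and does not use Lighteval's API; the names `TAGALOG_TERMS`, `localize_choices`, and `build_prompt` are hypothetical, and only the three pairs mentioned above are included.

```python
from typing import Dict, List

# The three English-to-Tagalog pairs mentioned in the text.
# The dictionary name and layout are assumptions made for this sketch.
TAGALOG_TERMS: Dict[str, str] = {
    "yes": "oo",
    "no": "hindi",
    "true": "totoo",
}


def localize_choices(choices: List[str], terms: Dict[str, str]) -> List[str]:
    """Replace English answer labels with target-language equivalents.

    Labels without a known translation are left unchanged.
    """
    return [terms.get(choice.lower(), choice) for choice in choices]


def build_prompt(question: str, choices: List[str], terms: Dict[str, str]) -> str:
    """Build a simple prompt whose answer options are localized."""
    options = " / ".join(localize_choices(choices, terms))
    return f"{question}\n({options})"


print(build_prompt("Q?", ["yes", "no"], TAGALOG_TERMS))
```

Keeping the pairs in one table means every task template draws its answer labels from the same source, so a correction made there applies to all tasks.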

FilBench is now available as a set of community tasks in the official Lighteval repository!

By evaluating several LLMs on FilBench, we uncovered several insights into how they perform on Philippine languages.

Finding #1: Although region-specific LLMs still lag behind GPT-4, collecting data to train these models is still a promising direction

In the past few years, there has been an increase in region-specific LLMs that target Southeast Asian languages (SEA-specific), such as SEA-LION and SeaLLM. These are open-weight LLMs that can be freely downloaded from HuggingFace. We find that SEA-specific LLMs are often the most parameter-efficient for our languages, achieving the highest FilBench scores compared to other models of their size. However, the best SEA-specific model is still outperformed by closed-source LLMs like GPT-4o.

Building region-specific LLMs still makes sense: we observed performance gains of 2-3% when continuously fine-tuning a base LLM with SEA-specific instruction-tuning data.
This suggests that efforts to curate Filipino/SEA-specific training data remain relevant, as they can lead to better performance on FilBench.

We also observed that, across the four FilBench categories, most models struggle with Generation. Inspecting the failure modes in Generation, we find cases where the model fails to follow the translation instruction, generates overly long and verbose text, or hallucinates another language instead of Tagalog or Cebuano.
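One of these failure modes, overly verbose output, can be flagged with a simple length heuristic. The sketch below is only an illustration of the idea and is not how FilBench scores translations: the function name `is_overly_verbose` and the `max_ratio` threshold of 2.0 are assumptions, and comparing whitespace-separated word counts is a crude proxy.

```python
def is_overly_verbose(source: str, output: str, max_ratio: float = 2.0) -> bool:
    """Flag a translation whose word count far exceeds the source's.

    Compares whitespace-separated word counts; `max_ratio` is an
    arbitrary threshold chosen for this sketch.
    """
    source_words = len(source.split())
    output_words = len(output.split())
    if source_words == 0:
        # Any output for an empty source counts as excess text.
        return output_words > 0
    return output_words / source_words > max_ratio


# A direct translation passes; a chatty answer wrapping it is flagged.
print(is_overly_verbose("Good morning", "Magandang umaga"))
print(is_overly_verbose(
    "Good morning",
    "Sure! Here is the translation you asked for: Magandang umaga",
))
```

A check like this only catches length problems. Detecting output in the wrong language would need a separate language-identification step, which is not shown here.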

The Philippines has limited internet infrastructure and a low average income [3], which calls for LLMs that are cost- and compute-efficient. Through FilBench, we were able to identify LLMs that lie on the Pareto frontier of efficiency.
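To make the Pareto frontier idea concrete: a model is on the frontier if no other model is at least as cheap and at least as accurate while being strictly better on one of the two. The sketch below assumes each model is described by a cost and a score; the function name `pareto_frontier` is hypothetical, and the model names and numbers are placeholders, not FilBench results.

```python
from typing import List, Tuple

# (name, cost, score): lower cost is better, higher score is better.
Model = Tuple[str, float, float]


def pareto_frontier(models: List[Model]) -> List[str]:
    """Return the names of models that no other model dominates."""
    frontier = []
    for name, cost, score in models:
        dominated = any(
            other_cost <= cost
            and other_score >= score
            and (other_cost < cost or other_score > score)
            for _, other_cost, other_score in models
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Placeholder data for illustration only.
models = [
    ("model_a", 1.0, 50.0),
    ("model_b", 2.0, 65.0),
    ("model_c", 3.0, 60.0),   # costs more than model_b and scores lower
    ("model_d", 10.0, 70.0),
]
frontier = pareto_frontier(models)
print(frontier)
```

In this toy data, `model_c` drops out because `model_b` is both cheaper and more accurate, while the other three each offer a trade-off that no alternative beats outright.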

In general, open-weight LLMs (models that can be freely downloaded from HuggingFace) are far cheaper than commercial models without sacrificing performance. If you want an alternative to GPT-4o for Filipino-language tasks, try Llama 4 Maverick!

We make this information available in the FilBench leaderboard HuggingFace Space.

We hope that FilBench provides deeper insight into LLM capabilities for Philippine languages and serves as a catalyst for Filipino NLP research and development. The FilBench evaluation suite is built on top of Hugging Face's lighteval, so LLM developers can easily evaluate their models on the benchmark. For more information, please visit the links below:

The authors thank Cohere Labs for providing credits through the Cohere Research Grant to run the Aya model series, and Together AI for additional compute credits to run several open models. We also acknowledge the Hugging Face team, particularly the OpenEvals team (Clémentine Fourrier and Nathan Habib) and Daniel van Strien, for their help in publishing this blog post.

If you evaluate on FilBench, please cite our work:

@article{filbench,
title={Fil{B}ench: {C}an {LLM}s {U}nderstand and {G}enerate {F}ilipino?},
author={Miranda, Lester James V and Aco, Elyanah and Manuel, Conner and Cruz, Jan Christian Blaise and Imperial, Joseph Marvin},
...


This content was automatically summarized, translated, and analyzed by AI from the original post on the Hugging Face Blog. Copyright remains with the original author; please check the original post for the exact content.
