arXiv paper · 2026-04-24 22:03

An In-Depth Analysis of Transformers' Abstract Symbolic Logic Reasoning

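Summary

This study investigates whether decoder-only transformer models, when solving propositional logic problems presented in context, generalize to variable names that were not seen during training. Analyzing where earlier work failed, both theoretically and empirically, it identifies the collapse of unseen tokens' embeddings and unembeddings ("unembedding collapse") as a key cause alongside the known difficulty of copying unseen tokens. To address this, it proposes a combination of an architecture change that facilitates copying, data diversity, and freezing or resetting the (un)embeddings, which together achieve generalization to unseen tokens.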

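Key points

The failure of transformer models at symbolic logic reasoning is not only a "token copying" problem: a more fundamental cause is that the embeddings and unembeddings of unseen tokens collapse to nearly the same vector.
The authors improve copying through a small architecture change and combine it with data diversity and (un)embedding resets, achieving generalization to unseen tokens in controlled experiments.
Experiments also indicate that, in the open-weight Gemma 3 family, the correlated embeddings of the unused reserved tokens are a poor initialization for finetuning.
The work offers a mechanistic explanation of, and a practical recipe for improving, abstract symbolic logic reasoning in transformer-based LLMs.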

We investigate the ability of decoder-only transformer models to perform abstract symbolic reasoning, specifically solving propositional logic reasoning problems given in context. Previous work demonstrated that models fail to generalize to problems involving variable names that were not observed during training, and showed that one reason behind this is the difficulty of copying (or generating) unseen tokens. We show both theoretically and empirically that a particular representational collapse also plays a crucial role: the unembeddings (last-layer weights) of unseen tokens collapse to nearly the same vector during training. The collapse makes it difficult for the model to distinguish multiple unseen variables (especially when the embedding and unembedding parameters are shared), and provides a mechanistic explanation for the effectiveness of existing heuristic interventions like "active forgetting", which periodically reset the token (un)embeddings.
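One simple way to see why such a collapse can arise: under softmax cross-entropy, the gradient on the unembedding row of a token that never appears as a target is just the probability-weighted average of the hidden states, which is almost identical for every unseen token. The NumPy sketch below is a toy illustration of that effect, not the paper's experiment. It trains only a linear softmax readout, and the synthetic hidden states (a shared mean plus per-class prototypes), the sizes, the learning rate, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def mean_pairwise_cosine(rows):
    """Average cosine similarity over all distinct pairs of rows."""
    unit = rows / np.linalg.norm(rows, axis=1, keepdims=True)
    sims = unit @ unit.T
    upper = np.triu_indices(len(rows), k=1)
    return float(sims[upper].mean())


def simulate_unembedding_collapse(vocab=20, n_seen=10, dim=16,
                                  n_examples=512, steps=2000, lr=0.05, seed=0):
    """Train a softmax readout where tokens n_seen..vocab-1 are never targets.

    Returns the mean pairwise cosine of the unseen rows before and after training.
    """
    rng = np.random.default_rng(seed)
    # Assumption: hidden states share a common mean component plus a class prototype.
    shared_mean = np.ones(dim)
    prototypes = rng.normal(size=(n_seen, dim))
    targets = rng.integers(0, n_seen, size=n_examples)  # only seen tokens occur
    hidden = shared_mean + prototypes[targets] + 0.1 * rng.normal(size=(n_examples, dim))

    unembed = 0.01 * rng.normal(size=(vocab, dim))  # small random initialization
    onehot = np.zeros((n_examples, vocab))
    onehot[np.arange(n_examples), targets] = 1.0

    before = mean_pairwise_cosine(unembed[n_seen:])
    for _ in range(steps):
        logits = hidden @ unembed.T
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Cross-entropy gradient; for unseen rows this reduces to mean(p_j * h).
        grad = (probs - onehot).T @ hidden / n_examples
        unembed -= lr * grad
    after = mean_pairwise_cosine(unembed[n_seen:])
    return before, after


before, after = simulate_unembedding_collapse()
print(f"unseen-row cosine before training: {before:.3f}")
print(f"unseen-row cosine after training:  {after:.3f}")
```

In this toy setting the unseen rows start out roughly uncorrelated and end up pointing in nearly the same direction, because every one of them receives essentially the same push away from the observed hidden states.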

Based on these observations, we devise a combination of techniques, involving a small architecture change facilitating copying, data diversity, and freezing or resetting (un)embeddings, that achieves generalization to unseen tokens. We support our claims with extensive controlled experiments on propositional logic reasoning problems. Beyond synthetic experiments, we also observe evidence of (un)embedding collapse in the open-weight models in the Gemma 3 family, which includes 99 unused tokens reserved for downstream use. Empirically we find that the correlated embeddings of these tokens are a poor initialization for finetuning applications.
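The two operations mentioned above, checking how correlated a set of token embeddings is and resetting them, can be sketched in a few lines of NumPy. This is a hedged illustration on a synthetic embedding table, not Gemma 3 weights and not the authors' procedure. The table size, the choice of "reserved" token ids, and the helper names `mean_pairwise_cosine` and `reset_rows` are hypothetical, and the reset here happens once rather than periodically as in "active forgetting".

```python
import numpy as np


def mean_pairwise_cosine(rows):
    """Average cosine similarity over all distinct pairs of rows."""
    unit = rows / np.linalg.norm(rows, axis=1, keepdims=True)
    sims = unit @ unit.T
    upper = np.triu_indices(len(rows), k=1)
    return float(sims[upper].mean())


def reset_rows(matrix, token_ids, seed=0):
    """Return a copy of `matrix` with the given rows re-drawn i.i.d. Gaussian.

    The scale is matched to the standard deviation of the untouched rows
    (an illustrative choice, not one taken from the paper).
    """
    rng = np.random.default_rng(seed)
    keep = np.setdiff1d(np.arange(matrix.shape[0]), token_ids)
    scale = matrix[keep].std()
    fresh = matrix.copy()
    fresh[token_ids] = rng.normal(scale=scale,
                                  size=(len(token_ids), matrix.shape[1]))
    return fresh


# Synthetic stand-in for an embedding table: 50 tokens, 64 dimensions.
rng = np.random.default_rng(1)
table = rng.normal(size=(50, 64))
reserved = np.arange(40, 50)  # hypothetical ids of unused tokens
shared = rng.normal(size=64)
# Emulate collapsed rows: one shared vector plus small per-token noise.
table[reserved] = shared + 0.05 * rng.normal(size=(len(reserved), 64))

reset = reset_rows(table, reserved)
print("collapsed rows, mean cosine:", mean_pairwise_cosine(table[reserved]))
print("after reset, mean cosine:   ", mean_pairwise_cosine(reset[reserved]))
```

A mean pairwise cosine close to 1 among reserved rows is the kind of signal that would suggest re-initializing them before finetuning; the rows of tokens that were actually trained are left untouched by `reset_rows`.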

AI-generated content
