Building Knowledge Graphs with Gemini

✨ Overview

In this exploration, we'll see how to use Gemini to turn raw, unstructured documents into structured knowledge graphs. We'll start by prototyping to build intuition. Next, we'll optimize our prompts and outputs, then scale up to process entire books and complex legal contracts. Finally, we'll visualize the extracted book narratives and contract network graphs!

Before we start, a few notes:

- I'm a software engineer and developer advocate at Google Cloud, and I hope you'll learn a few things. Thoughts and opinions are entirely my own.
- The full source code is available under the Apache 2.0 license in this notebook (including setup details and future updates). You can also open the notebook directly in Colab. This article reproduces all the results generated by clicking “Run all”.
- You can experiment and build with Gemini for free on the following platforms:
  - Google AI Studio (get an API key to call Gemini programmatically)
  - Agent Studio (start building on Google Cloud with $300 in free credits)

🔥 Challenge

Documents are everywhere. We use them for business, day-to-day operations, legal matters, technical documentation, education, and even just for fun. But documents are not databases. They're generally unstructured, and fully understanding them takes multiple reads.

So, can we extract structured knowledge from a document using only the following?

- 1 document
- 1 prompt
- 1 request

Let's try with Gemini...

🏁 Setup

🐍 Python packages

We'll use the following packages:

- google-genai: the Google Gen AI Python SDK, to call Gemini
- networkx for graph management

We'll also need the following packages:

- tenacity for request management (a dependency of google-genai)
- matplotlib and pillow for data visualization (dependencies of networkx)

%pip install --quiet "google-genai>=2.6.0" "networkx[default]"

🤝 Gemini API

There are two main options for using the Gemini API:

- Agent Platform (formerly Vertex AI), through a Google Cloud project
- Google AI Studio, through a Gemini API key

The Google Gen AI SDK provides a unified interface to these APIs, and you can use environment variables for the configuration. 🔽

🛠️ Option 1 - Gemini API via Agent Platform

Requirements:

- A Google Cloud project
- The Agent Platform API must be enabled for this project: ▶️ Enable the Agent Platform API

Gen AI SDK environment variables:

GOOGLE_GENAI_USE_ENTERPRISE="True"
GOOGLE_CLOUD_PROJECT="<PROJECT_ID>"
GOOGLE_CLOUD_LOCATION="<LOCATION>"

💡 For preview models, the location must be set to global. For generally available models, you can choose the closest location among the Google model endpoint locations.

ℹ️ Learn more about setting up a project and a development environment.

🛠️ Option 2 - Gemini API via Google AI Studio

Requirements:

- A Gemini API key

Gen AI SDK environment variables:

GOOGLE_GENAI_USE_ENTERPRISE="False"
GOOGLE_API_KEY="<API_KEY>"

ℹ️ Learn more about getting a Gemini API key from Google AI Studio.
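
To make the two options concrete, here is a minimal sketch of how the environment variables listed above map to each option. The helper `describe_gemini_config` is hypothetical (it is not part of the Gen AI SDK or of the notebook), and its handling of the `"True"`/`"False"` strings is an assumption made for illustration, not a description of how the SDK parses them.

```python
import os
from collections.abc import Mapping


def describe_gemini_config(env: Mapping[str, str] = os.environ) -> str:
    """Report which of the two options the environment variables point to.

    Hypothetical helper for illustration; the variable names come from the article.
    """
    # Assumption: any casing of "true" selects the Agent Platform option
    use_enterprise = env.get("GOOGLE_GENAI_USE_ENTERPRISE", "").strip().lower() == "true"
    if use_enterprise:
        project = env.get("GOOGLE_CLOUD_PROJECT", "")
        location = env.get("GOOGLE_CLOUD_LOCATION", "")
        if project and location:
            return f"Agent Platform (project={project}, location={location})"
        return "Agent Platform selected, but project or location is missing"
    if env.get("GOOGLE_API_KEY"):
        return "Google AI Studio (API key set)"
    return "No configuration detected"


# Check the current process environment
print(describe_gemini_config())
```

Passing a plain dictionary instead of `os.environ` makes it easy to check a configuration before exporting it.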

💡 You can store your environment configuration outside of the source code:

Environment	Method
IDE	`.env` file (or equivalent)
...

Let's define the following environment detection functions. You can also define your configuration manually if needed. 🔽

import os
import sys
from collections.abc import Callable
...

✅ Environment functions defined

🤖 Gen AI SDK

To send Gemini requests, we use the google.genai client:

from google import genai

check_environment()
...

✅ Using the Agent Platform API with project "lpdemo-..." in location "global"

🔣 Input data

To develop our solution, we need a set of test data.

Multimodality

We'll test the following types:

- Text (text/plain): Classic books are a good source of texts of various lengths and languages.
- PDF (application/pdf): Legal contracts are a great example of complex, dense documents.

Gemini is a natively multimodal model, which means it can process different types of input. Once we can build knowledge graphs from text or PDF inputs, the solution will naturally support the following formats as well:

- Images (image/*)
- Audio (audio/*)
- Video (video/*)
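
As a small illustration of how a file can be mapped to one of these types, here is a minimal sketch based on the standard `mimetypes` module. The helper names and the `SUPPORTED_PREFIXES` tuple are hypothetical: the tuple simply mirrors the types listed in this section and is not an exhaustive list of what Gemini accepts.

```python
import mimetypes

# Assumption: mirrors the input types listed in the article, nothing more
SUPPORTED_PREFIXES = ("text/", "application/pdf", "image/", "audio/", "video/")


def guess_input_type(filename: str) -> str:
    """Return the MIME type guessed from the file extension, or a generic fallback."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type or "application/octet-stream"


def is_listed_type(mime_type: str) -> bool:
    """Check whether a MIME type belongs to one of the families listed above."""
    return mime_type.startswith(SUPPORTED_PREFIXES)


for name in ("book.txt", "contract.pdf", "archive.unknownext"):
    mime_type = guess_input_type(name)
    print(f"{name}: {mime_type} (listed: {is_listed_type(mime_type)})")
```

Guessing from the extension is only a convenience: when the type is already known, passing it explicitly avoids surprises with unusual file names.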

General knowledge

⚠️ LLMs (large language models) are trained on general knowledge, which becomes part of the model's "long-term memory". To prevent memorized information from being generated, we'll explicitly instruct the model to use only the provided input.
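
One way to picture such an instruction is the sketch below. The wording of `GROUNDING_INSTRUCTION` and the `build_grounded_prompt` helper are hypothetical and are not the prompts used later in this article; only the `==Start of ...==` / `==End of ...==` delimiters are borrowed from the helper code defined further down.

```python
# Hypothetical wording, for illustration only (not the article's actual prompt)
GROUNDING_INSTRUCTION = (
    "Answer using only the information in the provided input data. "
    "Do not rely on prior knowledge about the document."
)


def build_grounded_prompt(document_text: str, question: str) -> str:
    """Wrap the input in explicit delimiters so data and instructions stay separate."""
    return "\n".join(
        [
            "==Start of input data==",
            document_text.strip(),
            "==End of input data==",
            "==Start of user prompt==",
            f"{GROUNDING_INSTRUCTION}\n{question.strip()}",
            "==End of user prompt==",
        ]
    )


print(build_grounded_prompt("Alice met Bob in Paris.", "List the people mentioned."))
```

Keeping the document and the request in clearly delimited blocks makes it easier to tell the model which part is data to extract from and which part is the instruction to follow.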

Multilinguality

Gemini is also a natively multilingual model: it can process inputs and generate outputs in 100+ languages.

To stay generic, we'll use English for our prompts and knowledge graphs, but you can use any of the 100+ supported languages, as long as your prompts are clear and explicit.

Let's define a few data sources and helpers: 🔽

import mimetypes
from collections.abc import Iterator
from enum import Enum
...

✅ Data helpers defined

🧠 Gemini model

Gemini comes in different versions and sizes (Flash-Lite, Flash, Pro).

Let's start with Gemini 3.1 Flash-Lite, which offers strong performance, low latency, and very fast output speed:

GEMINI_3_1_FLASH_LITE = "gemini-3.1-flash-lite"

⚙️ Gemini configuration

Gemini can be used in many ways, from factual to creative modes. Here, we're essentially dealing with a **data-extraction use case**: we want results that are as factual and deterministic as possible. To get there, we can adjust the content generation parameters.

To minimize randomness, we'll set the temperature, top_p, and seed parameters as follows:

- temperature=0.0
- top_p=0.0
- seed=42 (arbitrary fixed value)
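
To build intuition for why these values reduce randomness, here is a toy sampler in pure Python. It is a simplified illustration of temperature scaling and nucleus (top-p) filtering, not a description of how Gemini samples internally; the `sample_token` function and the example scores are made up.

```python
import math
import random


def sample_token(logits: dict[str, float], temperature: float, top_p: float, seed: int) -> str:
    """Toy sampler: temperature scaling followed by nucleus (top-p) filtering."""
    if temperature <= 0.0:
        # Greedy decoding: always pick the highest-scoring token
        return max(logits, key=lambda token: logits[token])
    # Softmax over temperature-scaled scores (shifted by the max for stability)
    scaled = {token: score / temperature for token, score in logits.items()}
    max_scaled = max(scaled.values())
    weights = {token: math.exp(value - max_scaled) for token, value in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(((weight / total, token) for token, weight in weights.items()), reverse=True)
    # Keep the smallest set of top tokens whose cumulative probability reaches top_p
    kept: list[tuple[float, str]] = []
    cumulative = 0.0
    for prob, token in ranked:
        kept.append((prob, token))
        cumulative += prob
        if cumulative >= top_p:
            break
    # A fixed seed makes the remaining random choice repeatable
    rng = random.Random(seed)
    return rng.choices([token for _, token in kept], weights=[prob for prob, _ in kept], k=1)[0]


logits = {"Paris": 5.0, "London": 2.0, "Rome": 1.0}  # made-up scores
print(sample_token(logits, temperature=0.0, top_p=0.0, seed=42))  # Paris
print(sample_token(logits, temperature=1.0, top_p=0.0, seed=7))  # Paris
```

In this toy model, temperature=0.0 and top_p=0.0 each reduce the choice to the single most likely token, and the seed only matters when several candidates remain. Real services can still show small variations between runs, which is why the goal here is to minimize randomness rather than to guarantee identical outputs.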

🛠️ Helpers

Let's now add the core helper classes and functions: 🔽

from enum import StrEnum, auto

import IPython.display
import tenacity
from google.genai.errors import ClientError
from google.genai.types import (
    FinishReason,
    GenerateContentConfig,
    GenerateContentResponse,
    Part,
    ThinkingConfig,
    ThinkingLevel,
)

class Model(Enum):
    GEMINI_3_1_FLASH_LITE = "gemini-3.1-flash-lite"
    GEMINI_3_5_FLASH = "gemini-3.5-flash"
    GEMINI_2_5_FLASH = "gemini-2.5-flash"
    GEMINI_2_5_PRO = "gemini-2.5-pro"
    # Preview
    GEMINI_3_1_PRO = "gemini-3.1-pro-preview"
    # Default model
    DEFAULT = GEMINI_3_1_FLASH_LITE

# Default configuration for more deterministic outputs
DEFAULT_CONFIG = GenerateContentConfig(
    temperature=0.0,
    top_p=0.0,
    seed=42,  # Arbitrary fixed value
)

class ShowAs(StrEnum):
    DONT_SHOW = auto()
    TEXT = auto()
    MARKDOWN = auto()

def generate_content(
    prompt: str,
    source: Source | str | None = None,
    *,
    model: Model | None = None,
    config: GenerateContentConfig | None = None,
    system_instruction: str | None = None,
    show_prompt: ShowAs = ShowAs.DONT_SHOW,
    show_response: ShowAs = ShowAs.MARKDOWN,
    only_show_prompt: bool = False,
    return_response: bool = False,
) -> GenerateContentResponse | None:
    disable_colab_cell_scrollbar()

    model = model or Model.DEFAULT
    model_id = model.value
    prompt_contents = get_prompt_contents(prompt, source, show_prompt, only_show_prompt)
    if only_show_prompt:
        return None
    config = config or get_generate_content_config(model, system_instruction)
    client = check_client_for_model(model)

    response = None
    display_request_header(model_id, source)
    for attempt in get_retrier():
        with attempt:
            response = client.models.generate_content(
                model=model_id,
                contents=prompt_contents,  # type: ignore
                config=config,
            )
            display_response_info(response)
            display_response(response, show_response)

    return response if return_response else None

def get_prompt_contents(
    prompt: str,
    source: Source | str | None,
    show_prompt: ShowAs,
    only_show_prompt: bool,
) -> list[str | Part]:
    def yield_prompt_contents() -> Iterator[str | Part]:
        if not source:
            yield prompt.strip()
            return
        yield "==Start of input data==\n"
        if isinstance(source, str):
            yield f"{source.strip()}\n"
        else:
            yield from source.yield_contents()
        yield "==End of input data==\n"
        yield f"==Start of user prompt==\n{prompt.strip()}\n==End of user prompt=="

    prompt_contents = list(yield_prompt_contents())
    display_prompt(prompt_contents, show_prompt, only_show_prompt)

    return prompt_contents

def get_generate_content_config(
    model: Model,
    system_instruction: str | None = None,
) -> GenerateContentConfig:
    thinking_config = get_thinking_config_for_model(model)

    return GenerateContentConfig(
        system_instruction=system_instruction,
        temperature=DEFAULT_CONFIG.temperature,
        top_p=DEFAULT_CONFIG.top_p,
        seed=DEFAULT_CONFIG.seed,
        thinking_config=thinking_config,
    )

def get_thinking_config_for_model(model: Model) -> ThinkingConfig | None:
    # Use minimal