Can I use Google Gemini in Hermes Agent without Google Cloud credentials?

Yes. RunAPI serves Gemini through an OpenAI-compatible endpoint. Configure RunAPI as the custom:runapi provider with base_url https://runapi.ai/v1 and key_env RUNAPI_API_KEY. No Google Cloud project, service account, or Vertex AI setup is required.

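Which Gemini version should I use for agent workflows?

Gemini 3.5 Flash (gemini-3.5-flash) is the newest and fastest option, which makes it the best fit for real-time agent loops and tool-calling chains. Gemini 2.5 Pro (gemini-2.5-pro) handles long-context work and complex reasoning. The Gemini 3.x Pro previews offer the latest reasoning capabilities at a higher cost.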

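How is Gemini priced on RunAPI?

Gemini is billed per token on RunAPI on a pay-as-you-go basis. Input and output tokens are counted separately. There is no monthly subscription or minimum spend. See the RunAPI pricing page for current per-million-token rates.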
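Because input and output tokens are counted separately, the charge for one response can be estimated from its usage object. The sketch below shows that arithmetic only. The two rates are hypothetical placeholders, not RunAPI prices, and the function name estimate_cost is made up for illustration; substitute the figures from the RunAPI pricing page.

```python
from decimal import Decimal

# Hypothetical rates in USD per one million tokens. These are NOT real
# RunAPI prices; replace them with the values from the pricing page.
EXAMPLE_INPUT_RATE = Decimal("0.50")
EXAMPLE_OUTPUT_RATE = Decimal("2.00")


def estimate_cost(usage, input_rate, output_rate):
    """Estimate the charge for one response from its usage object."""
    million = Decimal(1_000_000)
    input_cost = Decimal(usage["prompt_tokens"]) * input_rate / million
    output_cost = Decimal(usage["completion_tokens"]) * output_rate / million
    return input_cost + output_cost


# Illustrative token counts in the shape of an OpenAI-style usage object.
usage = {"prompt_tokens": 34, "completion_tokens": 71, "total_tokens": 105}
cost = estimate_cost(usage, EXAMPLE_INPUT_RATE, EXAMPLE_OUTPUT_RATE)
print(f"Estimated cost: ${cost:.6f}")
```

Decimal is used instead of float so that very small per-token amounts do not pick up rounding noise when many responses are summed.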

Can I switch between Gemini and other LLMs mid-session in Hermes Agent?

Yes. All RunAPI LLMs share the same custom:runapi provider and API key. Use the /model command or hermes model to switch to gemini-3.5-flash, gpt-5.5, claude-opus-4.6, or any other RunAPI model without changing the provider configuration.

Does Gemini through RunAPI support function calling and tool use?

Yes. RunAPI passes the OpenAI-compatible tools and tool_choice parameters through to Gemini. Define tools in the request body and Gemini returns tool_calls in the assistant message. Hermes Agent handles them the same way it handles tool calls from GPT or Claude.
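As a rough illustration of that round trip, the sketch below builds a request body with one tool and reads tool_calls back out of an assistant message. It follows the general OpenAI chat completions tool format. The get_weather tool, the helper names, and the sample response are all hypothetical, and the exact fields RunAPI returns for Gemini may differ.

```python
import json


def build_tool_request(model, user_text):
    """Build a chat completions body that offers one function tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool name
                    "description": "Look up the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
        "tool_choice": "auto",  # let the model decide whether to call it
    }


def extract_tool_calls(response):
    """Return (name, arguments) pairs from the first choice's message."""
    message = response["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        function = call["function"]
        # In the OpenAI format, arguments arrive as a JSON-encoded string.
        calls.append((function["name"], json.loads(function["arguments"])))
    return calls


# Hand-written sample in the OpenAI response shape, not real API output.
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": None,
                "tool_calls": [
                    {
                        "id": "call_1",
                        "type": "function",
                        "function": {
                            "name": "get_weather",
                            "arguments": "{\"city\": \"Seoul\"}",
                        },
                    }
                ],
            },
            "finish_reason": "tool_calls",
        }
    ]
}

request_body = build_tool_request("gemini-3.5-flash", "What is the weather in Seoul?")
print(extract_tool_calls(sample_response))
```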

HERMES + GEMINI

Use Gemini in Hermes Agent.

Google Gemini is available through RunAPI's OpenAI-compatible endpoint. Hermes Agent calls it through the custom:runapi provider: Gemini 3.5 Flash suits speed-critical agent loops, 3.x Pro suits multi-step reasoning, and 2.5 Pro suits long-context production work. No Google Cloud project or Vertex AI credentials are required; use the same RUNAPI_API_KEY and base_url you already configured for chat.

One API key · OpenAI-compatible chat endpoint · Streaming supported

Use RunAPI to send a chat request to Google Gemini 3.5 Flash through Hermes Agent.

Requirements:
- Use the custom:runapi provider already configured in Hermes Agent
- Call the RunAPI chat completions endpoint at https://runapi.ai/v1/chat/completions
- Set model to "gemini-3.5-flash"
- The RUNAPI_API_KEY environment variable provides authorization
- The response is synchronous — the reply arrives in choices[0].message.content
- For streaming, set stream to true and process server-sent events

curl -X POST https://runapi.ai/v1/chat/completions \
  -H "Authorization: Bearer $RUNAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3.5-flash",
    "messages": [
      {"role": "system", "content": "You are a concise technical assistant."},
      {"role": "user", "content": "Explain the difference between gRPC and REST in three sentences."}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gemini-3.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "gRPC uses HTTP/2 and Protocol Buffers for strongly-typed, multiplexed RPC calls with built-in code generation. REST uses HTTP/1.1 (or 2) with JSON payloads and relies on URL paths and HTTP verbs for resource semantics. gRPC is faster for service-to-service calls; REST is simpler to debug and more widely supported by browsers."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 34,
    "completion_tokens": 71,
    "total_tokens": 105
  }
}

Copy the curl command to test Gemini.
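For callers who prefer a script to curl, the same request can be sketched with Python's standard library. This is a minimal, unofficial equivalent: the helper names build_chat_request and send are made up here, and the request is only sent when RUNAPI_API_KEY is set in the environment.

```python
import json
import os
import urllib.request

RUNAPI_URL = "https://runapi.ai/v1/chat/completions"


def build_chat_request(api_key, model, messages, **options):
    """Build (but do not send) a POST request mirroring the curl example."""
    body = {"model": model, "messages": messages, **options}
    return urllib.request.Request(
        RUNAPI_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


api_key = os.environ.get("RUNAPI_API_KEY")
request = build_chat_request(
    api_key or "runapi_xxx",  # placeholder key when the variable is unset
    "gemini-3.5-flash",
    [
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Explain the difference between gRPC and REST in three sentences."},
    ],
    temperature=0.7,
    max_tokens=256,
)

if api_key:
    print(send(request)["choices"][0]["message"]["content"])
else:
    print("RUNAPI_API_KEY is not set; the request was built but not sent.")
```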

How it works

Use Gemini in Hermes Agent in 3 steps

Set up RunAPI

Set the RUNAPI_API_KEY environment variable. If you have already added RunAPI to Hermes Agent as the custom:runapi provider, Gemini works with the same key and base_url; only the model ID changes. No Google Cloud credentials are required.

export RUNAPI_API_KEY=runapi_xxx

Call Gemini through chat completions

Send a POST request to /v1/chat/completions with model set to gemini-3.5-flash. Pass a messages array containing system and user roles. Hermes Agent sends the same OpenAI-compatible request format it uses for GPT; RunAPI routes the request to Gemini based on the model parameter.

POST /v1/chat/completions

Read the response

The response arrives synchronously in the OpenAI chat completion format. The assistant's reply is in choices[0].message.content and token usage is in the usage object. For streaming, set stream to true and Hermes Agent parses the SSE delta events automatically.

choices[0].message.content
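Hermes Agent does this parsing for you, but when calling the endpoint directly the two response modes can be handled roughly as follows. The sketch assumes the stream uses the usual OpenAI conventions (lines prefixed with data:, a final data: [DONE] sentinel, text under choices[].delta.content); the helper names and the sample lines are hand-written for illustration and are not captured RunAPI output.

```python
import json


def reply_text(response):
    """Return the assistant reply from a non-streaming response."""
    return response["choices"][0]["message"]["content"]


def assemble_stream(lines):
    """Join the delta.content pieces from an iterable of SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed OpenAI-style end-of-stream marker
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                parts.append(piece)
    return "".join(parts)


sample_response = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Hello."}}
    ],
    "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7},
}

sample_stream = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "gRPC uses "}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "HTTP/2."}}]}',
    "",
    "data: [DONE]",
]

print(reply_text(sample_response))
print(assemble_stream(sample_stream))
```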

Parameters

Gemini chat completions API parameters

Parameter	Type	Description
`model`	`string`	Required. One of gemini-3.5-flash, gemini-2.5-flash, gemini-2.5-pro, gemini-3-flash-preview, gemini-3-pro-preview, or gemini-3.1-pro-preview.
`messages`	`array`	Required. An array of message objects, each with a role (system, user, or assistant) and a content field.
`temperature`	`number`	Optional. Sampling temperature between 0 and 2. Lower values produce more deterministic output. The default varies by model.
`max_tokens`	`integer`	Optional. Maximum number of tokens to generate in the response.
`stream`	`boolean`	Optional. When true, the response is streamed as server-sent events. Each event carries a delta with partial content.
`top_p`	`number`	Optional. Nucleus sampling threshold between 0 and 1. An alternative to temperature for controlling output randomness.
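One way to keep requests within the ranges in the table is to validate them before sending. The helper below is a hypothetical client-side check, not part of RunAPI or Hermes Agent. It enforces only the ranges listed above, plus the assumption that max_tokens must be a positive integer, and it omits optional fields that were not supplied so the model defaults apply.

```python
def build_payload(model, messages, temperature=None, max_tokens=None,
                  stream=None, top_p=None):
    """Validate the documented ranges and return a request body dict."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    # Assumption: a non-positive token limit is never meaningful.
    if max_tokens is not None and max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")

    payload = {"model": model, "messages": messages}
    optional = {
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stream": stream,
        "top_p": top_p,
    }
    # Leave out anything not supplied so server-side defaults are used.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


messages = [{"role": "user", "content": "Summarize this changelog."}]
payload = build_payload("gemini-2.5-pro", messages, temperature=0.2, stream=False)
print(payload)
```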

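What is Gemini in Hermes Agent?

Google Gemini is available through RunAPI's custom:runapi provider without Google Cloud credentials. Hermes Agent calls it with the same OpenAI-compatible setup it uses for GPT and Claude. Gemini 3.5 Flash is the fastest choice for speed-sensitive agent loops, while Gemini 2.5 Pro offers a 1-million-token context window and a thinking mode.

Gemini use cases

Real-time voice and video conversations with the Live API

Use Gemini's multimodal capabilities to build apps that process audio and video input alongside text in real time, and create conversational agents that can see and hear within Hermes Agent workflows.

Grounding responses with Google Search data

Enable Gemini's search grounding for tasks in Hermes Agent workflows that need up-to-date information, to give factually accurate answers and reduce hallucinations.

Multimodal document processing pipelines

Process documents containing mixed content types (PDFs, images, tables) in a single Hermes Agent run: extract data from scanned tables, summarize multimedia reports, or classify video content.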

FAQ

Gemini + Hermes Agent questions

General Hermes Agent setup

Not set up yet? Start with the RunAPI setup guide for Hermes Agent.

Hermes Agent setup guide →

Gemini model catalog

See every Gemini variant, pricing, and the API docs.

Gemini models →

Try Gemini in Hermes Agent now.

Get a free RunAPI key, set model to gemini-3.5-flash in the custom:runapi provider, and use Gemini in Hermes Agent.

View models →