---
title: &quot;通过 RunAPI 在爱马仕 (Hermes Agent) 中使用 Gemini — 大模型API 指南&quot;
url: &quot;https://runapi.ai/zh-CN/hermes-gemini.md&quot;
canonical: &quot;https://runapi.ai/zh-CN/hermes-gemini&quot;
locale: &quot;zh-CN&quot;
model: &quot;gemini&quot;
---

# 在 Hermes Agent 中使用 Gemini。

Google Gemini 可通过 RunAPI 的 OpenAI 兼容端点调用。Hermes Agent 使用 custom:runapi provider 调用它 — Gemini 3.5 Flash 用于对速度敏感的 agent 循环，3.x Pro 用于多步推理，2.5 Pro 用于长上下文生产任务。无需 Google Cloud 项目或 Vertex AI 凭据 — 只需你已为聊天配置的同一个 RUNAPI_API_KEY 和 base_url。

## API example

```bash
curl -X POST https://runapi.ai/v1/chat/completions \
  -H &quot;Authorization: Bearer $RUNAPI_API_KEY&quot; \
  -H &quot;Content-Type: application/json&quot; \
  -d &#39;{
    &quot;model&quot;: &quot;gemini-3.5-flash&quot;,
    &quot;messages&quot;: [
      {&quot;role&quot;: &quot;system&quot;, &quot;content&quot;: &quot;You are a concise technical assistant.&quot;},
      {&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: &quot;Explain the difference between gRPC and REST in three sentences.&quot;}
    ],
    &quot;temperature&quot;: 0.7,
    &quot;max_tokens&quot;: 256
  }&#39;

```

### Response

```json
{
  &quot;id&quot;: &quot;chatcmpl-abc123&quot;,
  &quot;object&quot;: &quot;chat.completion&quot;,
  &quot;model&quot;: &quot;gemini-3.5-flash&quot;,
  &quot;choices&quot;: [
    {
      &quot;index&quot;: 0,
      &quot;message&quot;: {
        &quot;role&quot;: &quot;assistant&quot;,
        &quot;content&quot;: &quot;gRPC uses HTTP/2 and Protocol Buffers for strongly-typed, multiplexed RPC calls with built-in code generation. REST uses HTTP/1.1 (or 2) with JSON payloads and relies on URL paths and HTTP verbs for resource semantics. gRPC is faster for service-to-service calls; REST is simpler to debug and more widely supported by browsers.&quot;
      },
      &quot;finish_reason&quot;: &quot;stop&quot;
    }
  ],
  &quot;usage&quot;: {
    &quot;prompt_tokens&quot;: 34,
    &quot;completion_tokens&quot;: 71,
    &quot;total_tokens&quot;: 105
  }
}

```

## How it works

1. **Configure RunAPI** — Set the RUNAPI_API_KEY environment variable. If you already added RunAPI as a custom:runapi provider in Hermes Agent, the same key and base_url work for Gemini — change only the model ID. No Google Cloud credentials needed.
2. **Call Gemini via chat completions** — Send a POST request to /v1/chat/completions with model set to gemini-3.5-flash. Pass a messages array with system and user roles. Hermes Agent sends the same OpenAI-compatible request shape it uses for GPT — RunAPI routes to Gemini based on the model parameter.
3. **Read the response** — The response arrives synchronously in OpenAI chat completion format. The assistant reply is in choices[0].message.content, with token usage in the usage object. For streaming, set stream to true and Hermes Agent parses the SSE delta events automatically.

## Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `model` | `string` | Required. gemini-3.5-flash, gemini-2.5-flash, gemini-2.5-pro, gemini-3-flash-preview, gemini-3-pro-preview, or gemini-3.1-pro-preview. |
| `messages` | `array` | Required. Array of message objects with role (system, user, assistant) and content fields. |
| `temperature` | `number` | Optional. Sampling temperature between 0 and 2. Lower values produce more deterministic output. Default varies by model. |
| `max_tokens` | `integer` | Optional. Maximum number of tokens to generate in the response. |
| `stream` | `boolean` | Optional. When true, the response streams as server-sent events. Each event contains a delta with partial content. |
| `top_p` | `number` | Optional. Nucleus sampling threshold between 0 and 1. Alternative to temperature for controlling output randomness. |

## FAQ

### Can I use Google Gemini in Hermes Agent without Google Cloud credentials?

Yes. RunAPI provides Gemini through its OpenAI-compatible endpoint. Configure RunAPI as a custom:runapi provider with base_url https://runapi.ai/v1 and key_env RUNAPI_API_KEY. No Google Cloud project, service account, or Vertex AI setup required.

### What is the difference between Gemini Flash vs Pro -- when should I use each?

Flash (gemini-3.5-flash) is fastest and cheapest -- best for real-time agent loops, classification, and tool-calling chains. Pro (gemini-2.5-pro) handles complex reasoning, long-context analysis, and multi-step tasks. Use Flash for speed, Pro for depth.

### How do I use context caching to reduce costs with long documents?

When sending the same large context across multiple requests (like a codebase or document set), Gemini&#39;s context caching reduces input token costs on subsequent calls. This is especially useful in agent loops where the system prompt and reference material stay the same across many turns.

### Can Hermes Agent switch between Gemini and other LLMs mid-session?

Yes. All RunAPI LLMs share the same custom:runapi provider and API key. Use the /model command or hermes model to switch between gemini-3.5-flash, gpt-5.5, claude-opus-4.6, or any other RunAPI model without changing provider config.

### Does Gemini through RunAPI support function calling and tool use?

Yes. RunAPI passes the OpenAI-compatible tools and tool_choice parameters to Gemini. Define tools in the request body and Gemini returns tool_calls in the assistant message. Hermes Agent processes these the same way it handles tool calls from GPT or Claude.

### Can Hermes Agent mix Gemini with other LLM providers in one workflow?

Yes. Hermes Agent can call Gemini Flash for cheap preprocessing, GPT-5.5 for complex reasoning, and Claude for long-context analysis, all through the same RunAPI key and custom:runapi provider.


## Links

- [Hermes Agent 配置指南 →](https://runapi.ai/zh-CN/hermes-agent)
- [Gemini 模型 →](https://runapi.ai/zh-CN/models/gemini)
- [Model catalog](https://runapi.ai/zh-CN/models)
- [API docs](https://runapi.ai/zh-CN/docs)
