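Use Gemini in Hermes Agent.
Google Gemini is available through RunAPI's OpenAI-compatible endpoint. Hermes Agent calls it through the custom:runapi provider: Gemini 3.5 Flash for speed-sensitive agent loops, 3.x Pro for multi-step reasoning, and 2.5 Pro for long-context production tasks. No Google Cloud project or Vertex AI credentials are needed, only the same RUNAPI_API_KEY and base_url you already configured for chat.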
Use RunAPI to send a chat request to Google Gemini 3.5 Flash through Hermes Agent.
Requirements:
- Use the custom:runapi provider already configured in Hermes Agent
- Call RunAPI's chat completions endpoint at https://runapi.ai/v1/chat/completions
- Set model to "gemini-3.5-flash"
- Provide authorization through the RUNAPI_API_KEY environment variable
- The response is synchronous; the reply is in choices[0].message.content
- For streaming, set stream to true and handle server-sent events
curl -X POST https://runapi.ai/v1/chat/completions \
  -H "Authorization: Bearer $RUNAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3.5-flash",
    "messages": [
      {"role": "system", "content": "You are a concise technical assistant."},
      {"role": "user", "content": "Explain the difference between gRPC and REST in three sentences."}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gemini-3.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "gRPC uses HTTP/2 and Protocol Buffers for strongly-typed, multiplexed RPC calls with built-in code generation. REST uses HTTP/1.1 (or 2) with JSON payloads and relies on URL paths and HTTP verbs for resource semantics. gRPC is faster for service-to-service calls; REST is simpler to debug and more widely supported by browsers."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 34,
    "completion_tokens": 71,
    "total_tokens": 105
  }
}
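The curl request and JSON response above can also be sketched in Python. This is a minimal illustration using only the standard library, not an official RunAPI or Hermes Agent client. The function names `build_request`, `extract_reply`, and `chat` are hypothetical; the URL, headers, and payload fields come from the example above.

```python
import json
import os
import urllib.request

RUNAPI_URL = "https://runapi.ai/v1/chat/completions"


def build_request(prompt, model="gemini-3.5-flash", api_key=None):
    """Build (but do not send) the same POST request as the curl example."""
    # Fall back to the RUNAPI_API_KEY environment variable when no key is passed.
    api_key = api_key or os.environ.get("RUNAPI_API_KEY", "")
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a concise technical assistant."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.7,
        "max_tokens": 256,
    }
    return urllib.request.Request(
        RUNAPI_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_reply(response_body):
    """Return the assistant text from a parsed chat completion response."""
    return response_body["choices"][0]["message"]["content"]


def chat(prompt):
    """Send the request and return the reply (needs network access and a valid key)."""
    with urllib.request.urlopen(build_request(prompt), timeout=30) as resp:
        return extract_reply(json.load(resp))
```

Keeping request construction separate from the network call makes it possible to inspect the payload and headers without spending tokens.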
Use Gemini in Hermes Agent in three steps
Configure RunAPI
Set the RUNAPI_API_KEY environment variable. If you already added RunAPI as a custom:runapi provider in Hermes Agent, the same key and base_url work for Gemini — change only the model ID. No Google Cloud credentials needed.
export RUNAPI_API_KEY=runapi_xxx
Call Gemini via chat completions
Send a POST request to /v1/chat/completions with model set to gemini-3.5-flash. Pass a messages array with system and user roles. Hermes Agent sends the same OpenAI-compatible request shape it uses for GPT — RunAPI routes to Gemini based on the model parameter.
POST /v1/chat/completions
Read the response
The response arrives synchronously in OpenAI chat completion format. The assistant reply is in choices[0].message.content, with token usage in the usage object. For streaming, set stream to true and Hermes Agent parses the SSE delta events automatically.
choices[0].message.content
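Hermes Agent handles streamed responses on its own, but when calling the endpoint directly the SSE events have to be assembled by hand. The sketch below assumes the stream follows the usual OpenAI chunk convention: each event is a `data: {...}` line carrying `choices[0].delta.content`, and the stream ends with `data: [DONE]`. The source only says that each event contains a delta with partial content, so the sentinel and the exact chunk shape are assumptions, and `collect_stream` is a hypothetical helper name.

```python
import json


def collect_stream(lines):
    """Join the partial content carried by OpenAI-style SSE chunk lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                parts.append(piece)
    return "".join(parts)
```

In practice `lines` would be the decoded lines of the HTTP response body read incrementally; a list of strings works the same way for testing.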
Gemini chat completions API parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | Required. gemini-3.5-flash, gemini-2.5-flash, gemini-2.5-pro, gemini-3-flash-preview, gemini-3-pro-preview, or gemini-3.1-pro-preview. |
| messages | array | Required. Array of message objects with role (system, user, assistant) and content fields. |
| temperature | number | Optional. Sampling temperature between 0 and 2. Lower values produce more deterministic output. Default varies by model. |
| max_tokens | integer | Optional. Maximum number of tokens to generate in the response. |
| stream | boolean | Optional. When true, the response streams as server-sent events. Each event contains a delta with partial content. |
| top_p | number | Optional. Nucleus sampling threshold between 0 and 1. Alternative to temperature for controlling output randomness. |
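The parameter constraints above can be checked client-side before a request is sent. The following is a sketch, not part of any RunAPI SDK: `build_payload` is a hypothetical helper, and the rules that `messages` must be non-empty and `max_tokens` must be positive are assumptions that go beyond what the parameter list states.

```python
KNOWN_MODELS = {
    "gemini-3.5-flash",
    "gemini-2.5-flash",
    "gemini-2.5-pro",
    "gemini-3-flash-preview",
    "gemini-3-pro-preview",
    "gemini-3.1-pro-preview",
}
ROLES = {"system", "user", "assistant"}


def build_payload(model, messages, temperature=None, max_tokens=None,
                  stream=None, top_p=None):
    """Validate parameters against the documented ranges and build the request body."""
    if model not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {model}")
    if not messages:  # assumption: an empty conversation is rejected
        raise ValueError("messages must be a non-empty list")
    for msg in messages:
        if msg.get("role") not in ROLES or "content" not in msg:
            raise ValueError(f"invalid message: {msg}")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    if max_tokens is not None and max_tokens < 1:  # assumption: must be positive
        raise ValueError("max_tokens must be a positive integer")

    payload = {"model": model, "messages": messages}
    optional = {
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stream": stream,
        "top_p": top_p,
    }
    # Only send optional fields that were set, so model defaults apply otherwise.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```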
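What is Gemini on Hermes Agent?
Google Gemini is available through RunAPI's custom:runapi provider, with no Google Cloud credentials required. Hermes Agent calls it with the same OpenAI-compatible configuration it uses for GPT and Claude. Gemini 3.5 Flash is the fastest option for speed-sensitive agent loops, while Gemini 2.5 Pro offers a 1-million-token context window and a thinking mode.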
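Gemini use cases
Real-time voice and video conversations with the Live API
Use Gemini's multimodal capabilities to build applications that process audio and video input in real time alongside text, and use Hermes Agent workflows to create interactive agents that can see and hear.
Response grounding with Google Search data
Enable Gemini's search grounding in Hermes Agent workflows to give factually accurate answers for tasks that need up-to-date information and to reduce hallucinations.
Multimodal document processing pipelines
Process documents with mixed content types (PDFs, images, tables) in a single Hermes Agent run: extract data from scanned forms, summarize multimedia reports, or classify video content.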
Gemini + Hermes Agent FAQ
Yes. RunAPI provides Gemini through its OpenAI-compatible endpoint. Configure RunAPI as a custom:runapi provider with base_url https://runapi.ai/v1 and key_env RUNAPI_API_KEY. No Google Cloud project, service account, or Vertex AI setup required.
Flash (gemini-3.5-flash) is fastest and cheapest -- best for real-time agent loops, classification, and tool-calling chains. Pro (gemini-2.5-pro) handles complex reasoning, long-context analysis, and multi-step tasks. Use Flash for speed, Pro for depth.
When sending the same large context across multiple requests (like a codebase or document set), Gemini's context caching reduces input token costs on subsequent calls. This is especially useful in agent loops where the system prompt and reference material stay the same across many turns.
Yes. All RunAPI LLMs share the same custom:runapi provider and API key. Use the /model command or hermes model to switch between gemini-3.5-flash, gpt-5.5, claude-opus-4.6, or any other RunAPI model without changing provider config.
Yes. RunAPI passes the OpenAI-compatible tools and tool_choice parameters to Gemini. Define tools in the request body and Gemini returns tool_calls in the assistant message. Hermes Agent processes these the same way it handles tool calls from GPT or Claude.
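The tool-calling flow described above can be sketched as follows. The shapes used here follow the standard OpenAI convention (a `function` tool definition with a JSON Schema, and `tool_calls` entries whose `arguments` field is a JSON-encoded string); that RunAPI returns exactly this shape for Gemini is an assumption based on the "OpenAI-compatible" description. The `get_weather` tool and both helper functions are hypothetical.

```python
import json

# Hypothetical tool definition in the OpenAI-compatible "tools" format.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(prompt, model="gemini-3.5-flash"):
    """Build a request body that offers the tool and lets the model decide."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [WEATHER_TOOL],
        "tool_choice": "auto",
    }


def parse_tool_calls(response_body):
    """Return (name, arguments) pairs from the assistant message, if any."""
    message = response_body["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        # Assumed OpenAI convention: arguments arrive as a JSON-encoded string.
        calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls
```

An empty list from `parse_tool_calls` means the model answered in plain text, in which case the reply is in `choices[0].message.content` as usual.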
Yes. Hermes Agent can call Gemini Flash for cheap preprocessing, GPT-5.5 for complex reasoning, and Claude for long-context analysis, all through the same RunAPI key and custom:runapi provider.
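Because every model sits behind the same endpoint and key, routing between them reduces to choosing a model ID per request. The sketch below illustrates that idea only; the task labels and the `payload_for` helper are hypothetical, and the model IDs are the ones named in the answers above.

```python
# Hypothetical task labels mapped to the model IDs mentioned in this FAQ.
MODEL_BY_TASK = {
    "preprocess": "gemini-3.5-flash",   # cheap, fast preprocessing
    "reasoning": "gpt-5.5",             # complex reasoning
    "long_context": "claude-opus-4.6",  # long-context analysis
}
DEFAULT_MODEL = "gemini-3.5-flash"


def payload_for(task, prompt):
    """Pick a model for the task; everything else in the request stays the same."""
    model = MODEL_BY_TASK.get(task, DEFAULT_MODEL)
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}
```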
Try Gemini in Hermes Agent now.
Get a free RunAPI key, set model to gemini-3.5-flash in the custom:runapi provider, and start using Gemini in Hermes Agent.