Use Gemini in Hermes Agent.
Google Gemini is available through RunAPI's OpenAI-compatible endpoint. Hermes Agent calls it through the custom:runapi provider: Gemini 3.5 Flash for speed-sensitive agent loops, the 3.x Pro models for multi-step reasoning, and Gemini 2.5 Pro for long-context production tasks. No Google Cloud project or Vertex AI credentials are required; you use the same RUNAPI_API_KEY and base_url you already configured for chat.
Use RunAPI to send a chat request to Google Gemini 3.5 Flash through Hermes Agent.
Requirements:
- Use the custom:runapi provider already configured in Hermes Agent
- Call the RunAPI chat completions endpoint at https://runapi.ai/v1/chat/completions
- Set model to "gemini-3.5-flash"
- The RUNAPI_API_KEY environment variable provides authorization
- The response is synchronous — the reply arrives in choices[0].message.content
- For streaming, set stream to true and process server-sent events
curl -X POST https://runapi.ai/v1/chat/completions \
  -H "Authorization: Bearer $RUNAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3.5-flash",
    "messages": [
      {"role": "system", "content": "You are a concise technical assistant."},
      {"role": "user", "content": "Explain the difference between gRPC and REST in three sentences."}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gemini-3.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "gRPC uses HTTP/2 and Protocol Buffers for strongly-typed, multiplexed RPC calls with built-in code generation. REST uses HTTP/1.1 (or 2) with JSON payloads and relies on URL paths and HTTP verbs for resource semantics. gRPC is faster for service-to-service calls; REST is simpler to debug and more widely supported by browsers."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 34,
    "completion_tokens": 71,
    "total_tokens": 105
  }
}
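The request and response above can be reproduced outside curl with a few lines of Python. This is a minimal sketch using only the standard library; the helper names `build_chat_request`, `send`, and `extract_reply` are hypothetical, and it assumes the endpoint accepts and returns exactly the shapes shown in the example.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
RUNAPI_CHAT_URL = "https://runapi.ai/v1/chat/completions"


def build_chat_request(api_key, user_prompt, model="gemini-3.5-flash",
                       system_prompt="You are a concise technical assistant."):
    """Build (but do not send) the same POST request as the curl example."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.7,
        "max_tokens": 256,
    }
    return urllib.request.Request(
        RUNAPI_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and return the decoded JSON body (performs a network call)."""
    with urllib.request.urlopen(request) as http_response:
        return json.load(http_response)


def extract_reply(response):
    """Return (reply_text, total_tokens) from a chat completion response dict."""
    choice = response["choices"][0]
    return choice["message"]["content"], response["usage"]["total_tokens"]
```

With RUNAPI_API_KEY exported, `extract_reply(send(build_chat_request(os.environ["RUNAPI_API_KEY"], "your question")))` (after `import os`) would return the assistant text together with the total token count. Keeping request construction separate from sending makes the payload easy to inspect before any tokens are spent.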
Use Gemini in Hermes Agent in three steps
Configure RunAPI
Set the RUNAPI_API_KEY environment variable. If you already added RunAPI as a custom:runapi provider in Hermes Agent, the same key and base_url work for Gemini — change only the model ID. No Google Cloud credentials needed.
export RUNAPI_API_KEY=runapi_xxx
Call Gemini via chat completions
Send a POST request to /v1/chat/completions with model set to gemini-3.5-flash. Pass a messages array with system and user roles. Hermes Agent sends the same OpenAI-compatible request shape it uses for GPT — RunAPI routes to Gemini based on the model parameter.
POST /v1/chat/completions
Read the response
The response arrives synchronously in OpenAI chat completion format. The assistant reply is in choices[0].message.content, with token usage in the usage object. For streaming, set stream to true; Hermes Agent then parses the SSE delta events automatically.
choices[0].message.content
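Hermes Agent handles streamed responses itself, but when calling the endpoint directly you have to read the server-sent events yourself. The sketch below assumes RunAPI follows the usual OpenAI streaming conventions (each event is a `data:` line carrying a JSON chunk with `choices[].delta.content`, and the stream ends with `data: [DONE]`); the source does not spell these details out, and `iter_stream_text` is a hypothetical helper name.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from an iterable of SSE lines (str or bytes)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed OpenAI-style end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # The first chunk often carries only the role, so content may be absent.
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

The response object returned by `urllib.request.urlopen` can be iterated line by line, so it can be passed straight to `iter_stream_text` once the request body includes `"stream": true`. Joining the yielded fragments gives the same text a non-streaming call would place in choices[0].message.content.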
Gemini chat completions API parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | Required. gemini-3.5-flash, gemini-2.5-flash, gemini-2.5-pro, gemini-3-flash-preview, gemini-3-pro-preview, or gemini-3.1-pro-preview. |
| messages | array | Required. Array of message objects with role (system, user, assistant) and content fields. |
| temperature | number | Optional. Sampling temperature between 0 and 2. Lower values produce more deterministic output. Default varies by model. |
| max_tokens | integer | Optional. Maximum number of tokens to generate in the response. |
| stream | boolean | Optional. When true, the response streams as server-sent events. Each event contains a delta with partial content. |
| top_p | number | Optional. Nucleus sampling threshold between 0 and 1. Alternative to temperature for controlling output randomness. |
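Checking a request locally against these parameters catches mistakes before they cost a round trip. The following is a sketch under stated assumptions: `build_payload` is a hypothetical helper, the model list and numeric ranges are copied from the table, and the actual API may accept additional models or parameters.

```python
SUPPORTED_MODELS = {
    "gemini-3.5-flash", "gemini-2.5-flash", "gemini-2.5-pro",
    "gemini-3-flash-preview", "gemini-3-pro-preview", "gemini-3.1-pro-preview",
}


def build_payload(model, messages, temperature=None, max_tokens=None,
                  stream=None, top_p=None):
    """Validate arguments against the parameter table and return a request body."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    if not messages:
        raise ValueError("messages must contain at least one message")
    for message in messages:
        if "role" not in message or "content" not in message:
            raise ValueError("every message needs role and content fields")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    if max_tokens is not None and max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")

    payload = {"model": model, "messages": messages}
    optional = {"temperature": temperature, "max_tokens": max_tokens,
                "stream": stream, "top_p": top_p}
    # Leave unset options out of the body so the model's own defaults apply.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload
```

Omitting unset options rather than sending explicit nulls matters here because the table notes that the default temperature varies by model.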
What is Gemini on Hermes Agent?
Google Gemini is available through RunAPI's custom:runapi provider without any Google Cloud credentials. Hermes Agent calls it using the same OpenAI-compatible config used for GPT and Claude. Gemini 3.5 Flash is the fastest option for speed-sensitive agent loops, while Gemini 2.5 Pro offers a 1M token context window and thinking mode for complex multi-step reasoning tasks.
Gemini use cases
Real-time voice and video chat with Live API
Use Gemini's multimodal capabilities for real-time applications that process audio and video input alongside text, building interactive agents that can see and hear through Hermes Agent workflows.
Grounding responses with Google Search data
Enable Google Search grounding on Gemini requests to get responses backed by current web data, useful for agents that need up-to-date information beyond their training cutoff.
Cost-efficient agent tool-calling chains
Run Gemini 3.5 Flash for fast, cheap tool-calling loops where the agent needs to make many sequential calls. Sub-100ms first-token latency keeps agent chains responsive without breaking the budget.
Gemini + Hermes Agent questions
Gemini works in Hermes Agent without a Google Cloud account. RunAPI provides Gemini through its OpenAI-compatible endpoint: configure RunAPI as a custom:runapi provider with base_url https://runapi.ai/v1 and key_env RUNAPI_API_KEY. No Google Cloud project, service account, or Vertex AI setup is required.
Flash (gemini-3.5-flash) is the fastest and cheapest option, best for real-time agent loops, classification, and tool-calling chains. Pro (gemini-2.5-pro) handles complex reasoning, long-context analysis, and multi-step tasks. Use Flash for speed and Pro for depth.
When sending the same large context across multiple requests (like a codebase or document set), Gemini's context caching reduces input token costs on subsequent calls. This is especially useful in agent loops where the system prompt and reference material stay the same across many turns.
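Caching of this kind generally pays off only when successive requests begin with the same content. The sketch below shows one way to keep the shared material in a fixed leading position; it assumes, without confirmation from the source, that caching is matched on an identical request prefix and that no extra request parameter is needed. `make_prefix` and `build_turn` are hypothetical helper names.

```python
def make_prefix(system_prompt, reference_material):
    """Build the part of the conversation that stays identical on every call."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Reference material:\n" + reference_material},
    ]


def build_turn(prefix, history, user_message):
    """Return the messages for one call: stable prefix first, changing content last."""
    return [*prefix, *history, {"role": "user", "content": user_message}]
```

Building the prefix once and reusing the same list avoids accidental differences (a timestamp, reordered documents) that would make the leading content differ between turns.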
All RunAPI LLMs share the same custom:runapi provider and API key, so you can switch models without touching the provider config. Use the /model command or hermes model to switch between gemini-3.5-flash, gpt-5.5, claude-opus-4.6, or any other RunAPI model.
Tool calling works with Gemini through RunAPI. RunAPI passes the OpenAI-compatible tools and tool_choice parameters to Gemini. Define tools in the request body and Gemini returns tool_calls in the assistant message. Hermes Agent processes these the same way it handles tool calls from GPT or Claude.
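Hermes Agent runs this loop for you, but it can help to see what travels over the wire. The sketch below uses the standard OpenAI tool-calling shapes (a `tools` entry of type `function`, and `tool_calls` whose `arguments` field is a JSON string); the `get_weather` tool, its fixed return value, and the helper `run_tool_calls` are purely illustrative and not part of RunAPI or Hermes Agent.

```python
import json

# Tool definition to send as "tools": [WEATHER_TOOL] in the request body.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current temperature for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def get_weather(city):
    """Placeholder implementation; a real tool would query a weather service."""
    return {"city": city, "temperature_c": 21}


LOCAL_TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_message):
    """Execute each requested tool and return the role="tool" messages to send back."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        function = call["function"]
        arguments = json.loads(function["arguments"])  # arguments arrive as a JSON string
        output = LOCAL_TOOLS[function["name"]](**arguments)
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(output),
        })
    return results
```

To continue the conversation, append the assistant message and the returned tool messages to `messages` and send the request again; the model then answers using the tool output.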
Hermes Agent can mix models in a single workflow: Gemini Flash for cheap preprocessing, GPT-5.5 for complex reasoning, and Claude for long-context analysis, all through the same RunAPI key and custom:runapi provider.
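Because only the model parameter changes between these calls, routing can be a plain lookup. This is a minimal sketch: the task labels and the helpers `pick_model` and `payload_for` are hypothetical, and the model IDs are the ones named elsewhere on this page.

```python
MODEL_BY_TASK = {
    "preprocess": "gemini-3.5-flash",   # cheap, fast preprocessing
    "reasoning": "gpt-5.5",             # complex reasoning
    "long_context": "claude-opus-4.6",  # long-context analysis
}


def pick_model(task, default="gemini-3.5-flash"):
    """Return the model ID for a task label, falling back to the default."""
    return MODEL_BY_TASK.get(task, default)


def payload_for(task, messages):
    """Build a request body whose only task-specific field is the model ID."""
    return {"model": pick_model(task), "messages": messages}
```

Every payload built this way goes to the same endpoint with the same key, which is why no provider change is involved.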
Hermes Agent general setup
Not configured yet? Start with the RunAPI setup guide for Hermes Agent.
Hermes Agent setup guide →
Try Gemini in Hermes Agent today.
Get a free RunAPI key, set model to gemini-3.5-flash in your custom:runapi provider, and start using Gemini in Hermes Agent.