Use Claude in Hermes Agent.
Anthropic Claude provides Opus 4.8 for maximum capability (200K context, extended thinking), Sonnet 4.6 for balanced performance, and Haiku 4.5 for speed. Hermes Agent calls Claude through the custom:runapi provider at 50% of Anthropic's official per-token rate — same key and base_url you configured for chat.
Use RunAPI to send a Claude chat completion request through Hermes Agent.
Requirements:
- Use the custom:runapi provider already configured in Hermes Agent
- Call the RunAPI chat completions endpoint at https://runapi.ai/v1/chat/completions
- Set model to "claude-opus-4.8"
- The RUNAPI_API_KEY environment variable provides authorization
- The response is synchronous — the assistant message is returned directly in the response body
- For streaming, set "stream": true to receive server-sent events
curl -X POST https://runapi.ai/v1/chat/completions \
  -H "Authorization: Bearer $RUNAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4.8",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Explain the difference between a mutex and a semaphore in three sentences."}
    ]
  }'
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "claude-opus-4.8",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A mutex is a locking mechanism that allows only one thread to access a resource at a time..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 87,
    "total_tokens": 111
  }
}
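For callers who prefer a script to curl, the same request can be sketched in Python using only the standard library. The helper names `build_chat_request` and `send_chat_request` are illustrative and not part of RunAPI or Hermes Agent; the URL, headers, and body fields are taken from the curl example above.

```python
import json
import urllib.request

RUNAPI_CHAT_URL = "https://runapi.ai/v1/chat/completions"


def build_chat_request(prompt, api_key, model="claude-opus-4.8", max_tokens=1024):
    """Build (but do not send) the same POST request as the curl example."""
    payload = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        RUNAPI_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A typical call would be `send_chat_request(build_chat_request("Hello", os.environ["RUNAPI_API_KEY"]))`. Building and sending are kept separate so the request can be inspected without making a network call.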
Use Claude in Hermes Agent in three steps
Configure RunAPI
Set the RUNAPI_API_KEY environment variable. If you already added RunAPI as a custom:runapi provider in Hermes Agent, the same key and base_url work for Claude — switch the model parameter to claude-opus-4.8 in your Hermes config or use the /model command.
export RUNAPI_API_KEY=runapi_xxx
Call Claude
Send a POST request to /v1/chat/completions with model set to claude-opus-4.8. Include a messages array with at least one user message. Set max_tokens to control response length. Add "stream": true for token-by-token SSE output in your Hermes session.
POST /v1/chat/completions
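When "stream" is true, the client has to reassemble the reply from server-sent events. The sketch below assumes the stream follows the usual OpenAI-compatible chunk shape (`data:` lines carrying `choices[].delta.content`, ending with `data: [DONE]`); this page does not spell out the chunk format, so treat the field names and the sample events as assumptions.

```python
import json


def iter_stream_text(lines):
    """Yield text deltas from OpenAI-style SSE lines (assumed chunk format)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample events, not captured from a real RunAPI stream.
sample_stream = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "A mutex"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": " is a lock."}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_stream_text(sample_stream)))
```

Because the function accepts any iterable of text lines, it can be fed decoded lines from an HTTP response as they arrive.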
Read the response
The endpoint returns the assistant message synchronously — no task polling needed. Hermes Agent displays the response inline. Token usage counts are included in the response for billing transparency. Streaming responses arrive as SSE events for real-time display.
usage.total_tokens: 111
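Reading the response comes down to two lookups: the assistant text under `choices[0].message.content` and the token counts under `usage`. A minimal sketch, using a trimmed copy of the sample response shown earlier (`extract_reply` is an illustrative helper name):

```python
def extract_reply(response):
    """Return (assistant_text, usage) from a chat.completion response dict."""
    choice = response["choices"][0]
    return choice["message"]["content"], response.get("usage", {})


# Trimmed copy of the sample response from this page.
sample_response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "model": "claude-opus-4.8",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "A mutex is a locking mechanism..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 24, "completion_tokens": 87, "total_tokens": 111},
}

text, usage = extract_reply(sample_response)
print(text)
print("total tokens:", usage["total_tokens"])
```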
Claude API parameters (OpenAI-compatible)
| Parameter | Type | Description |
|---|---|---|
| model | string | Required. claude-opus-4.8, claude-sonnet-4.6, claude-haiku-4.5, or any Claude variant listed in the RunAPI catalog. |
| messages | array | Required. Array of message objects with role (system, user, assistant) and content fields. |
| max_tokens | integer | Maximum number of tokens in the response. Defaults vary by model; set explicitly for predictable billing. |
| stream | boolean | When true, returns server-sent events with incremental token deltas instead of a single JSON response. |
| temperature | float | Sampling temperature between 0 and 1. Lower values produce more deterministic output. |
| top_p | float | Nucleus sampling cutoff. Alternative to temperature: use one or the other, not both. |
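The constraints in the table can be checked on the client before a request is sent. The validator below is a sketch of that idea, not RunAPI's own validation: `validate_chat_params` is a hypothetical helper, and the rule that max_tokens must be at least 1 is an assumption rather than something stated on this page.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_chat_params(params):
    """Return a list of problems found in a request payload (empty if none)."""
    problems = []

    model = params.get("model")
    if not isinstance(model, str) or not model:
        problems.append("model: required non-empty string")

    messages = params.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("messages: required non-empty array")
    else:
        for index, message in enumerate(messages):
            if not isinstance(message, dict):
                problems.append(f"messages[{index}]: must be an object")
            elif message.get("role") not in ALLOWED_ROLES:
                problems.append(f"messages[{index}]: role must be system, user, or assistant")
            elif "content" not in message:
                problems.append(f"messages[{index}]: missing content")

    max_tokens = params.get("max_tokens")
    if max_tokens is not None:
        is_int = isinstance(max_tokens, int) and not isinstance(max_tokens, bool)
        if not is_int or max_tokens < 1:  # lower bound of 1 is an assumption
            problems.append("max_tokens: must be a positive integer")

    temperature = params.get("temperature")
    if temperature is not None:
        is_number = isinstance(temperature, (int, float)) and not isinstance(temperature, bool)
        if not is_number or not 0 <= temperature <= 1:
            problems.append("temperature: must be a number between 0 and 1")

    if temperature is not None and params.get("top_p") is not None:
        problems.append("temperature and top_p: set one or the other, not both")

    stream = params.get("stream")
    if stream is not None and not isinstance(stream, bool):
        problems.append("stream: must be a boolean")

    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every issue in a payload at once.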
What is Claude on Hermes Agent?
Claude is Anthropic's LLM, and Hermes Agent calls it through the custom:runapi provider at half the official Anthropic per-token price. All three tiers work through the same provider config: Opus 4.8 (200K context, extended thinking), Sonnet 4.6 (balanced speed and quality), and Haiku 4.5 (fast and cheap). Switch between them per request by changing only the model field; no reconfiguration is needed.
Claude use cases
Building AI agents with tool use and MCP
Use Claude's function calling and Model Context Protocol support in Hermes Agent to build multi-step automated workflows that read files, query databases, and take actions based on reasoning.
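As a rough illustration of function calling, the sketch below defines one tool in the OpenAI-compatible `tools` schema and dispatches the tool calls found in an assistant message. It assumes RunAPI's chat completions path accepts `tools` and returns `tool_calls` in the standard OpenAI shape, which this page does not confirm. The `read_file` tool, the `run_tool_calls` helper, and the sample message are all hypothetical.

```python
import json

# Hypothetical tool definition in the OpenAI-compatible "tools" schema.
READ_FILE_TOOL = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Return the contents of a text file.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}


def run_tool_calls(assistant_message, handlers):
    """Run each tool call in an assistant message and build 'tool' role replies."""
    replies = []
    for call in assistant_message.get("tool_calls") or []:
        name = call["function"]["name"]
        arguments = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        result = handlers[name](**arguments)
        replies.append({"role": "tool", "tool_call_id": call["id"], "content": str(result)})
    return replies


# Hand-written assistant message, standing in for a real model response.
sample_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "read_file", "arguments": '{"path": "notes.txt"}'},
        }
    ],
}

# Stub handler so the example runs without touching the filesystem.
replies = run_tool_calls(sample_message, {"read_file": lambda path: f"<contents of {path}>"})
print(replies)
```

In a multi-step workflow, the replies would be appended to the messages array and sent back so the model can continue reasoning with the tool results.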
Code generation and review
Route coding tasks through Claude in Hermes Agent: Opus 4.8 for complex architecture decisions and multi-file refactors, Sonnet 4.6 for everyday pull request reviews and test generation.
Content generation with prompt caching
Generate marketing copy, documentation, or reports at scale using prompt caching to reduce costs when the system prompt and context stay the same across many requests.
Claude + Hermes Agent questions
Claude works in Hermes Agent. Configure RunAPI as a custom:runapi provider with base_url https://runapi.ai/v1 and api_mode chat_completions, then set model to claude-opus-4.8 or any other Claude variant. The same RUNAPI_API_KEY handles chat, image, video, and music models.
RunAPI charges 50% of Anthropic's official rate. Opus 4.8 is $7.50/$37.50 per million input/output tokens through RunAPI versus $15/$75 direct. With prompt caching enabled, cached input tokens cost even less. No subscription or volume commitment required.
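To make the pricing concrete, the arithmetic below applies the Opus 4.8 rates quoted above to the token counts from the sample response (24 prompt tokens, 87 completion tokens). It is a sketch only: it ignores prompt-caching discounts, and the constant and function names are illustrative.

```python
# USD per million tokens, as quoted on this page for Opus 4.8.
RUNAPI_OPUS_RATES = {"input": 7.50, "output": 37.50}
DIRECT_OPUS_RATES = {"input": 15.00, "output": 75.00}


def request_cost_usd(prompt_tokens, completion_tokens, rates):
    """Cost of one request given per-million-token input and output rates."""
    return (prompt_tokens * rates["input"] + completion_tokens * rates["output"]) / 1_000_000


runapi_cost = request_cost_usd(24, 87, RUNAPI_OPUS_RATES)  # (180 + 3262.5) / 1e6
direct_cost = request_cost_usd(24, 87, DIRECT_OPUS_RATES)  # (360 + 6525) / 1e6
print(f"RunAPI: ${runapi_cost:.7f}  direct: ${direct_cost:.7f}")
```

For this request the RunAPI cost works out to about $0.0034 against about $0.0069 direct, consistent with the 50% rate.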
Switching models needs no reconfiguration. Change only the model parameter in your Hermes config or use the /model command during a session. The custom:runapi provider, base_url, and API key stay the same across all Claude variants: Opus 4.8, Sonnet 4.6, Haiku 4.5, and dated snapshots.
RunAPI exposes both /v1/chat/completions (OpenAI-compatible, used by Hermes Agent's chat_completions mode) and /v1/messages (native Anthropic format). The native endpoint supports extended thinking and Anthropic-specific features. For Hermes Agent, the OpenAI-compatible path covers standard chat and streaming.
Include a cache_control breakpoint on your system prompt or large context blocks. Subsequent requests that share the same cached prefix pay a reduced input token rate. This is especially effective for agent loops where the system prompt and tool definitions repeat across many turns.
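A cache breakpoint in the native Anthropic Messages format looks roughly like the payload below, where the system prompt is sent as a content block carrying `cache_control`. This is a sketch for the /v1/messages endpoint, assuming RunAPI accepts the same body shape as the Anthropic API; whether the OpenAI-compatible path honors `cache_control` is not stated on this page. The helper name is illustrative.

```python
def build_cached_messages_payload(system_prompt, user_text,
                                  model="claude-opus-4.8", max_tokens=1024):
    """Build a native Messages-format body with a cache breakpoint on the system prompt."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Marks the end of the prefix to cache for later requests.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_text}],
    }


payload = build_cached_messages_payload(
    "You are a release-notes writer. Follow the style guide below...",
    "Summarize the changes in version 2.3.",
)
```

Only the user message changes between requests in a loop like this, so the system prompt prefix stays identical and remains eligible for the cached rate.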
Extended thinking is supported. Pass the extended thinking parameters in your request body. Hermes Agent forwards them to the RunAPI Claude endpoint, which supports the same extended thinking configuration as the direct Anthropic API.
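In the direct Anthropic API, extended thinking is switched on with a `thinking` object in the request body. The sketch below adds that object to an existing payload, assuming RunAPI forwards it unchanged. The checks follow Anthropic's documented constraints (a budget of at least 1024 tokens, and smaller than max_tokens); the helper name is illustrative.

```python
def with_extended_thinking(payload, budget_tokens=2048):
    """Return a copy of payload with Anthropic-style extended thinking enabled."""
    if budget_tokens < 1024:
        raise ValueError("budget_tokens must be at least 1024")
    if budget_tokens >= payload.get("max_tokens", 0):
        raise ValueError("budget_tokens must be smaller than max_tokens")
    updated = dict(payload)  # shallow copy so the caller's payload is untouched
    updated["thinking"] = {"type": "enabled", "budget_tokens": budget_tokens}
    return updated


base_payload = {
    "model": "claude-opus-4.8",
    "max_tokens": 4096,
    "messages": [{"role": "user", "content": "Plan a three-step database migration."}],
}
thinking_payload = with_extended_thinking(base_payload, budget_tokens=2048)
```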
Hermes Agent general setup
Not configured yet? Start with the RunAPI setup guide for Hermes Agent.
Hermes Agent setup guide →
Claude model catalog
See all Claude variants, per-token pricing, and context window details.
Claude models →
Try Claude in Hermes Agent today.
Get a free RunAPI key, configure the custom:runapi provider, and start using Claude at 50% of the official Anthropic rate.