Can I use ElevenLabs in Hermes Agent?

Yes. Configure RunAPI as a custom:runapi provider in Hermes Agent with base_url https://runapi.ai/v1 and key_env RUNAPI_API_KEY, then call any ElevenLabs endpoint -- text_to_speech, speech_to_text, text_to_dialogue, text_to_sound, or isolate_audio.

What stability and similarity settings produce the most natural voice?

Start with stability at 0.5 and similarity_boost at 0.75. Higher stability makes the voice more consistent but less expressive. Higher similarity_boost keeps the voice closer to the original voice profile. For audiobooks, try stability between 0.6 and 0.8. For conversational content, lower stability (0.3 to 0.5) adds natural variation.
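
As a rough illustration, the starting points above can be kept in a small preset table. The preset names, the midpoints chosen inside each suggested range, and the reuse of similarity_boost 0.75 for every preset are assumptions made for this sketch, not values defined by RunAPI or ElevenLabs.

```python
# Hypothetical preset table. The names and the exact midpoints are illustrative
# picks inside the ranges suggested above, not documented defaults.
VOICE_PRESETS = {
    "default": {"stability": 0.5, "similarity_boost": 0.75},
    "audiobook": {"stability": 0.7, "similarity_boost": 0.75},       # within 0.6-0.8
    "conversational": {"stability": 0.4, "similarity_boost": 0.75},  # within 0.3-0.5
}


def voice_settings(preset: str = "default") -> dict:
    """Return a copy of the stability/similarity_boost pair for a preset."""
    if preset not in VOICE_PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    # Copy so callers can tweak values without mutating the shared table.
    return dict(VOICE_PRESETS[preset])
```

The returned dict can be merged into a text_to_speech request body, then adjusted by ear from there.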

How do I reduce ElevenLabs costs for long-form content like audiobooks?

Use turbo-v2.5 for English content -- it costs roughly half as much per character as multilingual-v2. Break long texts into chunks under 5000 characters per request. Use the RunAPI batch approach to process chapters in parallel rather than sequentially.
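
The chunking and parallel steps can be sketched as follows. This is a minimal sketch, not a RunAPI feature: chunk_text and submit_all are hypothetical helpers, the sentence-boundary splitting is one possible strategy, and submit stands in for whatever function you use to send one chunk to the text_to_speech endpoint.

```python
import re
from concurrent.futures import ThreadPoolExecutor

MAX_CHARS = 5000  # per-request text limit mentioned above


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is itself longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def submit_all(chunks, submit, max_workers: int = 4) -> list:
    """Run submit(chunk) for every chunk in parallel, keeping the original order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(submit, chunks))
```

Because pool.map preserves input order, the resulting audio segments can be concatenated in chapter order even though the requests run concurrently.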

Can I transcribe audio with ElevenLabs in Hermes Agent?

Yes. Call the speech_to_text endpoint at /api/v1/elevenlabs/speech_to_text with a source_audio_url. The endpoint supports optional speaker diarization via the diarize parameter and audio event tagging via tag_audio_events. Results are returned asynchronously.
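
A request body for this endpoint might be assembled as below. The sketch assumes diarize and tag_audio_events are booleans and that optional flags can simply be left out when unset; neither detail is confirmed above. build_stt_payload and the URL check are hypothetical conveniences.

```python
import json

STT_PATH = "/api/v1/elevenlabs/speech_to_text"  # endpoint path from the answer above


def build_stt_payload(source_audio_url, diarize=None, tag_audio_events=None):
    """Build the JSON body for a speech_to_text request.

    Assumption: diarize and tag_audio_events are boolean flags and may be
    omitted entirely when the caller does not set them.
    """
    if not source_audio_url.startswith(("http://", "https://")):
        raise ValueError("source_audio_url must be an http(s) URL")
    payload = {"source_audio_url": source_audio_url}
    if diarize is not None:
        payload["diarize"] = bool(diarize)
    if tag_audio_events is not None:
        payload["tag_audio_events"] = bool(tag_audio_events)
    return payload


# Example body for a diarized transcription of a hypothetical file.
body = json.dumps(build_stt_payload("https://example.com/meeting.mp3", diarize=True))
```

Since results are returned asynchronously, the response to this request would carry a task to poll rather than the transcript itself.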

How does audio isolation work through RunAPI?

Call the isolate_audio endpoint at /api/v1/elevenlabs/isolate_audio with a source_audio_url pointing to your mixed audio file. The endpoint extracts vocals from background noise and returns a cleaned audio URL. The task is async -- poll or use a callback_url.

Can Hermes Agent chain ElevenLabs TTS with video generation in one workflow?

Yes. Hermes Agent can generate speech with ElevenLabs, then pass the audio URL to InfiniteTalk for avatar video or to Wan for speech-to-video, creating a complete text-to-spoken-video pipeline in one run.

HERMES + ELEVENLABS

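Use ElevenLabs in Hermes Agent.

ElevenLabs offers six audio endpoints through RunAPI: turbo-v2.5 TTS with sub-second latency, multilingual-v2 covering 29 languages, dialogue-v3 for multi-speaker dialogue, sound effects, speech-to-text transcription, and vocal isolation. Hermes Agent calls all of them with a single API key through the custom:runapi provider.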

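One API key · Text-to-speech endpoint · Per-character billing

Generate speech audio with ElevenLabs text-to-speech through RunAPI.

Requirements:
- Read the API key from RUNAPI_API_KEY.
- Use the custom:runapi provider with base_url https://runapi.ai/v1.
- Call POST https://runapi.ai/api/v1/elevenlabs/text_to_speech
- Set model to "text-to-speech-turbo-v2.5".
- Set text to the content you want spoken.
- Optionally set voice to a specific ElevenLabs voice ID.
- Optionally set speed between 0.7 and 1.2.
- The task is asynchronous. Poll the returned task_id until status is "completed".
- When it completes, read the audio URL from the response output.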

curl -X POST https://runapi.ai/api/v1/elevenlabs/text_to_speech \
  -H "Authorization: Bearer $RUNAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-to-speech-turbo-v2.5",
    "text": "Welcome to RunAPI. This audio was generated by ElevenLabs turbo v2.5.",
    "speed": 1.0,
    "stability": 0.5,
    "similarity_boost": 0.75
  }'

{
  "task_id": "tsk_abc123",
  "status": "pending",
  "model": "text-to-speech-turbo-v2.5"
}
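
For callers who prefer Python to curl, the same request could look roughly like this using only the standard library. build_tts_request and submit_tts are hypothetical helper names; the URL, headers, and body fields mirror the curl command above, and the 30-second timeout is an arbitrary choice.

```python
import json
import os
import urllib.request

TTS_URL = "https://runapi.ai/api/v1/elevenlabs/text_to_speech"


def build_tts_request(text, api_key=None, model="text-to-speech-turbo-v2.5", **options):
    """Build (but do not send) a POST request equivalent to the curl example.

    options carries optional fields such as speed, stability, or similarity_boost.
    """
    key = api_key or os.environ["RUNAPI_API_KEY"]
    payload = {"model": model, "text": text, **options}
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_tts(request):
    """Send the request and return the decoded JSON task record."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling submit_tts(build_tts_request("Welcome to RunAPI.", speed=1.0)) should return a task record shaped like the JSON above, with a task_id to poll.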

How it works

Use ElevenLabs in Hermes Agent in three steps

Configure RunAPI

Set RUNAPI_API_KEY in the environment where Hermes Agent runs. If you already added RunAPI as a custom:runapi provider, the same key and base_url handle all ElevenLabs endpoints — TTS, STT, dialogue, sound effects, and audio isolation.

export RUNAPI_API_KEY=runapi_xxx

Call text_to_speech

Send a POST to the text_to_speech endpoint with model set to text-to-speech-turbo-v2.5, the text you want spoken, and optional voice, speed, and stability parameters. Hermes Agent routes the request through the custom:runapi provider. For multilingual output, use text-to-speech-multilingual-v2 with a voice and language_code.

POST /api/v1/elevenlabs/text_to_speech

Poll for the result

The endpoint returns a task_id immediately. Poll the task status endpoint until the status is completed, then read the output audio URL from the response.

GET /api/v1/elevenlabs/text_to_speech/tsk_abc123
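
The polling step can be sketched as a small loop. This is an illustration under assumptions: poll_task is a hypothetical helper, fetch_status is any callable you supply that performs the GET and returns the decoded JSON, and the "failed" status is a guess, since only "pending" and "completed" appear above.

```python
import time


def poll_task(fetch_status, task_id, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Call fetch_status(task_id) until the task completes, fails, or times out.

    fetch_status is caller-supplied, which keeps this loop independent of any
    particular HTTP client and easy to test.
    """
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_status(task_id)
        status = task.get("status")
        if status == "completed":
            return task
        if status == "failed":  # assumed failure status, not confirmed by the source
            raise RuntimeError(f"task {task_id} failed: {task}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status!r} after {timeout}s")
        sleep(interval)
```

A fixed interval keeps the sketch short; for long audiobook jobs, a callback_url avoids polling altogether.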

Parameters

ElevenLabs text_to_speech API parameters

Parameter	Type	Description
`model`	`string`	Required. text-to-speech-turbo-v2.5 (low latency) or text-to-speech-multilingual-v2 (29 languages).
`text`	`string`	Required. The text to convert to speech. Max 5000 characters.
`voice`	`string`	ElevenLabs voice ID. Required for multilingual-v2. Turbo-v2.5 uses a default voice if omitted.
`speed`	`float`	Optional. Playback speed multiplier. Range 0.7 to 1.2.
`stability`	`float`	Optional. Voice consistency. Range 0.0 to 1.0. Lower values add expressiveness.
`similarity_boost`	`float`	Optional. Voice similarity enforcement. Range 0.0 to 1.0.
`style`	`float`	Optional. Style exaggeration. Range 0.0 to 1.0.
`language_code`	`string`	Optional. Target language for multilingual-v2, e.g. en, es, ja.
`callback_url`	`string`	Optional. Webhook URL that receives a POST when the task completes.
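
The constraints in this table can be checked on the client before a request is sent. The sketch below is a hypothetical pre-flight validator: validate_tts_params is not part of RunAPI, and the server remains the authority on what it accepts.

```python
# Ranges and model names are taken from the parameter table above.
RANGES = {
    "speed": (0.7, 1.2),
    "stability": (0.0, 1.0),
    "similarity_boost": (0.0, 1.0),
    "style": (0.0, 1.0),
}
MODELS = {"text-to-speech-turbo-v2.5", "text-to-speech-multilingual-v2"}
MAX_TEXT_CHARS = 5000


def validate_tts_params(params: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    errors = []
    model = params.get("model")
    if model not in MODELS:
        errors.append(f"model must be one of {sorted(MODELS)}")
    text = params.get("text")
    if not isinstance(text, str) or not text:
        errors.append("text is required")
    elif len(text) > MAX_TEXT_CHARS:
        errors.append(f"text exceeds {MAX_TEXT_CHARS} characters")
    # The table marks voice as required only for the multilingual model.
    if model == "text-to-speech-multilingual-v2" and not params.get("voice"):
        errors.append("voice is required for multilingual-v2")
    for name, (low, high) in RANGES.items():
        value = params.get(name)
        if value is not None and not (low <= value <= high):
            errors.append(f"{name} must be between {low} and {high}")
    return errors
```

Returning every problem at once, rather than raising on the first, lets an agent fix a request in a single pass.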

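What is ElevenLabs on Hermes Agent?

ElevenLabs is a leading text-to-speech API. Hermes Agent calls it through the custom:runapi provider for speech generation, transcription, and audio processing. The core advantage in Hermes is chaining: after generating speech, pass the audio URL to InfiniteTalk to produce a talking avatar, or to a video model to complete full audiovisual content, all in a single agent run.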