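Use ElevenLabs in OpenClaw.
ElevenLabs offers six audio endpoints through RunAPI: turbo-v2.5 TTS with sub-second latency, multilingual-v2 covering 29 languages, dialogue-v3 for multi-speaker conversations, sound effects, speech-to-text transcription, and audio isolation. OpenClaw agents call any of them with the same RunAPI key they use for chat.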
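Use RunAPI to generate speech audio with ElevenLabs text-to-speech.
Requirements:
- Read the API key from RUNAPI_API_KEY.
- Call POST https://runapi.ai/api/v1/elevenlabs/text_to_speech
- Set model to "text-to-speech-turbo-v2.5".
- Set text to the content you want spoken.
- Optionally set voice to a specific ElevenLabs voice ID.
- Optionally set speed between 0.7 and 1.2.
- The task is asynchronous. Poll the returned task_id until status is "completed".
- Once it completes, read the audio URL from the response output.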
curl -X POST https://runapi.ai/api/v1/elevenlabs/text_to_speech \
-H "Authorization: Bearer $RUNAPI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "text-to-speech-turbo-v2.5",
"text": "Welcome to RunAPI. This audio was generated by ElevenLabs turbo v2.5.",
"speed": 1.0,
"stability": 0.5,
"similarity_boost": 0.75
}'
{
"task_id": "tsk_abc123",
"status": "pending",
"model": "text-to-speech-turbo-v2.5"
}
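The curl call above can also be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_tts_request` and `submit_tts` are hypothetical, and the response is assumed to be the JSON body shown above (`task_id`, `status`, `model`).

```python
import json
import os
import urllib.request

TTS_URL = "https://runapi.ai/api/v1/elevenlabs/text_to_speech"


def build_tts_request(text, api_key, model="text-to-speech-turbo-v2.5", **options):
    """Build (but do not send) the same POST request as the curl example."""
    payload = {"model": model, "text": text, **options}
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_tts(text, **options):
    """Send the request and return the decoded JSON body."""
    # The key is read from the environment, as the requirements list asks.
    request = build_tts_request(text, os.environ["RUNAPI_API_KEY"], **options)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `submit_tts("Welcome to RunAPI.", speed=1.0, stability=0.5)` would send a body equivalent to the curl example; keeping request construction separate from sending makes the payload easy to inspect without a network call.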
Use ElevenLabs in OpenClaw in three steps
Configure RunAPI
Set RUNAPI_API_KEY in your environment. If you already configured RunAPI for chat in OpenClaw, the same key works for all ElevenLabs endpoints — TTS, STT, dialogue, sound effects, and audio isolation.
export RUNAPI_API_KEY=runapi_xxx
Call text_to_speech
Send a POST to the text_to_speech endpoint with model set to text-to-speech-turbo-v2.5, the text you want spoken, and optional voice, speed, and stability parameters. For multilingual output, use text-to-speech-multilingual-v2 with a voice and language_code.
POST /api/v1/elevenlabs/text_to_speech
Poll for the result
The endpoint returns a task_id immediately. Poll the task status endpoint until the status is completed, then read the output audio URL from the response.
GET /api/v1/elevenlabs/text_to_speech/tsk_abc123
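The polling step can be sketched as a small loop. This is an illustration under stated assumptions, not RunAPI's reference client: the helper names are hypothetical, the status endpoint is assumed to accept the same Bearer header as the POST, and `"failed"` is a guessed name for a terminal error status. The source only confirms `"pending"` and `"completed"`, and does not specify the exact shape of `output`.

```python
import json
import os
import time
import urllib.request

STATUS_URL = "https://runapi.ai/api/v1/elevenlabs/text_to_speech/{task_id}"


def fetch_task(task_id):
    """GET the task status endpoint and return the decoded JSON body."""
    request = urllib.request.Request(
        STATUS_URL.format(task_id=task_id),
        # Assumption: same Bearer authentication as the POST call.
        headers={"Authorization": f"Bearer {os.environ['RUNAPI_API_KEY']}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_task(task_id, fetch=fetch_task, interval=2.0, timeout=120.0, sleep=time.sleep):
    """Poll until status is "completed", then return the final task body."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch(task_id)
        status = task.get("status")
        if status == "completed":
            return task
        # Assumption: a terminal failure is reported as "failed".
        if status == "failed":
            raise RuntimeError(f"task {task_id} failed: {task}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not completed after {timeout}s")
        sleep(interval)
```

The `fetch` and `sleep` parameters exist so the loop can be exercised with canned responses; in normal use `wait_for_task("tsk_abc123")` is enough, and the audio URL is then read from the returned body's `output`.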
ElevenLabs text_to_speech API parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | Required. text-to-speech-turbo-v2.5 (low latency) or text-to-speech-multilingual-v2 (29 languages). |
| text | string | Required. The text to convert to speech. Max 5000 characters. |
| voice | string | ElevenLabs voice ID. Required for multilingual-v2. Turbo-v2.5 uses a default voice if omitted. |
| speed | float | Optional. Playback speed multiplier. Range 0.7 to 1.2. |
| stability | float | Optional. Voice consistency. Range 0.0 to 1.0. Lower values add expressiveness. |
| similarity_boost | float | Optional. Voice similarity enforcement. Range 0.0 to 1.0. |
| style | float | Optional. Style exaggeration. Range 0.0 to 1.0. |
| language_code | string | Optional. Target language for multilingual-v2, e.g. en, es, ja. |
| callback_url | string | Optional. Webhook URL that receives a POST when the task completes. |
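The constraints in the parameter table can be checked on the client before a request is sent. The sketch below is one way to do that; `build_tts_payload` is a hypothetical helper, the ranges are treated as inclusive, and "Max 5000 characters" is read as at most 5000. The server's own validation may differ.

```python
TURBO = "text-to-speech-turbo-v2.5"
MULTILINGUAL = "text-to-speech-multilingual-v2"
MAX_TEXT_CHARS = 5000

# Ranges copied from the parameter table, assumed to be inclusive.
RANGES = {
    "speed": (0.7, 1.2),
    "stability": (0.0, 1.0),
    "similarity_boost": (0.0, 1.0),
    "style": (0.0, 1.0),
}


def build_tts_payload(model, text, voice=None, language_code=None, callback_url=None, **tuning):
    """Validate arguments against the table and return a JSON-ready dict."""
    if model not in (TURBO, MULTILINGUAL):
        raise ValueError(f"unknown model: {model!r}")
    if not text or len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text must be 1 to {MAX_TEXT_CHARS} characters")
    if model == MULTILINGUAL and not voice:
        raise ValueError("voice is required for multilingual-v2")

    payload = {"model": model, "text": text}
    for name, value in tuning.items():
        if name not in RANGES:
            raise ValueError(f"unknown parameter: {name}")
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
        payload[name] = value
    # Optional string fields are only included when provided.
    optional = (("voice", voice), ("language_code", language_code), ("callback_url", callback_url))
    for name, value in optional:
        if value:
            payload[name] = value
    return payload
```

For example, `build_tts_payload(MULTILINGUAL, "Hola", voice="<voice-id>", language_code="es")` returns a body ready to serialize, while omitting `voice` for multilingual-v2 raises `ValueError` before any request is made.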
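What is ElevenLabs on OpenClaw?
ElevenLabs is among the most widely used text-to-speech APIs for natural-sounding speech. Through RunAPI, OpenClaw agents can access turbo-v2.5 (sub-second latency for English), multilingual-v2 (29 languages), dialogue-v3 (multi-speaker dialogue), sound effect generation, speech-to-text, and audio isolation.
ElevenLabs use cases
Audiobook and podcast narration
Turn long-form text into speech audio with a consistent character voice. Adjust stability to keep the narration consistent, and raise similarity_boost to keep the voice close to the original voice profile across hours of content.
Multilingual video dubbing
Use multilingual-v2 with the same voice profile to dub video content into 29 languages, producing localized versions that preserve the original speaker's vocal characteristics.
Sound effects for video and game production
Generate custom Foley effects, ambient audio, and sound cues from text descriptions through the text_to_sound endpoint, replacing sound-library searches with on-demand generation.
ElevenLabs + OpenClaw FAQ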
Start with stability at 0.5 and similarity_boost at 0.75. Higher stability makes the voice more consistent but less expressive. Higher similarity keeps the voice closer to the original profile. For audiobooks, try stability 0.6-0.8. For conversational content, lower stability (0.3-0.5) adds natural variation.
Turbo-v2.5 is optimized for low latency and English-first output -- it applies a default voice when none is specified. Multilingual-v2 supports 29 languages and requires an explicit voice ID and optional language_code. Turbo costs roughly half as much per character.
Use turbo-v2.5 for English content -- it costs roughly half as much per character as multilingual-v2. Break long texts into chunks under 5000 characters per request. Use the RunAPI batch approach to process chapters in parallel rather than sequentially.
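Splitting a long text into chunks under 5000 characters could look like the sketch below. `chunk_text` is a hypothetical helper; it breaks at sentence-ending punctuation where it can, and falls back to a hard split by length for any single sentence that is too long. "Under 5000" is read here as strictly fewer than 5000 characters.

```python
import re


def chunk_text(text, limit=5000):
    """Split text into chunks shorter than `limit`, preferring sentence boundaries."""
    max_len = limit - 1  # strictly fewer than `limit` characters per chunk
    # Split after ".", "!" or "?" when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split by length.
        while len(sentence) > max_len:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_len])
            sentence = sentence[max_len:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_len:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be submitted as its own text_to_speech task, and the resulting audio files concatenated in the original order.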
Text-to-speech and dialogue endpoints are billed per character of input text. Speech-to-text is billed per minute of audio. Audio isolation is billed per task. Check the RunAPI pricing page for current rates.
Yes. Call the text_to_dialogue endpoint with model text-to-dialogue-v3. Pass a dialogue array where each item has a text and a voice ID. The total text across all speakers must be under 5000 characters.
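A request body for the dialogue endpoint might be assembled as follows. This is a sketch only: the answer above names the `dialogue` array and says each item carries a text and a voice ID, but the item key names `"text"` and `"voice"` used here are assumptions, and `build_dialogue_payload` is a hypothetical helper.

```python
MAX_DIALOGUE_CHARS = 5000


def build_dialogue_payload(turns, model="text-to-dialogue-v3"):
    """Build a text_to_dialogue body from (voice_id, text) pairs."""
    turns = list(turns)
    # The combined text of all speakers must stay under the limit.
    total = sum(len(text) for _, text in turns)
    if total >= MAX_DIALOGUE_CHARS:
        raise ValueError(
            f"total text is {total} characters; must be under {MAX_DIALOGUE_CHARS}"
        )
    return {
        "model": model,
        # Assumption: each item uses the keys "text" and "voice".
        "dialogue": [{"text": text, "voice": voice_id} for voice_id, text in turns],
    }
```

For instance, `build_dialogue_payload([("voice_a", "Hi there."), ("voice_b", "Hello!")])` yields a two-item `dialogue` array in speaking order.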
Try ElevenLabs in OpenClaw now.
Get a free RunAPI key, paste the prompt into OpenClaw, and generate speech audio with ElevenLabs: six endpoints, one API key, and per-character billing for text-to-speech.