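Using Kling in OpenClaw
Kling 3.0 from Kuaishou generates videos of up to 1080p from text or images, with native audio, multi-shot scenes, and durations of 3–15 seconds. OpenClaw agents call it through RunAPI with the same API key used for chat. You send a prompt, poll the task, and get back a video URL.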
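Generate a video with Kling 3.0 through RunAPI.
Requirements:
- Call POST https://runapi.ai/api/v1/kling/text_to_video
- Set model to "kling-3.0"
- Read the API key from the RUNAPI_API_KEY environment variable
- Set duration_seconds to control the length (3–15 seconds)
- Set aspect_ratio to "16:9" for landscape video
- Enable native audio with enable_sound: true
- The response is asynchronous: poll the task status endpoint until the task completes, then fetch the video URL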
```sh
curl -X POST https://runapi.ai/api/v1/kling/text_to_video \
  -H "Authorization: Bearer $RUNAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kling-3.0",
    "prompt": "A drone shot pulling back from a mountain lake at sunrise, mist rising off the water, cinematic lighting",
    "duration_seconds": 5,
    "aspect_ratio": "16:9",
    "enable_sound": true,
    "output_resolution": "1080p"
  }'
```
The call returns immediately with a task ID:

```json
{
  "task_id": "tsk_abc123",
  "status": "pending",
  "model": "kling-3.0"
}
```
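For agent scripts, the request and response above can be wrapped in a small shell function. This is a minimal sketch, not a RunAPI-provided client: the names `submit_text_to_video` and `KLING_T2V_URL` are hypothetical, it uses POSIX sh plus `local` (supported by bash and dash), and it assumes the response carries `task_id` as a plain string field, as in the sample response.

```sh
# Hypothetical helper; endpoint and headers come from the curl example above.
KLING_T2V_URL="https://runapi.ai/api/v1/kling/text_to_video"

# Usage: submit_text_to_video JSON_BODY
# Prints the task_id on success. Returns 1 if RUNAPI_API_KEY is unset,
# if curl fails (HTTP errors included, via -f), or if no task_id is found.
submit_text_to_video() {
  local body="$1" response task_id
  if [ -z "${RUNAPI_API_KEY:-}" ]; then
    echo "RUNAPI_API_KEY is not set" >&2
    return 1
  fi
  response="$(curl -fsS -X POST "$KLING_T2V_URL" \
    -H "Authorization: Bearer $RUNAPI_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$body")" || return 1
  # Crude extraction that assumes "task_id" is a plain JSON string;
  # `jq -r '.task_id'` is more robust where jq is installed.
  task_id="$(printf '%s\n' "$response" |
    sed -n 's/.*"task_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1)"
  if [ -z "$task_id" ]; then
    echo "no task_id in response: $response" >&2
    return 1
  fi
  printf '%s\n' "$task_id"
}
```

Capturing the ID with `task_id="$(submit_text_to_video "$body")"` leaves the shell free to do other work before polling. The sed parse avoids a jq dependency at the cost of robustness.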
Use Kling in OpenClaw in three steps
Configure RunAPI
Set the RUNAPI_API_KEY environment variable. If you have already configured RunAPI as an OpenClaw provider for chat, the same key works for video generation, so no extra setup is needed.

```sh
export RUNAPI_API_KEY=runapi_xxx
```
Call Kling text_to_video
Send a POST to /api/v1/kling/text_to_video with model set to kling-3.0. Include a prompt, duration_seconds (3–15), aspect_ratio, and optionally enable_sound for native audio. For image-driven generation, use /api/v1/kling/image_to_video with a first_frame_image_url instead.
POST /api/v1/kling/text_to_video
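For the image-driven variant mentioned in this step, a request body can be assembled in shell. This is a sketch under stated assumptions: this page documents only `first_frame_image_url` for `/api/v1/kling/image_to_video`, so reusing `model`, `prompt`, `duration_seconds`, and `aspect_ratio` from text_to_video is an assumption, and `json_escape` and `build_image_to_video_body` are hypothetical helper names.

```sh
# Escape backslashes and double quotes so the value is a valid JSON string.
# Assumes single-line input (no newlines or other control characters).
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Usage: build_image_to_video_body IMAGE_URL PROMPT [DURATION_SECONDS] [ASPECT_RATIO]
# Prints a one-line JSON body. Fields other than first_frame_image_url are
# assumed to match text_to_video; duration is not range-checked here.
build_image_to_video_body() {
  local image_url="$1" prompt="$2" duration="${3:-5}" aspect="${4:-16:9}"
  printf '{"model":"kling-3.0","prompt":"%s","first_frame_image_url":"%s","duration_seconds":%s,"aspect_ratio":"%s"}\n' \
    "$(json_escape "$prompt")" "$(json_escape "$image_url")" "$duration" "$aspect"
}
```

To send it, pass the output to curl with `-d "$(build_image_to_video_body ...)"` against https://runapi.ai/api/v1/kling/image_to_video, using the same Authorization and Content-Type headers as the text_to_video example.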
Poll for the result
The endpoint returns a task_id immediately. Poll the task status endpoint until the status changes to completed, then retrieve the video URL from the response. Generation typically takes 30–120 seconds depending on duration and resolution.
GET /api/v1/kling/text_to_video/tsk_abc123
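The polling step can be sketched as a bounded loop. Assumptions not confirmed on this page: the GET response has a top-level `status` string field, and a failed task reports the status `failed`. The helper names are hypothetical. The field that holds the final video URL is not documented here, so the sketch stops at detecting completion.

```sh
KLING_T2V_URL="https://runapi.ai/api/v1/kling/text_to_video"

# Print the task's current status (crude sed parse instead of jq).
# A failed request yields an empty status, which poll_task treats as in progress.
fetch_status() {
  curl -fsS "$KLING_T2V_URL/$1" \
    -H "Authorization: Bearer $RUNAPI_API_KEY" |
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage: poll_task TASK_ID [INTERVAL_SECONDS] [MAX_CHECKS]
# Prints "completed" and returns 0 on success; returns 1 on failure or timeout.
poll_task() {
  local task_id="$1" interval="${2:-10}" max_checks="${3:-18}"
  local check=1 status
  while [ "$check" -le "$max_checks" ]; do
    status="$(fetch_status "$task_id")"
    case "$status" in
      completed) echo "completed"; return 0 ;;
      failed) echo "task $task_id failed" >&2; return 1 ;;
    esac
    sleep "$interval"
    check=$((check + 1))
  done
  echo "task $task_id not completed after $max_checks checks" >&2
  return 1
}
```

The defaults (18 checks, 10 seconds apart, about 3 minutes) leave headroom over the typical 30–120 second generation time. Once `poll_task` prints `completed`, fetch the task once more and read the video URL from the JSON.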
Kling text_to_video API parameters

| Parameter | Type | Description |
|---|---|---|
| model | string | Required. kling-3.0 for the latest version. |
| prompt | string | Video description. Required unless multi_shots is enabled. |
| duration_seconds | integer | Video length in seconds. Kling 3.0 supports 3–15 seconds; older versions accept 5 or 10. |
| aspect_ratio | string | Output aspect ratio: 16:9, 9:16, or 1:1. |
| output_resolution | string | Resolution: 720p, 1080p, or 4k. Higher resolutions cost more per second. |
| enable_sound | boolean | Generate native audio alongside the video. Increases the per-second cost. |
| negative_prompt | string | Elements to exclude from the generated video. |
| first_frame_image_url | string | Image URL to use as the opening frame (single-shot mode). |
| cfg_scale | number | Guidance scale (0–1). Higher values follow the prompt more closely. |
| multi_shots | boolean | Enable multi-shot scene generation with a separate prompt per segment. |
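Because billing is per second, it can help to reject out-of-range values before submitting. The sketch below checks only the ranges listed in the table for Kling 3.0 (older versions, which accept only 5 or 10 seconds, are not covered). The function name `validate_kling_params` is hypothetical, and `awk` handles the decimal comparison for `cfg_scale`.

```sh
# Usage: validate_kling_params DURATION ASPECT_RATIO RESOLUTION [CFG_SCALE]
# Returns 0 if every value is within the documented range,
# otherwise prints a message on stderr and returns 1.
validate_kling_params() {
  local duration="$1" aspect="$2" resolution="$3" cfg="${4:-}"
  case "$duration" in
    ''|*[!0-9]*) echo "duration_seconds must be an integer" >&2; return 1 ;;
  esac
  if [ "$duration" -lt 3 ] || [ "$duration" -gt 15 ]; then
    echo "duration_seconds must be 3-15 for kling-3.0" >&2; return 1
  fi
  case "$aspect" in
    16:9|9:16|1:1) ;;
    *) echo "aspect_ratio must be 16:9, 9:16 or 1:1" >&2; return 1 ;;
  esac
  case "$resolution" in
    720p|1080p|4k) ;;
    *) echo "output_resolution must be 720p, 1080p or 4k" >&2; return 1 ;;
  esac
  # cfg_scale is optional; when given it must be a non-negative decimal in [0, 1].
  if [ -n "$cfg" ] && ! awk -v c="$cfg" \
      'BEGIN { exit !(c ~ /^[0-9]*\.?[0-9]+$/ && c + 0 >= 0 && c + 0 <= 1) }'; then
    echo "cfg_scale must be a number between 0 and 1" >&2; return 1
  fi
  return 0
}
```

For example, `validate_kling_params 5 16:9 1080p && submit ...` stops a bad request locally instead of after a round trip to the API.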
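What is Kling on OpenClaw?
Kling 3.0 from Kuaishou is known for convincing cloth simulation, fluid dynamics, and motion physics, and it produces cinematic-quality video. It generates clips of 3–15 seconds at up to 1080p from text or images, with native audio and multi-shot scenes. OpenClaw agents call it through the RunAPI endpoint, using the same API key as chat.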
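Kling use cases
B-roll and scenic footage
Generate scene-length B-roll on tight deadlines: nature scenery, travel content, and ambient shots that play to Kling's motion physics and cinematic lighting.
Product lifestyle content
Create product videos for food, fashion, or lifestyle brands from a single image or a text prompt, with natural camera movement and realistic material rendering.
Short social media clips
Generate cinematic short clips for TikTok, Reels, or YouTube Shorts. Set duration_seconds to 5 or 10 to get lengths that suit these platforms.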
Kling + OpenClaw FAQ
How much does Kling cost?
Kling is billed per second of generated video. The rate depends on output_resolution and on whether enable_sound is on. 720p without sound has the lowest per-second rate; 1080p with sound costs roughly twice that per second. Check the RunAPI pricing page for exact rates.
Am I charged if a generation fails?
No. RunAPI bills only for completed generations. If a task fails or times out, the reserved credits are returned to your account balance.
Does Kling generate audio?
Yes. Set enable_sound to true in the request body, and Kling 3.0 generates audio synchronized with the video content. Sound increases the per-second cost; at 720p it adds about 3 cents per second.
How long does generation take?
Generation typically takes 30 to 120 seconds depending on duration and resolution. Longer clips at 1080p with sound take the longest. The API returns a task_id immediately, so your agent can do other work while waiting.
Can I control camera motion?
Kling 3.0 has a separate motion_control endpoint at /api/v1/kling/motion_control that applies motion presets to a source image using a reference video. The text_to_video endpoint relies on prompt descriptions for camera direction.