Use Kling in OpenClaw.
Kling 3.0 by Kuaishou generates video from text or images at up to 1080p with native audio, multi-shot scenes, and 3–15 second durations. OpenClaw agents call it through RunAPI with the same API key used for chat — send a prompt, poll the task, and receive a video URL.
Use RunAPI to generate a video with Kling 3.0.
Requirements:
- Call POST https://runapi.ai/api/v1/kling/text_to_video
- Set model to "kling-3.0"
- Read the API key from the RUNAPI_API_KEY environment variable
- Set duration_seconds to control length (3–15 seconds)
- Set aspect_ratio to "16:9" for landscape video
- Enable sound with enable_sound: true for native audio
- The response is async — poll the task status endpoint until the task completes, then retrieve the video URL
curl -X POST https://runapi.ai/api/v1/kling/text_to_video \
  -H "Authorization: Bearer $RUNAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kling-3.0",
    "prompt": "A drone shot pulling back from a mountain lake at sunrise, mist rising off the water, cinematic lighting",
    "duration_seconds": 5,
    "aspect_ratio": "16:9",
    "enable_sound": true,
    "output_resolution": "1080p"
  }'

{
  "task_id": "tsk_abc123",
  "status": "pending",
  "model": "kling-3.0"
}
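For agents that issue the call from code rather than a shell, the same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_request` and `submit` are made up for illustration, and the payload simply mirrors the curl example above.

```python
import json
import urllib.request

RUNAPI_URL = "https://runapi.ai/api/v1/kling/text_to_video"


def build_request(prompt, api_key, duration_seconds=5):
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "model": "kling-3.0",
        "prompt": prompt,
        "duration_seconds": duration_seconds,
        "aspect_ratio": "16:9",
        "enable_sound": True,
        "output_resolution": "1080p",
    }
    return urllib.request.Request(
        RUNAPI_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(request):
    """Send the request and return the parsed JSON body (performs a network call)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Typical use (needs a real key and network access):
#   import os
#   task = submit(build_request("A drone shot ...", os.environ["RUNAPI_API_KEY"]))
#   print(task["task_id"], task["status"])
```

Splitting construction from sending keeps the payload easy to inspect or log before any credits are reserved.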
Use Kling in OpenClaw in three steps
Configure RunAPI
Set the RUNAPI_API_KEY environment variable. If you already configured RunAPI as an OpenClaw provider for chat, the same key works for video generation — no extra setup needed.
export RUNAPI_API_KEY=runapi_xxx
Call Kling text_to_video
Send a POST to /api/v1/kling/text_to_video with model set to kling-3.0. Include a prompt, duration_seconds (3–15), aspect_ratio, and optionally enable_sound for native audio. For image-driven generation, use /api/v1/kling/image_to_video with a first_frame_image_url instead.
POST /api/v1/kling/text_to_video
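The choice between the two endpoints described in this step can be sketched as a small helper. The function name `plan_request` is hypothetical, and the sketch assumes that image_to_video accepts the same model, prompt, and duration_seconds fields as text_to_video, which this page does not confirm.

```python
BASE_PATH = "/api/v1/kling"


def plan_request(prompt, first_frame_image_url=None, duration_seconds=5):
    """Return (path, payload) for a text-driven or image-driven generation."""
    payload = {
        "model": "kling-3.0",
        "prompt": prompt,
        "duration_seconds": duration_seconds,
    }
    if first_frame_image_url:
        # Image-driven generation: assumed to take the same base fields
        # plus the opening-frame image URL.
        payload["first_frame_image_url"] = first_frame_image_url
        return f"{BASE_PATH}/image_to_video", payload
    return f"{BASE_PATH}/text_to_video", payload
```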
Poll for the result
The endpoint returns a task_id immediately. Poll the task status endpoint until the status changes to completed, then retrieve the video URL from the response. Generation typically takes 30–120 seconds depending on duration and resolution.
GET /api/v1/kling/text_to_video/tsk_abc123
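The polling step can be sketched as a loop with a timeout. Several details here are assumptions rather than documented behavior: the status value "failed", the Authorization header on the GET request, and the helper names `fetch_status_http` and `poll_task`. Because this page does not name the response field that holds the video URL, the sketch returns the whole completed task instead of guessing a key.

```python
import json
import time
import urllib.request


def fetch_status_http(task_id, api_key):
    """Fetch the task status (network call; auth header is an assumption)."""
    request = urllib.request.Request(
        f"https://runapi.ai/api/v1/kling/text_to_video/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def poll_task(fetch_status, task_id, interval_seconds=5.0,
              timeout_seconds=300.0, sleep=time.sleep):
    """Call fetch_status(task_id) until the task completes, fails, or times out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        task = fetch_status(task_id)
        status = task.get("status")
        if status == "completed":
            return task
        if status == "failed":  # assumed name for the failure status
            raise RuntimeError(f"task {task_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still '{status}' after timeout")
        sleep(interval_seconds)


# Typical use (needs a real key and network access):
#   task = poll_task(lambda tid: fetch_status_http(tid, api_key), "tsk_abc123")
```

A default timeout of 300 seconds leaves headroom over the 30–120 second range quoted above; passing `fetch_status` as a parameter lets the loop be exercised without network access.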
Kling text_to_video API parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | Required. kling-3.0 for the latest version. |
| prompt | string | Video description. Required unless multi_shots is enabled. |
| duration_seconds | integer | Video length. Kling 3.0 supports 3–15 seconds. Older versions accept 5 or 10. |
| aspect_ratio | string | Output aspect ratio: 16:9, 9:16, or 1:1. |
| output_resolution | string | Resolution: 720p, 1080p, or 4k. Higher resolution costs more per second. |
| enable_sound | boolean | Generate native audio alongside video. Increases per-second cost. |
| negative_prompt | string | Elements to exclude from generation. |
| first_frame_image_url | string | Image URL to use as the opening frame (single-shot mode). |
| cfg_scale | number | Guidance scale (0–1). Higher values follow the prompt more closely. |
| multi_shots | boolean | Enable multi-shot scene generation with separate prompts per segment. |
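The constraints in the table can be checked on the client before a request is sent. The following is an illustrative sketch, not RunAPI's own validation: `validate_params` is a hypothetical helper, it covers only the kling-3.0 duration range, and the server may enforce rules that are not listed on this page.

```python
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
ALLOWED_RESOLUTIONS = {"720p", "1080p", "4k"}


def validate_params(params):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    if not params.get("model"):
        problems.append("model is required")
    if not params.get("prompt") and not params.get("multi_shots"):
        problems.append("prompt is required unless multi_shots is enabled")

    duration = params.get("duration_seconds")
    if duration is not None:
        is_int = isinstance(duration, int) and not isinstance(duration, bool)
        if not is_int or not 3 <= duration <= 15:
            problems.append("duration_seconds must be an integer from 3 to 15")

    ratio = params.get("aspect_ratio")
    if ratio is not None and ratio not in ALLOWED_ASPECT_RATIOS:
        problems.append("aspect_ratio must be 16:9, 9:16, or 1:1")

    resolution = params.get("output_resolution")
    if resolution is not None and resolution not in ALLOWED_RESOLUTIONS:
        problems.append("output_resolution must be 720p, 1080p, or 4k")

    cfg = params.get("cfg_scale")
    if cfg is not None:
        is_number = isinstance(cfg, (int, float)) and not isinstance(cfg, bool)
        if not is_number or not 0 <= cfg <= 1:
            problems.append("cfg_scale must be a number between 0 and 1")
    return problems
```

Returning a list instead of raising on the first error lets an agent report every problem in a single pass.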
What is Kling on OpenClaw?
Kling 3.0 by Kuaishou is known for cinematic-quality video with strong cloth simulation, fluid dynamics, and motion physics. It generates clips of 3 to 15 seconds from text or images at up to 1080p with native audio and multi-shot scenes. OpenClaw agents call it through the RunAPI endpoint with the same API key used for chat.
Kling use cases
B-roll and establishing shots
Generate scene-length B-roll footage on tight deadlines: nature shots, travel content, and environment footage where Kling's motion physics and cinematic lighting stand out.
Product lifestyle content
Create product videos for food, fashion, or lifestyle brands from a single image or text prompt, with natural camera movement and realistic material rendering.
Social media shorts
Produce short clips for TikTok, Reels, or YouTube Shorts with cinematic framing. Set duration_seconds to 5 or 10 for platform-ready lengths.
Kling + OpenClaw questions
Kling charges per second of generated video. The rate depends on output_resolution and whether enable_sound is on. A 5-second 720p clip without sound is the cheapest option; 1080p with sound costs roughly twice as much per second. Check the RunAPI pricing page for exact rates.
Failed generations are not billed. RunAPI only bills for completed generations: if the task fails or times out, the reserved credits are rolled back to your account balance.
Kling 3.0 can generate native audio. Set enable_sound to true in the request body and it produces synchronized audio matching the video content. Sound generation increases the per-second cost: at 720p, sound adds about 3 cents per second.
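Per-second billing makes the cost of a clip simple arithmetic: duration multiplied by the per-second rate, plus the sound surcharge when audio is enabled. The sketch below takes both rates as inputs because this page does not list them. The helper name `estimate_cost_cents` is hypothetical, and the 10-cent base rate in the usage comment is a placeholder, not a real price; only the roughly 3-cent sound surcharge at 720p comes from the text above.

```python
def estimate_cost_cents(duration_seconds, base_rate_cents_per_second,
                        sound_surcharge_cents_per_second=0, enable_sound=False):
    """Estimate the cost of one clip in cents under per-second billing."""
    rate = base_rate_cents_per_second
    if enable_sound:
        rate += sound_surcharge_cents_per_second
    return duration_seconds * rate


# With a placeholder base rate of 10 cents per second (not a real price)
# and the roughly 3-cent sound surcharge quoted for 720p:
#   estimate_cost_cents(5, 10)                 -> 50
#   estimate_cost_cents(5, 10, 3, True)        -> 65
```

Check the RunAPI pricing page for the actual rates before relying on any estimate.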
Generation typically takes 30 to 120 seconds depending on duration and resolution. Longer clips at 1080p with sound take the most time. The API returns a task_id immediately so your agent can do other work while waiting.
Kling 3.0 has a separate motion_control endpoint at /api/v1/kling/motion_control for applying motion presets to a source image with a reference video. The text_to_video endpoint relies on prompt descriptions for camera direction.
OpenClaw general setup
Not configured yet? Start with the RunAPI setup guide for OpenClaw.
OpenClaw setup guide →
Try Kling in OpenClaw today.
Get a free RunAPI key, paste the prompt into OpenClaw, and start generating video with Kling 3.0.