Use Kling in Hermes Agent
Kling 3.0 by Kuaishou generates video from text or images at up to 1080p with native audio, multi-shot scenes, and 3–15 second durations. Hermes Agent calls it through RunAPI using the custom:runapi provider, with the same key and base URL you configured for chat.
Use RunAPI to generate a video with Kling 3.0 through Hermes Agent.
Requirements:
- Use the custom:runapi provider already configured in Hermes Agent
- Call POST https://runapi.ai/api/v1/kling/text_to_video
- Set model to "kling-3.0"
- The RUNAPI_API_KEY environment variable provides authorization
- Set duration_seconds to control length (3–15 seconds)
- Set aspect_ratio to "16:9" for landscape video
- Enable sound with enable_sound: true for native audio
- The response is asynchronous: poll the task status endpoint until the task completes, then retrieve the video URL
```bash
curl -X POST https://runapi.ai/api/v1/kling/text_to_video \
  -H "Authorization: Bearer $RUNAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kling-3.0",
    "prompt": "A drone shot pulling back from a mountain lake at sunrise, mist rising off the water, cinematic lighting",
    "duration_seconds": 5,
    "aspect_ratio": "16:9",
    "enable_sound": true,
    "output_resolution": "1080p"
  }'
```
```json
{
  "task_id": "tsk_abc123",
  "status": "pending",
  "model": "kling-3.0"
}
```
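The same request can be issued from Python. This minimal sketch mirrors the curl call above using only the standard library; the helper names build_text_to_video_request and submit are hypothetical, and it assumes the endpoint returns the task_id JSON shown in the example response.

```python
import json
import urllib.request

# Base URL taken from the curl example.
RUNAPI_BASE_URL = "https://runapi.ai/api/v1"


def build_text_to_video_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{RUNAPI_BASE_URL}/kling/text_to_video",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def submit(request: urllib.request.Request) -> str:
    """Send the request and return the task_id from the asynchronous response."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        data = json.load(resp)
    return data["task_id"]
```

Building the request separately from sending it makes the payload easy to inspect or log before anything is submitted.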
Use Kling in Hermes Agent in three steps
Configure RunAPI
Set the RUNAPI_API_KEY environment variable. If you already added RunAPI as a custom:runapi provider in Hermes Agent for chat, the same key and base_url work for video generation, so no extra configuration is needed.
```bash
export RUNAPI_API_KEY=runapi_xxx
```
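Scripts that call RunAPI directly can fail fast when the key is missing instead of sending an unauthenticated request. A minimal sketch; require_api_key and auth_headers are hypothetical helper names, and the header shape is copied from the curl example.

```python
import os


def require_api_key(env_var: str = "RUNAPI_API_KEY") -> str:
    """Return the RunAPI key from the environment, failing early with a clear message."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before calling Kling endpoints")
    return key


def auth_headers(api_key: str) -> dict:
    """Bearer auth plus a JSON content type, as in the curl example."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```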
Call Kling text_to_video
Send a POST to /api/v1/kling/text_to_video with model set to kling-3.0. Include a prompt, duration_seconds (3–15), aspect_ratio, and optionally enable_sound for native audio. For image-driven generation, use /api/v1/kling/image_to_video with a first_frame_image_url instead.
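A small helper can pick the endpoint based on whether a first frame is supplied. This sketch assumes image_to_video accepts the same fields as text_to_video plus first_frame_image_url, which the step above implies but does not spell out; kling_job is a hypothetical name.

```python
from typing import Optional

KLING_BASE = "https://runapi.ai/api/v1/kling"


def kling_job(prompt: str, duration_seconds: int = 5, aspect_ratio: str = "16:9",
              first_frame_image_url: Optional[str] = None,
              enable_sound: bool = False) -> tuple[str, dict]:
    """Return (endpoint URL, JSON payload), using image_to_video when a first frame is given."""
    payload = {
        "model": "kling-3.0",
        "prompt": prompt,
        "duration_seconds": duration_seconds,
        "aspect_ratio": aspect_ratio,
        "enable_sound": enable_sound,
    }
    if first_frame_image_url:
        # Assumption: image_to_video takes the same fields plus first_frame_image_url.
        payload["first_frame_image_url"] = first_frame_image_url
        return f"{KLING_BASE}/image_to_video", payload
    return f"{KLING_BASE}/text_to_video", payload
```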
```http
POST /api/v1/kling/text_to_video
```
Poll for the result
The endpoint returns a task_id immediately. Poll the task status endpoint until the status changes to completed, then retrieve the video URL from the response. Generation typically takes 30–120 seconds depending on duration and resolution.
```http
GET /api/v1/kling/text_to_video/tsk_abc123
```
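The polling step is a loop with a timeout. A minimal sketch under stated assumptions: the GET uses the same bearer header as the POST, "failed" is the terminal error status (only pending and completed appear in this guide), and because the field that holds the video URL is not named here, the loop returns the whole completed task JSON. make_http_fetcher and poll_task are hypothetical names.

```python
import json
import time
import urllib.request
from typing import Callable

STATUS_URL = "https://runapi.ai/api/v1/kling/text_to_video/{task_id}"


def make_http_fetcher(api_key: str) -> Callable[[str], dict]:
    """Return fetch(task_id) that GETs the task status (assumes bearer auth, like the POST)."""
    def fetch(task_id: str) -> dict:
        req = urllib.request.Request(
            STATUS_URL.format(task_id=task_id),
            headers={"Authorization": f"Bearer {api_key}"},
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.load(resp)
    return fetch


def poll_task(fetch_status: Callable[[str], dict], task_id: str,
              interval_s: float = 5.0, timeout_s: float = 300.0,
              sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until status is "completed" and return the final task JSON."""
    waited = 0.0
    while True:
        task = fetch_status(task_id)
        status = task.get("status")
        if status == "completed":
            return task
        # Assumption: "failed" is terminal; any other value (e.g. "pending")
        # is treated as still running.
        if status == "failed":
            raise RuntimeError(f"task {task_id} failed: {task}")
        if waited >= timeout_s:
            raise TimeoutError(f"task {task_id} still {status!r} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Call it as poll_task(make_http_fetcher(key), "tsk_abc123"). The 300-second default leaves headroom over the 30–120 second range quoted above, and injecting fetch_status and sleep keeps the loop testable without network access.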
Kling text_to_video API parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | Required. kling-3.0 for the latest version. |
| prompt | string | Video description. Required unless multi_shots is enabled. |
| duration_seconds | integer | Video length. Kling 3.0 supports 3–15 seconds; older versions accept 5 or 10. |
| aspect_ratio | string | Output aspect ratio: 16:9, 9:16, or 1:1. |
| output_resolution | string | Resolution: 720p, 1080p, or 4k. Higher resolution costs more per second. |
| enable_sound | boolean | Generate native audio alongside the video. Increases the per-second cost. |
| negative_prompt | string | Elements to exclude from generation. |
| first_frame_image_url | string | Image URL to use as the opening frame (single-shot mode). |
| cfg_scale | number | Guidance scale (0–1). Higher values follow the prompt more closely. |
| multi_shots | boolean | Enable multi-shot scene generation with separate prompts per segment. |
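The constraints in the table can be checked client-side before a request is submitted. A sketch only: validate_params is hypothetical, it assumes every model other than kling-3.0 follows the "older versions accept 5 or 10" rule, and the API remains the final authority on what it accepts.

```python
ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
RESOLUTIONS = {"720p", "1080p", "4k"}


def validate_params(params: dict) -> list[str]:
    """Return problems found in a text_to_video payload (empty list if none)."""
    errors = []
    model = params.get("model")
    if not model:
        errors.append("model is required")
    if not params.get("prompt") and not params.get("multi_shots"):
        errors.append("prompt is required unless multi_shots is enabled")

    duration = params.get("duration_seconds")
    if duration is not None:
        if not isinstance(duration, int):
            errors.append("duration_seconds must be an integer")
        elif model == "kling-3.0" and not 3 <= duration <= 15:
            errors.append("kling-3.0 supports 3-15 seconds")
        elif model and model != "kling-3.0" and duration not in (5, 10):
            # Assumption: any non-3.0 model is an "older version" (5 or 10 s).
            errors.append(f"{model} accepts 5 or 10 seconds")

    aspect = params.get("aspect_ratio")
    if aspect is not None and aspect not in ASPECT_RATIOS:
        errors.append(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")
    resolution = params.get("output_resolution")
    if resolution is not None and resolution not in RESOLUTIONS:
        errors.append(f"output_resolution must be one of {sorted(RESOLUTIONS)}")
    cfg = params.get("cfg_scale")
    if cfg is not None and not 0 <= cfg <= 1:
        errors.append("cfg_scale must be between 0 and 1")
    return errors
```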
What is Kling on Hermes Agent?
Kling 3.0 by Kuaishou delivers cinematic-quality clips with character consistency, strong motion physics (cloth draping, fluid dynamics), and realistic camera movement. Through the Hermes Agent custom:runapi provider, you get text-to-video and image-to-video at up to 1080p with native audio, generating clips from 3 to 15 seconds (or up to 3 minutes with multi-shot mode).
Kling use cases
Longer narrative content
Use Kling's multi-shot mode to build scene-length footage up to 3 minutes, connecting establishing shots and character sequences with consistent visuals across segments.
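Planning scene-length footage is mostly arithmetic: with at most 15 seconds per clip, a sequence of T seconds needs at least ceil(T / 15) clips. This sketch assumes each segment obeys the same 3–15 second per-clip limit, which this guide does not state for multi_shots mode; plan_shots is a hypothetical helper that spreads the total as evenly as possible.

```python
def plan_shots(total_seconds: int, max_shot: int = 15, min_shot: int = 3) -> list[int]:
    """Split a target length into the fewest clips that each fit the 3-15 s range."""
    if total_seconds < min_shot:
        raise ValueError(f"need at least {min_shot} seconds")
    if total_seconds > 180:
        raise ValueError("multi-shot footage tops out at 3 minutes (180 s)")
    n = -(-total_seconds // max_shot)  # ceiling division: fewest clips needed
    base, extra = divmod(total_seconds, n)
    # Give the remainder to the first clips so every clip is base or base + 1 s.
    return [base + 1] * extra + [base] * (n - extra)
```

For example, a 3-minute sequence becomes twelve 15-second clips, and 40 seconds becomes clips of 14, 13 and 13 seconds.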
Travel and nature content
Generate travel vlog B-roll and nature footage with realistic environment rendering. Kling handles water, mist, and atmospheric lighting well for outdoor scenes.
Product demo videos
Animate a product image into a short video with camera movement and natural lighting transitions, useful for e-commerce listings and social ads.
Kling + Hermes Agent questions
How much does Kling cost?
Kling is billed per second of generated video. The rate depends on output_resolution and whether enable_sound is on: 720p without sound has the lowest per-second rate, while 1080p with sound costs roughly twice as much per second. Check the RunAPI pricing page for exact rates.
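Per-second billing reduces to cost = duration_seconds × rate, where the rate is looked up by output_resolution and enable_sound. The sketch below hard-codes no prices; estimate_cost is a hypothetical helper and the caller supplies the rate table from the RunAPI pricing page.

```python
# Rates are keyed by (output_resolution, enable_sound) and supplied by the caller;
# none are hard-coded because real prices live on the RunAPI pricing page.
RateTable = dict[tuple[str, bool], float]


def estimate_cost(duration_seconds: int, output_resolution: str,
                  enable_sound: bool, rates: RateTable) -> float:
    """Estimated charge for one clip under per-second billing."""
    key = (output_resolution, enable_sound)
    if key not in rates:
        raise KeyError(f"no rate supplied for {key}")
    return duration_seconds * rates[key]
```

With placeholder (not real) rates of 1.0 unit per second for 720p without sound and 2.0 for 1080p with sound, a 5-second clip comes to 5.0 versus 10.0 units.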
How does Kling compare to Runway?
Kling supports 3 to 15 seconds per clip (and multi-shot sequences up to 3 minutes), while Runway caps at 5 or 10 seconds. For scene-length footage, Kling gives you more flexibility; Runway tends to produce cleaner cinematic framing on shorter clips.
Are other Kling versions available?
Yes. RunAPI also hosts kling-v2.5-turbo-text-to-video-pro and kling-v2.5-turbo-image-to-video-pro for faster, lower-cost generation at 5 or 10 seconds. Set the model parameter to the version slug you want.
Do I need a separate setup for Kling?
No. If you already configured the custom:runapi provider in Hermes Agent for chat or image generation, the same base_url and API key work for the Kling video endpoints; just change the request path and the model parameter.
Am I charged if a generation fails?
No. RunAPI bills only for completed generations. If a task fails or times out, the reserved credits are returned to your account balance.
How does Hermes Agent use Kling?
Hermes Agent calls the Kling endpoint with scene descriptions and camera control parameters through the custom:runapi provider. For multi-shot sequences, the agent can chain multiple generation calls and manage continuity between shots.
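Chaining shots amounts to a loop over per-shot prompts. A sketch with hypothetical names: submit and wait are supplied by the caller (for example, a POST helper and a polling loop), and repeating a shared continuity note in every prompt is one simple consistency technique, not necessarily what Hermes Agent does internally.

```python
from typing import Callable


def generate_sequence(shot_prompts: list[str], continuity_note: str,
                      base_params: dict,
                      submit: Callable[[dict], str],
                      wait: Callable[[str], dict]) -> list[dict]:
    """Generate shots in order and return each completed task's JSON.

    submit(payload) -> task_id and wait(task_id) -> completed task JSON are
    supplied by the caller.
    """
    results = []
    for prompt in shot_prompts:
        # Shared settings (model, aspect_ratio, output_resolution, ...) come from
        # base_params so every segment matches; the continuity note repeats the
        # same character and style description in each prompt.
        payload = {**base_params, "prompt": f"{prompt}. {continuity_note}"}
        task_id = submit(payload)
        results.append(wait(task_id))
    return results
```

The loop is sequential so shots come back in story order; independent shots could be submitted in parallel instead.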
Can Hermes Agent add voiceover or music to Kling videos?
Yes. Hermes Agent can orchestrate Kling for video and then call ElevenLabs or Suno through RunAPI to add voiceover or background music, assembling the complete package in one workflow.
Hermes Agent general setup
Not configured yet? Start with the RunAPI setup guide for Hermes Agent.
Try Kling in Hermes Agent today.
Get a free RunAPI key, configure the custom:runapi provider, and start generating video with Kling 3.0.