Use Wan in Hermes Agent.
Wan is Alibaba's open-source video and image generation model, Apache 2.0 licensed and ranked #1 on the Artificial Analysis text-to-video leaderboard. It spans 20+ variants from Wan 2.2 through 2.7 — text-to-video, image-to-video, speech-to-video with lip-sync, video editing via R2V, and image generation up to 4K. Hermes Agent calls any Wan endpoint through the same RunAPI custom provider and API key used for chat.
Use RunAPI to generate a video with Alibaba Wan 2.7.
Requirements:
- Read the API key from RUNAPI_API_KEY.
- Use the custom:runapi provider with base_url https://runapi.ai/v1.
- Call POST https://runapi.ai/api/v1/task/text_to_video
- Set model to "wan-2.7-text-to-video".
- Set output_resolution to "1080p" for full HD output.
- Include a detailed prompt describing the scene, camera motion, and lighting.
- The response is async. Poll the returned task_id until status is "completed".
- When done, read the video URL from the response output.
curl -X POST https://runapi.ai/api/v1/task/text_to_video \
  -H "Authorization: Bearer $RUNAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "wan-2.7-text-to-video",
    "prompt": "A drone shot rising over terraced rice paddies at golden hour, mist rolling through the valleys, slow upward camera tilt",
    "output_resolution": "1080p"
  }'
{
  "task_id": "tsk_abc123",
  "status": "pending",
  "model": "wan-2.7-text-to-video"
}
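The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the URL, headers, and body fields come from the curl example, while the function names `build_text_to_video_request` and `submit` are hypothetical helpers introduced here.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
RUNAPI_TASK_URL = "https://runapi.ai/api/v1/task/text_to_video"


def build_text_to_video_request(prompt, api_key, resolution="1080p"):
    """Build (but do not send) the POST request from the curl example."""
    payload = {
        "model": "wan-2.7-text-to-video",
        "prompt": prompt,
        "output_resolution": resolution,
    }
    return urllib.request.Request(
        RUNAPI_TASK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(request):
    """Send the request and return the decoded JSON body (network I/O)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

To use it, read the key with `os.environ["RUNAPI_API_KEY"]`, pass the built request to `submit`, and keep the `task_id` from the returned dictionary for polling.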
Use Wan in Hermes Agent in three steps
Configure RunAPI
Set RUNAPI_API_KEY in the environment where Hermes Agent runs. If you already added RunAPI as a custom:runapi provider, the same key and base_url handle all Wan endpoints — no additional setup needed.
export RUNAPI_API_KEY=runapi_xxx
Call a Wan endpoint
Send a POST request to text_to_video with model set to wan-2.7-text-to-video and output_resolution to 720p or 1080p. For image-to-video, use wan-2.7-image-to-video with a first_frame_image_url. For speech-driven video, use wan-2.2-a14b-speech-to-video-turbo with source_audio_url and source_image_url. Hermes Agent routes all requests through the custom:runapi provider.
POST /api/v1/task/text_to_video
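The image-to-video and speech-to-video variants can be sketched as request bodies in Python. The model slugs and field names come from the step above. The `image_to_video` and `speech_to_video` paths are assumed by analogy with `text_to_video`, the optional `prompt` on image-to-video is also an assumption, and both helper names are hypothetical.

```python
# Assumed task path prefix, by analogy with /api/v1/task/text_to_video.
TASK_BASE_URL = "https://runapi.ai/api/v1/task"


def image_to_video_job(first_frame_image_url, prompt=None):
    """Return (url, payload) for an image-to-video task."""
    payload = {
        "model": "wan-2.7-image-to-video",
        "first_frame_image_url": first_frame_image_url,
    }
    if prompt is not None:
        # Assumption: the endpoint accepts an optional text prompt.
        payload["prompt"] = prompt
    return f"{TASK_BASE_URL}/image_to_video", payload


def speech_to_video_job(source_audio_url, source_image_url):
    """Return (url, payload) for a speech-driven, lip-synced video task."""
    payload = {
        "model": "wan-2.2-a14b-speech-to-video-turbo",
        "source_audio_url": source_audio_url,
        "source_image_url": source_image_url,
    }
    return f"{TASK_BASE_URL}/speech_to_video", payload
```

Each helper only builds data; sending it works the same way as the text-to-video request, with the same bearer token.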
Poll for the result
The endpoint returns a task_id immediately. Poll the task status endpoint until the status is completed, then read the output video or image URL from the response. RunAPI SDKs and the CLI handle polling automatically.
GET /api/v1/task/text_to_video/tsk_abc123
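If you are not using an SDK or the CLI, the polling step can be sketched as a small loop. The `"completed"` status comes from the description above. The failure statuses, the interval, and the attempt limit are assumptions, and `fetch_status` is a hypothetical callable that performs the GET and returns the decoded JSON.

```python
import time

# Assumption: terminal failure statuses are not documented here; adjust to
# whatever the API actually returns.
ASSUMED_FAILURE_STATUSES = ("failed", "error", "cancelled")


def poll_task(fetch_status, task_id, interval_seconds=5.0, max_attempts=120,
              sleep=time.sleep):
    """Call fetch_status(task_id) until the task reports "completed".

    Returns the final task dictionary, raises RuntimeError on an assumed
    failure status, and raises TimeoutError after max_attempts polls.
    """
    for _ in range(max_attempts):
        task = fetch_status(task_id)
        status = task.get("status")
        if status == "completed":
            return task
        if status in ASSUMED_FAILURE_STATUSES:
            raise RuntimeError(f"task {task_id} ended with status {status!r}")
        sleep(interval_seconds)
    raise TimeoutError(f"task {task_id} not completed after {max_attempts} polls")
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any HTTP library and lets you exercise it without network access.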
Wan text_to_video API parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | Required. wan-2.7-text-to-video, wan-2.6-text-to-video, wan-2.5-text-to-video, wan-2.2-a14b-text-to-video-turbo, or wan-2.7-r2v. |
| prompt | string | Required. Text description of the desired video scene, including camera motion, lighting, and subject detail. |
| output_resolution | string | Optional. 720p or 1080p for Wan 2.5+. Wan 2.2 also accepts 480p and 580p. Defaults to 720p. |
| aspect_ratio | string | Optional. For wan-2.7-r2v only. Accepted values: 16:9, 9:16, 1:1, 4:3, 3:4. |
| duration_seconds | integer | Optional. For wan-2.7-r2v only. Video length in seconds, 2 to 10. |
| seed | integer | Optional. Reproducibility seed for deterministic output. |
| callback_url | string | Optional. Webhook URL that receives a POST when the task completes. |
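The constraints in the parameter table can be checked client-side before a request is sent. The sketch below is a hypothetical helper, not part of RunAPI. It reads "Wan 2.2 also accepts 480p and 580p" as meaning Wan 2.2 accepts those in addition to 720p and 1080p, which is an interpretation of the table rather than a confirmed rule.

```python
TEXT_TO_VIDEO_MODELS = {
    "wan-2.7-text-to-video",
    "wan-2.6-text-to-video",
    "wan-2.5-text-to-video",
    "wan-2.2-a14b-text-to-video-turbo",
    "wan-2.7-r2v",
}
R2V_MODEL = "wan-2.7-r2v"
BASE_RESOLUTIONS = {"720p", "1080p"}
WAN_22_EXTRA_RESOLUTIONS = {"480p", "580p"}
R2V_ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}


def validate_text_to_video(params):
    """Return a list of problems found in a text_to_video request body."""
    errors = []
    model = params.get("model")
    if model not in TEXT_TO_VIDEO_MODELS:
        errors.append(f"model: unsupported value {model!r}")
    prompt = params.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        errors.append("prompt: required non-empty string")

    resolution = params.get("output_resolution")
    if resolution is not None:
        allowed = set(BASE_RESOLUTIONS)
        if isinstance(model, str) and model.startswith("wan-2.2-"):
            allowed |= WAN_22_EXTRA_RESOLUTIONS  # assumed reading of the table
        if resolution not in allowed:
            errors.append(f"output_resolution: {resolution!r} not allowed for this model")

    # aspect_ratio and duration_seconds are documented for wan-2.7-r2v only.
    for name in ("aspect_ratio", "duration_seconds"):
        if name in params and model != R2V_MODEL:
            errors.append(f"{name}: only accepted by {R2V_MODEL}")
    ratio = params.get("aspect_ratio")
    if ratio is not None and ratio not in R2V_ASPECT_RATIOS:
        errors.append(f"aspect_ratio: unsupported value {ratio!r}")
    duration = params.get("duration_seconds")
    if duration is not None:
        is_int = isinstance(duration, int) and not isinstance(duration, bool)
        if not is_int or not 2 <= duration <= 10:
            errors.append("duration_seconds: must be an integer from 2 to 10")
    return errors
```

An empty list means the body is consistent with the table. The server remains the authority on what is actually accepted.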
What is Wan on Hermes Agent?
Wan by Alibaba is an Apache 2.0 open-source video model that leads the Artificial Analysis leaderboard for text-to-video quality. The Hermes Agent custom:runapi provider exposes its 20+ variants, covering text-to-video, image-to-video, speech-to-video with lip sync, and video editing. Because the weights are open, you can also self-host it if your workflow requires data privacy; for hosted use, RunAPI handles the GPU infrastructure.
Wan use cases
Branded content at volume
Use Wan's character consistency and non-expiring credits to produce branded video content at scale. Hermes Agent can dispatch parallel generation tasks across different product lines.
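Parallel dispatch across product lines could look roughly like the following. This is an illustrative sketch, not how Hermes Agent schedules work internally: `submit_task` is a hypothetical callable that takes a prompt, creates one generation task, and returns its response.

```python
from concurrent.futures import ThreadPoolExecutor


def dispatch_parallel(submit_task, prompts_by_line, max_workers=4):
    """Submit one generation task per product line concurrently.

    prompts_by_line maps a product-line name to its prompt. The result maps
    each name to whatever submit_task returned for that prompt.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            line: pool.submit(submit_task, prompt)
            for line, prompt in prompts_by_line.items()
        }
        # result() re-raises any exception raised inside submit_task.
        return {line: future.result() for line, future in futures.items()}
```

Threads suit this case because each task spends its time waiting on the network, not on local computation.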
Dialogue-heavy content with lip sync
Chain ElevenLabs TTS with Wan's speech-to-video endpoint in one Hermes Agent workflow to go from script text to a lip-synced talking video without manual steps.
Filmmakers and agency pre-visualization
Generate production-grade pre-vis clips with endpoint-anchored keyframes. Set first and last frame images to control exact scene transitions for client review.
Wan + Hermes Agent questions
All of them: text_to_video, image_to_video, speech_to_video, text_to_image (Wan 2.7 Image), edit_video, and animate. Configure RunAPI as a custom:runapi provider once, then switch endpoints and model slugs per request. For example, use wan-2.7-text-to-video for video and wan-2.7-image for image generation up to 4K.
Add a custom:runapi provider entry with base_url set to https://runapi.ai/v1 and your RUNAPI_API_KEY as the API key. Once configured, every Wan endpoint — and all 113+ RunAPI models — is accessible through the same provider without additional plugins.
Wan 2.5 introduced 1080p output. Wan 2.6 added video editing (R2V) and flash variants for faster generation. Wan 2.7 adds image generation (wan-2.7-image, wan-2.7-image-pro up to 4K), video editing (wan-2.7-edit-video), and improved text-to-video quality that leads the Artificial Analysis leaderboard.
Costs vary by variant and resolution. A 720p text-to-video clip with Wan 2.7 runs about 25-35 cents per generation. 1080p costs more. Speech-to-video is priced per generation regardless of length. Check the RunAPI pricing page for exact per-model rates -- credits on RunAPI do not expire.
Yes. Hermes Agent can chain ElevenLabs TTS to generate speech audio, then pass the audio URL to Wan's speech-to-video endpoint, creating a complete text-to-spoken-video pipeline in one workflow.
Hermes Agent general setup
Not configured yet? Start with the RunAPI setup guide for Hermes Agent.
Hermes Agent setup guide →
Try Wan in Hermes Agent today.
Get a free RunAPI key, configure the custom:runapi provider, and generate video with the #1 ranked open-source model — text-to-video, image-to-video, or speech-to-video.