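Use Wan in Hermes Agent.
Wan is Alibaba's open-source video and image generation model, licensed under Apache 2.0 and ranked first on the Artificial Analysis text-to-video leaderboard. It spans 20+ variants from Wan 2.2 to 2.7: text-to-video, image-to-video, lip-synced speech-to-video, video editing via R2V, and image generation up to 4K. Hermes Agent calls any Wan endpoint with the same RunAPI custom provider and API key it uses for chat.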
Use RunAPI to generate a video with Alibaba Wan 2.7.
Requirements:
- Read the API key from RUNAPI_API_KEY.
- Use the custom:runapi provider with base_url https://runapi.ai/v1.
- Call POST https://runapi.ai/api/v1/task/text_to_video
- Set model to "wan-2.7-text-to-video".
- Set output_resolution to "1080p" for full HD output.
- Include a detailed prompt describing the scene, camera motion, and lighting.
- The response is async. Poll the returned task_id until status is "completed".
- When done, read the video URL from the response output.
curl -X POST https://runapi.ai/api/v1/task/text_to_video \
  -H "Authorization: Bearer $RUNAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "wan-2.7-text-to-video",
    "prompt": "A drone shot rising over terraced rice paddies at golden hour, mist rolling through the valleys, slow upward camera tilt",
    "output_resolution": "1080p"
  }'
{
  "task_id": "tsk_abc123",
  "status": "pending",
  "model": "wan-2.7-text-to-video"
}
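The same request can be made from Python. The following is a minimal sketch using only the standard library. The helper names `build_submit_request` and `submit` are illustrative and not part of any RunAPI SDK, and the response is assumed to carry the `task_id` and `status` fields shown in the sample above.

```python
import json
import os
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://runapi.ai/api/v1/task/text_to_video"


def build_submit_request(prompt, api_key, resolution="1080p"):
    """Build (but do not send) the POST request from the curl example."""
    body = {
        "model": "wan-2.7-text-to-video",
        "prompt": prompt,
        "output_resolution": resolution,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(prompt):
    """Send the request and return the parsed JSON response.

    Reads the key from RUNAPI_API_KEY, as the requirements above specify.
    """
    request = build_submit_request(prompt, os.environ["RUNAPI_API_KEY"])
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `submit("A drone shot rising over terraced rice paddies")` should return a dictionary whose `task_id` can then be polled.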
Use Wan in Hermes Agent in three steps
Configure RunAPI
Set RUNAPI_API_KEY in the environment where Hermes Agent runs. If you already added RunAPI as a custom:runapi provider, the same key and base_url handle all Wan endpoints — no additional setup needed.
export RUNAPI_API_KEY=runapi_xxx
Call a Wan endpoint
Send a POST request to text_to_video with model set to wan-2.7-text-to-video and output_resolution to 720p or 1080p. For image-to-video, use wan-2.7-image-to-video with a first_frame_image_url. For speech-driven video, use wan-2.2-a14b-speech-to-video-turbo with source_audio_url and source_image_url. Hermes Agent routes all requests through the custom:runapi provider.
POST /api/v1/task/text_to_video
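The three request shapes described in this step can be sketched as small payload builders. The model slugs and field names come from the text above. The `/api/v1/task/<endpoint>` paths for `image_to_video` and `speech_to_video` are assumed to follow the `text_to_video` pattern, and including a `prompt` for image-to-video is also an assumption.

```python
# Assumption: every endpoint shares the path pattern of text_to_video.
TASK_PATH = "/api/v1/task/{endpoint}"


def text_to_video(prompt, resolution="720p"):
    """Return (path, JSON body) for a Wan 2.7 text-to-video request."""
    body = {
        "model": "wan-2.7-text-to-video",
        "prompt": prompt,
        "output_resolution": resolution,  # 720p or 1080p
    }
    return TASK_PATH.format(endpoint="text_to_video"), body


def image_to_video(prompt, first_frame_image_url):
    """Return (path, JSON body) for a Wan 2.7 image-to-video request."""
    body = {
        "model": "wan-2.7-image-to-video",
        "prompt": prompt,  # assumption: a prompt is accepted here too
        "first_frame_image_url": first_frame_image_url,
    }
    return TASK_PATH.format(endpoint="image_to_video"), body


def speech_to_video(source_audio_url, source_image_url):
    """Return (path, JSON body) for a Wan 2.2 speech-driven video request."""
    body = {
        "model": "wan-2.2-a14b-speech-to-video-turbo",
        "source_audio_url": source_audio_url,
        "source_image_url": source_image_url,
    }
    return TASK_PATH.format(endpoint="speech_to_video"), body
```

Each builder returns the path and body separately so the same authenticated HTTP client can send any of them.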
Poll for the result
The endpoint returns a task_id immediately. Poll the task status endpoint until the status is completed, then read the output video or image URL from the response. RunAPI SDKs and the CLI handle polling automatically.
GET /api/v1/task/text_to_video/tsk_abc123
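For callers that do not use the SDKs or the CLI, the polling step might look like the sketch below. Only the `pending` and `completed` statuses appear on this page. The `failed` status, the five-second interval, and the ten-minute timeout are assumptions, and the function names are illustrative.

```python
import json
import os
import time
import urllib.request

STATUS_URL = "https://runapi.ai/api/v1/task/text_to_video/{task_id}"
FAILURE_STATUSES = {"failed"}  # assumption: not documented on this page


def fetch_status_http(task_id):
    """GET the task status endpoint and return the parsed JSON body."""
    request = urllib.request.Request(
        STATUS_URL.format(task_id=task_id),
        headers={"Authorization": f"Bearer {os.environ['RUNAPI_API_KEY']}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def poll_until_done(fetch_status, task_id, interval_s=5.0, timeout_s=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status(task_id) until the task reports "completed".

    Raises RuntimeError on an assumed failure status and TimeoutError
    once timeout_s has elapsed without completion.
    """
    deadline = clock() + timeout_s
    while True:
        task = fetch_status(task_id)
        status = task.get("status")
        if status == "completed":
            return task
        if status in FAILURE_STATUSES:
            raise RuntimeError(f"task {task_id} ended with status {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

A typical call is `poll_until_done(fetch_status_http, "tsk_abc123")`. The fetch function is passed in so the loop can be exercised without network access.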
Wan text_to_video API parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | Required. wan-2.7-text-to-video, wan-2.6-text-to-video, wan-2.5-text-to-video, wan-2.2-a14b-text-to-video-turbo, or wan-2.7-r2v. |
| prompt | string | Required. Text description of the desired video scene, including camera motion, lighting, and subject detail. |
| output_resolution | string | Optional. 720p or 1080p for Wan 2.5+. Wan 2.2 also accepts 480p and 580p. Defaults to 720p. |
| aspect_ratio | string | Optional. For wan-2.7-r2v only. Accepted values: 16:9, 9:16, 1:1, 4:3, 3:4. |
| duration_seconds | integer | Optional. For wan-2.7-r2v only. Video length in seconds, 2 to 10. |
| seed | integer | Optional. Reproducibility seed for deterministic output. |
| callback_url | string | Optional. Webhook URL that receives a POST when the task completes. |
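The constraints in the parameter table can be checked on the client before a request is sent. The sketch below encodes only what the table states. The function name `validate_text_to_video` is illustrative, and whether the server enforces these rules the same way is not confirmed here.

```python
MODELS = {
    "wan-2.7-text-to-video",
    "wan-2.6-text-to-video",
    "wan-2.5-text-to-video",
    "wan-2.2-a14b-text-to-video-turbo",
    "wan-2.7-r2v",
}
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}
R2V_ONLY_FIELDS = ("aspect_ratio", "duration_seconds")


def validate_text_to_video(payload):
    """Return a list of problems found in a text_to_video payload."""
    errors = []
    model = payload.get("model")
    if model not in MODELS:
        errors.append(f"unknown model: {model!r}")
    if not payload.get("prompt"):
        errors.append("prompt is required")

    # 720p and 1080p everywhere; Wan 2.2 additionally accepts 480p and 580p.
    allowed = {"720p", "1080p"}
    if isinstance(model, str) and model.startswith("wan-2.2"):
        allowed |= {"480p", "580p"}
    resolution = payload.get("output_resolution", "720p")  # documented default
    if resolution not in allowed:
        errors.append(f"output_resolution {resolution!r} not allowed for {model!r}")

    if model == "wan-2.7-r2v":
        ratio = payload.get("aspect_ratio")
        if ratio is not None and ratio not in ASPECT_RATIOS:
            errors.append(f"aspect_ratio {ratio!r} is not supported")
        duration = payload.get("duration_seconds")
        if duration is not None:
            is_int = isinstance(duration, int) and not isinstance(duration, bool)
            if not is_int or not 2 <= duration <= 10:
                errors.append("duration_seconds must be an integer from 2 to 10")
    else:
        for field in R2V_ONLY_FIELDS:
            if field in payload:
                errors.append(f"{field} applies to wan-2.7-r2v only")
    return errors
```

An empty list means the payload matches the documented constraints.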
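What is Wan on Hermes Agent?
Wan is Alibaba's Apache 2.0 open-source video model. It tops the Artificial Analysis leaderboard and is known for frame-level control, character consistency, and native lip sync. Through Hermes Agent it provides text-to-video and image-to-video at up to 1080p, and all 20+ variants are accessible through a single provider configuration.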
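Wan use cases
Brand content at scale
Use Wan's character consistency to produce brand video content in volume. Hermes Agent can dispatch generation tasks in parallel across different product lines.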
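How Hermes Agent schedules parallel work internally is not described here. As a plain-Python illustration of the idea, the sketch below fans out one generation request per product line with a thread pool. `dispatch_all` and its `submit` argument are hypothetical names, and `submit` stands for any function that sends a prompt to the text_to_video endpoint and returns the task response.

```python
from concurrent.futures import ThreadPoolExecutor


def dispatch_all(submit, prompts_by_product, max_workers=4):
    """Submit one generation task per product line concurrently.

    submit: callable taking a prompt and returning the task response.
    prompts_by_product: mapping of product line name to prompt text.
    Returns a mapping of product line name to the task response.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(submit, prompt)
            for name, prompt in prompts_by_product.items()
        }
        # result() re-raises any exception from the worker thread.
        return {name: future.result() for name, future in futures.items()}
```

Because generation is asynchronous on the server, the threads only cover the short submit calls. The returned task IDs still need to be polled.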
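Dialogue content with lip sync
Chain ElevenLabs TTS with Wan's speech-to-video endpoint in a single Hermes Agent workflow to generate lip-synced talking videos directly from script text, with no manual intervention.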
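The chaining described above reduces to three calls in sequence. The sketch below keeps each external service behind an injected function, because neither the ElevenLabs call nor the Hermes Agent workflow syntax is specified on this page. All function and parameter names are hypothetical. Only the model slug and the two `source_*` fields come from this page.

```python
def script_to_talking_video(script, portrait_url,
                            synthesize_speech, submit_task, wait_for_task):
    """Turn script text into a lip-synced video task result.

    synthesize_speech(script) -> URL of the generated audio (e.g. from a TTS service)
    submit_task(endpoint, payload) -> task response containing "task_id"
    wait_for_task(task_id) -> final task response once generation finishes
    """
    # Step 1: text to speech.
    audio_url = synthesize_speech(script)

    # Step 2: speech plus a still image to video.
    payload = {
        "model": "wan-2.2-a14b-speech-to-video-turbo",
        "source_audio_url": audio_url,
        "source_image_url": portrait_url,
    }
    task = submit_task("speech_to_video", payload)

    # Step 3: block until the asynchronous task completes.
    return wait_for_task(task["task_id"])
```

Injecting the three callables keeps the pipeline independent of any particular TTS client or HTTP library.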
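Previsualization for film and advertising agencies
Generate production-grade previsualization clips with keyframes anchored at both ends. Set first-frame and last-frame images to control exact scene transitions for client review.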
Wan + Hermes Agent FAQ
All of them. text_to_video, image_to_video, speech_to_video, text_to_image (Wan 2.7 Image), edit_video, and animate. Configure RunAPI as a custom:runapi provider once, then switch endpoints and model slugs per request — for example wan-2.7-text-to-video for video and wan-2.7-image for image generation up to 4K.
Add a custom:runapi provider entry with base_url set to https://runapi.ai/v1 and your RUNAPI_API_KEY as the API key. Once configured, every Wan endpoint — and all 113+ RunAPI models — is accessible through the same provider without additional plugins.
Wan 2.5 introduced 1080p output. Wan 2.6 added video editing (R2V) and flash variants for faster generation. Wan 2.7 adds image generation (wan-2.7-image, wan-2.7-image-pro up to 4K), video editing (wan-2.7-edit-video), and improved text-to-video quality that leads the Artificial Analysis leaderboard.
Costs vary by variant and resolution. A 720p text-to-video clip with Wan 2.7 runs about 25-35 cents per generation, and 1080p costs more. Speech-to-video is priced per generation regardless of length. Check the RunAPI pricing page for exact per-model rates. Credits on RunAPI do not expire.
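For rough budgeting, the quoted 25-35 cent range can be multiplied out per batch. This is a back-of-the-envelope sketch that applies only to 720p Wan 2.7 text-to-video. The function name is illustrative, and real rates should come from the pricing page.

```python
def estimate_batch_cost_cents(clips, low_cents=25, high_cents=35):
    """Return (low, high) total cost in cents for a batch of 720p clips."""
    if clips < 0:
        raise ValueError("clips must be non-negative")
    return clips * low_cents, clips * high_cents


low, high = estimate_batch_cost_cents(40)
print(f"40 clips: {low / 100:.2f} to {high / 100:.2f}")
```

Under these figures, a batch of 40 clips costs between 1000 and 1400 cents in total.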
Yes. Hermes Agent can chain ElevenLabs TTS to generate speech audio, then pass the audio URL to Wan's speech-to-video endpoint, creating a complete text-to-spoken-video pipeline in one workflow.
Try Wan in Hermes Agent now.
Get a free RunAPI key, configure the custom:runapi provider, and generate video with the top-ranked open-source model: text-to-video, image-to-video, or speech-to-video.