| Best default use | Reference-heavy ads, creator workflows, product shots, and multi-asset creative direction. | Cinematic social clips, dialogue scenes, storyboard-style control, and longer narrative sequences. | High-fidelity short clips, polished hero shots, image-to-video, and Google-aligned API workflows. |
| Input contract | Text plus first/last frames, image references, video references, audio references, and broad aspect-ratio control. | Text, first/last frame control, reference elements, and prompt-driven scene direction. | Text, image-to-video, reference images, and first/last-frame workflows. |
| Reference budget | Best when one request may carry several images, video refs, and audio refs; use it when uploaded assets are the product. | Best when references guide scene direction, not when the request needs a large asset bundle. | Best when reference images or first/last frames are enough; less suited to heavy multi-asset briefs. |
| Duration fit | 4-15 seconds; useful when one generated unit needs enough time for an ad beat. | 3-15 seconds; useful when a clip needs pacing, action, or dialogue continuity. | 4, 6, or 8 seconds; useful for short, high-polish clips and visual inserts. |
| Audio behavior | Best treated as a multimodal reference workflow when audio cues are part of the brief. | Strong fit for native audio, multilingual dialogue, and scene rhythm. | Strong fit for native audio in short Google video workflows. |
| Resolution path | 480p, 720p, 1080p; fit depends on reference assets and output target. | 720p, 1080p, 4K; good when output spec matters for social or cinematic delivery. | 720p, 1080p, 4K; good when high-fidelity short output is the product requirement. |
| Request strategy | Route by asset type: text-only, first-frame, first/last-frame, or multi-reference. | Route by scene need: no-sound social clip, sound-enabled clip, or motion-control style workflow. | Route by mode and cost: text, first/last frames, reference mode, quality, fast, upscale, or extension. |
| Latency and retries | Retry logic should watch reference validation failures and asset URL availability. | Retry logic should watch audio-enabled cost, long-duration failures, and prompt drift. | Retry logic should watch preview-only controls, safety blocks, and short-clip re-generation cost. |
| Developer workflow | Use when your app accepts user-uploaded assets and needs schema fields for references. | Use when your app exposes scene direction, audio options, or longer clip choices. | Use when your app already aligns with Google model behavior or short-form image-to-video. |
| Main risk | Reference-heavy workflows can create more validation, storage, and retry edge cases. | Narrative control can still vary by prompt; plan fallback for dialogue or action failures. | Short duration can be limiting when the product needs longer scene continuity. |
| Poor fit when | You only need a simple short text-to-video hero clip with minimal references. | You do not need audio, dialogue, pacing, or sequence control. | You need 15-second continuity or heavy multi-reference creative control. |
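
The Duration fit and Resolution path rows above can be turned into a small pre-flight check that runs before a request is sent. The sketch below is illustrative only: the labels `column_1`, `column_2`, and `column_3` are placeholders for the three table columns rather than real model identifiers, and `Profile` and `candidates` are hypothetical names, not part of any provider SDK. The limits are copied from the table, and treating "4-15 seconds" as an inclusive range is an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Profile:
    """Limits copied from one column of the comparison table."""
    name: str                        # placeholder label, not a real model ID
    min_s: float                     # shortest clip, in seconds
    max_s: float                     # longest clip, in seconds
    resolutions: FrozenSet[str]      # output resolutions listed in the table
    fixed_s: Optional[FrozenSet[float]] = None  # set when only fixed lengths are listed

    def supports(self, duration_s: float, resolution: str) -> bool:
        """Return True if the requested clip fits this column's limits."""
        if resolution not in self.resolutions:
            return False
        if self.fixed_s is not None:
            return duration_s in self.fixed_s
        return self.min_s <= duration_s <= self.max_s


# Values taken from the "Duration fit" and "Resolution path" rows.
PROFILES = [
    Profile("column_1", 4, 15, frozenset({"480p", "720p", "1080p"})),
    Profile("column_2", 3, 15, frozenset({"720p", "1080p", "4K"})),
    Profile("column_3", 4, 8, frozenset({"720p", "1080p", "4K"}),
            fixed_s=frozenset({4, 6, 8})),
]


def candidates(duration_s: float, resolution: str) -> List[str]:
    """List the placeholder labels whose limits cover the request."""
    return [p.name for p in PROFILES if p.supports(duration_s, resolution)]


if __name__ == "__main__":
    print(candidates(12, "4K"))
    print(candidates(8, "1080p"))
```

A check like this only narrows the options by hard limits. The softer rows (reference budget, audio behavior, main risk) would still need a product-level decision on top of it.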