Text-to-Video
Gemini Omni Text-to-Video API
Create video generation tasks from text prompts using the public model name gemini-omni/video.
Request shape
{
  "model": "gemini-omni/video",
  "input": {
    "prompt": "A cinematic wide shot of a futuristic transit hub at sunrise",
    "mode": "std",
    "aspect_ratio": "16:9",
    "duration": "5",
    "sound": true
  }
}
Prompt guidance
Describe subject, setting, and motion
Include who or what appears, where the scene happens, and how the camera or subject moves.
Keep duration in mind
A 3- to 15-second clip works best with one clear action rather than a long sequence of unrelated events.
Choose the right aspect ratio
Use 16:9 for web video, 9:16 for shorts, and 1:1 for product or social placements.
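The request shape and the guidance above can be combined in a small helper. The following is a minimal Python sketch, not an official client: the function name build_request, the placement labels, and the choice to reject durations outside 3 to 15 seconds are all assumptions made for illustration. Only the field names, the "std" mode, and the aspect ratios come from this page. Sending the body (endpoint, authentication) is not covered here and is left out.

```python
import json

# Placement names are hypothetical labels; the ratios come from the guidance above.
ASPECT_RATIOS = {"web": "16:9", "shorts": "9:16", "product": "1:1", "social": "1:1"}


def build_request(prompt, placement="web", duration_seconds=5, sound=True):
    """Return a request body in the shape shown under "Request shape"."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if placement not in ASPECT_RATIOS:
        raise ValueError(f"unknown placement: {placement!r}")
    # Assumption: 3 to 15 seconds is enforced as a hard limit here, although
    # the guidance only says clips in that range work best.
    if not 3 <= duration_seconds <= 15:
        raise ValueError("duration_seconds should be between 3 and 15")
    return {
        "model": "gemini-omni/video",
        "input": {
            "prompt": prompt,
            "mode": "std",  # copied from the sample; other modes are not documented here
            "aspect_ratio": ASPECT_RATIOS[placement],
            "duration": str(duration_seconds),  # the sample passes duration as a string
            "sound": sound,
        },
    }


body = build_request(
    "Aerial view of a coastal road at sunset, waves moving below, smooth drone motion",
    placement="shorts",
    duration_seconds=8,
)
print(json.dumps(body, indent=2))
```

Keeping the ratio lookup in one table makes it easy to adjust if the service accepts other aspect ratios than the three mentioned here.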
Example prompts
A cinematic product reveal, camera slowly pushes in, soft studio lighting, premium glass reflections
Aerial view of a coastal road at sunset, waves moving below, smooth drone motion
Close-up of a coffee cup on a wooden table, steam rising, morning light through a window
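The example prompts above read as short comma-separated fragments: a subject and setting first, then motion, then lighting or other detail. A minimal Python sketch of assembling prompts that way is shown below. The helper compose_prompt and its parameter names are hypothetical and only illustrate the pattern; nothing on this page says the API requires this structure.

```python
def compose_prompt(subject_and_setting, motion, *details):
    """Join prompt fragments with commas, skipping empty ones.

    Hypothetical helper: it mirrors the pattern of the example prompts
    (subject and setting, then motion, then optional lighting or detail).
    """
    fragments = [subject_and_setting, motion, *details]
    cleaned = [f.strip() for f in fragments if f and f.strip()]
    if not cleaned:
        raise ValueError("at least one prompt fragment is required")
    return ", ".join(cleaned)


prompt = compose_prompt(
    "Close-up of a coffee cup on a wooden table",
    "steam rising",
    "morning light through a window",
)
print(prompt)
```

With these arguments the helper reproduces the third example prompt, which makes it easy to vary one fragment (for instance the motion) while keeping the rest of the scene fixed.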