Gemini Omni Text-to-Video API

Create video generation tasks from text prompts using the public model name gemini-omni/video.

Request shape

{
  "model": "gemini-omni/video",
  "input": {
    "prompt": "A cinematic wide shot of a futuristic transit hub at sunrise",
    "mode": "std",
    "aspect_ratio": "16:9",
    "duration": "5",
    "sound": true
  }
}
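As a rough illustration, the request body above can be assembled in Python before it is sent. This is only a sketch: the helper name build_text_to_video_request and its defaults are hypothetical, and the endpoint URL and authentication are not covered on this page, so they are left out. The field names and the string-typed duration simply mirror the example above.

```python
import json


def build_text_to_video_request(prompt, mode="std", aspect_ratio="16:9",
                                duration=5, sound=True):
    """Return a request body shaped like the documented example.

    Hypothetical helper: only the field names come from the example;
    the defaults and the validation are assumptions.
    """
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    return {
        "model": "gemini-omni/video",
        "input": {
            "prompt": prompt.strip(),
            "mode": mode,
            "aspect_ratio": aspect_ratio,
            # The example sends duration as a string, so mirror that here.
            "duration": str(duration),
            "sound": bool(sound),
        },
    }


body = build_text_to_video_request(
    "A cinematic wide shot of a futuristic transit hub at sunrise"
)
print(json.dumps(body, indent=2))
```

Keeping the body in one small function makes it easier to reuse the same shape across prompts while changing only the fields that differ.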

Prompt guidance

Describe subject, setting, and motion

Include who or what appears, where the scene happens, and how the camera or subject moves.
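One possible way to apply this guidance is to assemble the prompt from its three parts so that none is forgotten. The compose_prompt helper below is hypothetical and not part of the API; it only joins plain strings in the comma-separated style of the example prompts on this page.

```python
def compose_prompt(subject, setting, motion):
    """Join subject, setting, and motion into one comma-separated prompt.

    Hypothetical helper: the API itself only receives the final string.
    """
    # Trim whitespace and trailing punctuation so the parts join cleanly.
    parts = [part.strip().rstrip(".,") for part in (subject, setting, motion)]
    if not all(parts):
        raise ValueError("subject, setting, and motion must all be non-empty")
    return ", ".join(parts)


prompt = compose_prompt(
    "Close-up of a coffee cup on a wooden table",
    "morning light through a window",
    "steam rising",
)
print(prompt)
```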

Keep duration in mind

A 3- to 15-second clip works best with one clear action rather than a long sequence of unrelated events.

Choose the right aspect ratio

Use 16:9 for web video, 9:16 for shorts, and 1:1 for product or social placements.
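The duration and aspect ratio guidance above could be encoded as a small pre-flight check, sketched below. The placement labels (web, shorts, product, social) and both function names are hypothetical, and whether the API itself rejects values outside the 3 to 15 second range is an assumption this page does not confirm.

```python
# Placement labels are illustrative; only the ratios come from the guidance.
ASPECT_RATIOS = {
    "web": "16:9",
    "shorts": "9:16",
    "product": "1:1",
    "social": "1:1",
}


def pick_aspect_ratio(placement):
    """Map a placement label to the aspect ratio suggested in the guidance."""
    try:
        return ASPECT_RATIOS[placement.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown placement: {placement!r}") from None


def duration_in_suggested_range(seconds, low=3, high=15):
    """Return True if the duration falls in the suggested 3 to 15 second range.

    Accepts an int or a numeric string, since the request example
    sends duration as a string.
    """
    return low <= int(seconds) <= high


print(pick_aspect_ratio("shorts"))
print(duration_in_suggested_range("5"))
```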

Example prompts

A cinematic product reveal, camera slowly pushes in, soft studio lighting, premium glass reflections

Aerial view of a coastal road at sunset, waves moving below, smooth drone motion

Close-up of a coffee cup on a wooden table, steam rising, morning light through a window

Related docs

API Reference

Image-to-Video

Code Examples