The race to develop cutting-edge generative AI models for video content is heating up. Just three months after OpenAI's captivating demonstration of Sora, its text-to-video AI system, Google is stepping into the ring with Veo, its latest generative AI video model, unveiled at the company's I/O developer conference on Tuesday.
According to Google, Veo boasts an “advanced understanding of natural language,” enabling it to comprehend and interpret cinematic terms such as “timelapse” or “aerial shots of a landscape.” The model promises to generate “high-quality” 1080p resolution videos exceeding a minute in length, spanning a diverse array of visual and cinematic styles.
Users can direct Veo’s output using text, image, or video-based prompts, and Google claims the resulting videos exhibit greater consistency, coherence, and realistic movement for people, animals, and objects throughout the shots.
In a press preview on Monday, Demis Hassabis, CEO of Google DeepMind, revealed that Veo’s video results could be refined using additional prompts. Google is also exploring features that would enable Veo to produce storyboards and longer scenes, further enhancing its utility for filmmakers.
Unlike previous AI model unveilings, Google is taking a more measured approach with Veo. The company is initially inviting select filmmakers and creators to experiment with the model, aiming to determine how it can best support creatives. Google plans to build on these collaborations to ensure “creators have a voice” in shaping the development of its AI technologies.
In the coming weeks, some Veo features will be made available to “select creators” in a private preview within VideoFX, Google’s video editing tool. Users can sign up for the waitlist to get an early chance to try out the model. Additionally, Google has indicated plans to incorporate some of Veo’s capabilities into YouTube Shorts “in the future.”
Veo is not Google’s first foray into video generation models. The company has previously developed models like Phenaki and Imagen Video, which produced crude and often distorted video clips. More recently, Google showcased the Lumiere model in January, which impressed with its ability to understand video content, simulate real-world physics, and render high-definition outputs.
However, Google claims that Veo surpasses Lumiere on all of these fronts, setting a new standard for comprehension, physical realism, and high-definition rendering.
While Google is pitching Veo as a tool for filmmakers, OpenAI may have a head start in this arena. The company has already been courting Hollywood with Sora and plans to release it to the public later this year, having teased a potential release “in a few months” back in March.
OpenAI is also exploring the integration of audio into Sora and may make the model available directly within video editing applications like Adobe’s Premiere Pro. Given Sora’s potential advantages, Google’s Veo may face stiff competition in vying for the attention of filmmakers and content creators.