
Google DeepMind introduces a new video model to rival Sora


Google DeepMind, Google’s flagship artificial intelligence research lab, wants to beat OpenAI at the video generation game — and it might, at least for a while.

On Monday, DeepMind announced Veo 2, the successor to Veo, its video-generating AI model that powers a growing number of products across Google’s portfolio. Veo 2 can create clips two minutes and longer in resolutions up to 4K (4096 x 2160 pixels).

Note that this is 4x the resolution, and over 6x the duration, of what OpenAI’s Sora can achieve.

This is a theoretical advantage for now, granted. In VideoFX, Google’s experimental video creation tool where Veo 2 is exclusively available at the moment, videos are capped at 720p and eight seconds in length. (Sora can produce clips up to 1080p and 20 seconds long.)
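The headline multiples hold up as back-of-envelope arithmetic; a quick sketch, assuming Sora’s publicly stated 1080p, 20-second output as the baseline:

```python
# Back-of-envelope check on the headline multiples.
# Assumes Sora's 1080p, 20-second clips as the baseline.
veo2_pixels = 4096 * 2160   # Veo 2's maximum 4K frame
sora_pixels = 1920 * 1080   # a 1080p frame
resolution_ratio = veo2_pixels / sora_pixels   # ~4.27x the pixels

veo2_seconds = 2 * 60       # two-minute clips (Veo 2 can go longer)
sora_seconds = 20
duration_ratio = veo2_seconds / sora_seconds   # 6x the duration

print(f"{resolution_ratio:.1f}x the pixels, {duration_ratio:.0f}x the duration")
```

By pixel count the resolution edge is closer to 4.3x than a flat 4x, and a two-minute clip is exactly 6x a 20-second one; the “over 6x” figure follows from Veo 2 supporting clips longer than two minutes.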

Veo 2 in VideoFX. Image credits: Google

VideoFX is behind a waitlist, but Google says it’s expanding the number of users who can access it this week.

Eli Collins, vice president of product at DeepMind, also told TechCrunch that Google will make Veo 2 available through its Vertex AI developer platform “once the model is ready for use at scale.”

“In the coming months, we’ll continue to iterate based on user feedback,” Collins said, “and (we’ll) look to integrate Veo 2’s updated capabilities into compelling use cases across the Google ecosystem… (We look forward) to sharing more innovations next year.”

More controllable

Like Veo, Veo 2 can generate videos given a text prompt (e.g. “A car racing down a highway”) or text and a reference image.

So what’s new in Veo 2? Well, DeepMind says that the model, which can generate clips in a range of styles, has an improved “understanding” of physics and camera controls, and produces “clearer” footage.

By “clearer,” DeepMind means that textures and images in clips are sharper, especially in scenes with a lot of movement. As for the improved camera controls, they let Veo 2 position the virtual “camera” in its videos more precisely, and move that camera to capture objects and people from different angles.

DeepMind also claims that Veo 2 can more realistically model motion, fluid dynamics (like coffee being poured into a mug), and properties of light (like shadows and reflections). That extends to different lenses and cinematic effects, the lab says, as well as “nuanced” human expression.

Google Veo 2 sample. Note that the compression artifacts were introduced in the conversion of the clip to a GIF. Image credits: Google

DeepMind shared a few cherry-picked samples from Veo 2 with TechCrunch last week. For AI-generated videos, they looked pretty good; exceptionally good, even. Veo 2 seems to have a strong grasp of refraction and of tricky liquids like maple syrup, as well as a knack for simulating Pixar-style animation.

But despite DeepMind’s insistence that the model is less likely to hallucinate elements like extra fingers or “unexpected objects,” Veo 2 can’t quite clear the uncanny valley.

Notice the lifeless eyes on the dog-like creature in this cartoon:

Image credits: Google

And note the oddly slick road in these shots, plus the pedestrians blending into one another in the background and the buildings with physically impossible facades:

Image credits: Google

Collins acknowledged that there is work to be done.

“Coherence and consistency are areas for growth,” he said. “Veo can consistently adhere to a prompt for a couple of minutes, but (it can’t) adhere to complex prompts over long time horizons. Similarly, character consistency can be a challenge. There’s also room to improve in generating intricate detail, fast and complex motion, and continuing to push the boundaries of realism.”

DeepMind continues to work with artists and producers to improve video creation models and tools, Collins added.

“From the beginning of our Veo development, we’ve worked with creators like Donald Glover, The Weeknd, d4vd and others to truly understand their creative process and how technology can help bring their vision to life,” said Collins. “Our work with creators on Veo 1 informed the development of Veo 2, and we look forward to working with trusted testers and creators to get feedback on this new model.”

Safety and training

Veo 2 was trained on lots of videos. That’s generally how AI models work: fed example after example of some form of data, the models pick up on patterns in that data that allow them to generate new data of their own.

DeepMind won’t say exactly where it scraped the videos to train Veo 2, but YouTube is one possible source; Google owns YouTube, and DeepMind previously told TechCrunch that Google models like Veo may be trained on some YouTube content.

“Veo is trained on high-quality video-description pairings,” Collins said. “A video-description pair is a video and an associated description of what happens in that video.”

Image credits: Google

While Google hosts tools to let webmasters block the lab’s bots from scraping training data from their websites, DeepMind doesn’t offer a mechanism to let creators remove works from its existing training sets. The lab and its parent company maintain that training models on public data is fair use, meaning that DeepMind believes it isn’t obligated to ask permission from data owners.

Not all creatives agree, particularly in light of studies predicting that tens of thousands of film and TV jobs could be disrupted by AI in the coming years. Several AI companies, including the eponymous startup behind the popular AI app Midjourney, are in the crosshairs of lawsuits accusing them of infringing on artists’ rights by training on content without consent.

“We are committed to collaborating with creators and our partners to achieve common goals,” said Collins. “We continue to work with the creative community and people across the wider industry, gathering insights and listening to feedback, including from those who use VideoFX.”

Owing to the way today’s generative models behave when trained, they carry certain risks, like regurgitation, where a model generates a mirror copy of its training data. DeepMind’s solution is prompt-level filters, including for violent, graphic, and explicit content.

Google’s indemnity policy, which provides certain customers with a defense against allegations of copyright infringement arising from the use of its products, won’t apply to Veo 2 until the model is generally available, Collins said.

Image credits: Google

DeepMind said it’s using its proprietary watermarking technology, SynthID, to embed invisible markers into the frames Veo 2 generates, which should help reduce the risk of deepfakes. However, like all watermarking tech, SynthID isn’t foolproof.

Imagen upgrades

In addition to Veo 2, Google DeepMind this morning announced upgrades to Imagen 3, its commercial image-generation model.

The new version of Imagen 3 is available to users of ImageFX, Google’s image creation tool, starting Monday. It can create “brighter, better-composed” images and photos in styles like photorealism, impressionism and anime, according to DeepMind.

“This upgrade (to Imagen 3) also follows prompts more faithfully, and renders richer details and textures,” DeepMind wrote in a blog post provided to TechCrunch.

Image credits: Google

Along with the model, UI updates for ImageFX are also rolling out. Now, when users type in prompts, key terms in those prompts will become “chiplets” with a drop-down menu of suggested related words. Users can use the chiplets to iterate on what they’ve typed, or select from a row of auto-generated descriptors beneath the prompt.


