New Google footage teases future of AI-generated movies with DeepMind tool that even creates soundtracks for videos

June 19, 2024

THE next generation of AI-made videos is taking shape as Google unveils a new tool that can auto-create unique soundtracks.

Several AI video generators have impressed users, including OpenAI’s Sora, Runway’s Gen-3 Alpha, and Luma AI’s Dream Machine.

But none of these magic makers could generate a decent soundtrack to go along with videos — until now.

Google DeepMind announced the new video-to-audio tool on Monday.

“Video generation models are advancing at an incredible pace, but many current systems can only generate silent output. One of the next major steps toward bringing generated movies to life is creating soundtracks for these silent videos,” Google wrote.

“Today, we’re sharing progress on our video-to-audio (V2A) technology, which makes synchronized audiovisual generation possible.”

“V2A combines video pixels with natural language text prompts to generate rich soundscapes for the on-screen action,” they explained.

The tool can be paired with video generation models like Veo to create shots with a dramatic score, realistic sound effects, or dialogue that matches the characters and tone of a video.

“It can also generate soundtracks for a range of traditional footage, including archival material, silent films and more — opening a wider range of creative opportunities,” DeepMind said.

Google shared impressive examples of the new tech in action, including a Western-style soundtrack accompanying a cowboy on horseback and a clip of a wild wolf howling at the moon.

COMPLETE CREATIVE CONTROL

Google’s new V2A tool lets creators either have the AI generate a soundtrack from the clip’s visuals and text prompts, or shape the soundtrack themselves.

Users can give the tool prompts to guide its output: "positive prompts" steer it toward desired sounds, while "negative prompts" steer it away from unwanted ones.

One set of directions read: “Prompt for audio: Cinematic, thriller, horror film, music, tension, ambiance, footsteps on concrete.”

The scene shows a man walking through a destroyed building before ending with a view of the same man on an eerie bridge.

The AI creates a well-suited soundtrack for the clip that matches the tone and pace of the narrative.

ENDLESS SOUNDTRACK OPTIONS

DeepMind’s V2A can also generate an unlimited number of different soundtracks for the same video.

One example prompt read: “Prompt for audio: A spaceship hurtles through the vastness of space, stars streaking past it, high speed, Sci-fi.”

The video showed a spacecraft soaring through space, with the light of a star shining in the distance.

The first soundtrack generated by the V2A tool was an uplifting, orchestral piece that matched the image and prompt.

A second soundtrack produced by the AI from the same prompt was darker and slower.

What is Google DeepMind?

DeepMind was founded in 2010 and acquired by Google in 2014.

“Google DeepMind brings together two of the world’s leading AI labs — Google Brain and DeepMind — into a single, focused team led by our CEO Demis Hassabis,” according to Google.

“Over the last decade, the two teams were responsible for some of the biggest research breakthroughs in AI, many of which underpin the flourishing AI industry we see today.”

The organization aims to bring out the enormous potential of AI for everyone.

“We’re a team of scientists, engineers, ethicists and more, working to build the next generation of AI systems safely and responsibly,” they wrote.

“By solving some of the hardest scientific and engineering challenges of our time, we’re working to create breakthrough technologies that could advance science, transform work, serve diverse communities — and improve billions of people’s lives.”

SOURCE: GOOGLE DEEPMIND

Using the prompt “Prompt for audio: Ethereal cello atmosphere” changed things up even more.

This third soundtrack immediately set a sadder and more pensive tone.

ONLY GETTING BETTER

Google said V2A is its latest step in upgrading its full suite of AI content-generation tools.

The company also acknowledged several limitations it hopes to address in future versions.

“Since the quality of the audio output is dependent on the quality of the video input, artifacts or distortions in the video, which are outside the model’s training distribution, can lead to a noticeable drop in audio quality,” Google said.

“We’re also improving lip synchronization for videos that involve speech. V2A attempts to generate speech from the input transcripts and synchronize it with characters’ lip movements.”

“But the paired video generation model may not be conditioned on transcripts. This creates a mismatch, often resulting in uncanny lip-syncing, as the video model doesn’t generate mouth movements that match the transcript,” they added.

Source link