What are AI “world models” and why are they important?


World models, also known as world simulators, are being touted by some as the next big thing in artificial intelligence.

AI pioneer Fei-Fei Li’s World Labs has raised $230 million to build “large world models,” and DeepMind hired one of the creators of OpenAI’s video generator, Sora, to work on “world simulators.” (Sora was released on Monday.)

But what the heck are these things?

World models are inspired by the mental models of the world that humans naturally develop. Our brains take abstract representations from our senses and shape them into a more concrete understanding of the world around us, producing what were called “models” long before AI adopted the term. The predictions our brains make based on these models affect how we perceive the world.

A paper by AI researchers David Ha and Jürgen Schmidhuber gives the example of a baseball batter. Batters have milliseconds to decide how to swing their bat, which is less time than it takes for visual signals to reach the brain. The reason they can hit a 100 mph fastball, Ha and Schmidhuber say, is that they can instinctively predict where the ball is going to go.

“For professional players, this all happens subconsciously,” the research duo wrote. “Their muscles reflexively swing the bat at the right time and place according to the predictions of their internal model. They can act on their predictions about the future without having to consciously act out possible future scenarios to create a plan.”

It is these subconscious reasoning aspects of world models that some believe are prerequisites for human-level intelligence.

Modeling the world

Although the concept has been around for decades, world models have recently gained popularity in part due to their promising applications in the field of generative video.

Most, if not all, AI-generated videos fall into uncanny valley territory. Watch them long enough and something strange will happen, like limbs twisting and merging into each other.

While a generative model trained on years of video might accurately predict the bounce of a basketball, it has no actual idea why the ball bounces, just as language models don’t really understand the concepts behind words and phrases. A world model with even a basic grasp of why the basketball bounces the way it does would be better at depicting that bounce.

To provide such understanding, world models are trained on a range of data, including photos, audio, videos, and text, with the aim of creating internal representations of how the world works and reasoning about the consequences of actions.

A sample from AI startup Runway’s Gen-3 video generation model. Image credits: Runway

“The viewer expects the world they’re watching to behave in the same way as their reality,” said Alex Mashrabov, Snap’s former head of AI and the CEO of Higgsfield, which is building generative models for video. “If a feather falls to the ground with the weight of an anvil, or a bowling ball soars hundreds of feet into the air, it creates confusion and takes the viewer out of the moment. With a robust world model, instead of the creator determining how each object should move (which is tedious, difficult, and a poor use of time), the model will figure it out.”

But better video generation is only the tip of the iceberg for world models. Researchers, including Yann LeCun, Meta’s chief AI scientist, say the models could one day be used for sophisticated forecasting and planning in both the digital and physical realms.

In a talk earlier this year, LeCun described how a world model could help achieve a desired goal through reasoning. Given a goal (a clean room), a model with a basic representation of the “world” (e.g., a video of a dirty room) could come up with a sequence of actions to achieve that goal (deploy vacuums to sweep, wash the dishes, empty the trash), not because that is a pattern it has observed, but because it knows at a deeper level how to go from dirty to clean.
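The dirty-to-clean example can be illustrated with a toy sketch. This is not LeCun’s method or any lab’s system: the “world model” here is a hand-written rule rather than anything learned from video, and the state names, the `ACTIONS` table, and the `predict` and `plan` functions are all hypothetical, chosen only to show the idea of searching over predicted outcomes until the goal state is reached.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional

# A state is the set of problems still present in the room
# (a hypothetical, hand-made stand-in for a learned representation).
State = FrozenSet[str]

# Hypothetical actions: each one removes a single problem.
ACTIONS: Dict[str, str] = {
    "sweep": "dusty floor",
    "wash dishes": "dirty dishes",
    "empty trash": "full trash",
}


def predict(state: State, action: str) -> State:
    """Toy 'world model': predict the room's state after an action."""
    return state - {ACTIONS[action]}


def plan(start: State, goal: State) -> Optional[List[str]]:
    """Breadth-first search over predicted states.

    Returns a shortest list of actions leading from start to goal,
    or None if the goal cannot be reached.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, actions = queue.popleft()
        if state == goal:
            return actions
        for action in ACTIONS:
            nxt = predict(state, action)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, actions + [action]))
    return None


dirty_room: State = frozenset({"dusty floor", "dirty dishes", "full trash"})
clean_room: State = frozenset()

steps = plan(dirty_room, clean_room)
print(steps)
```

The planner never sees an example of a cleaning routine. It only asks the model what each action would change and keeps the sequence that ends in the goal, which is the distinction LeCun draws between reasoning and pattern-matching.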

“We need machines that understand the world; (machines) that can remember things, that have intuition, have common sense, things that can reason and plan to the same level as humans,” he said. “Despite what you might have heard from some of the most enthusiastic people, current AI systems are not capable of any of this.”

Although LeCun estimates that we are at least a decade away from the world models he envisions, today’s world models are showing promise as simulators of elementary physics.

Sora controlling a player in Minecraft and rendering the world. Image credits: OpenAI

OpenAI notes in a blog post that Sora, which it considers to be a world model, can simulate actions like a painter leaving brushstrokes on a canvas. Models like Sora, and Sora itself, can also effectively simulate video games. For example, Sora can render a Minecraft-like UI and game world.

Future world models could generate 3D worlds on demand for gaming, virtual photography, and more, World Labs co-founder Justin Johnson said on an episode of the a16z podcast.

“We already have the ability to create virtual, interactive worlds, but it costs hundreds and hundreds of millions of dollars and a ton of development time,” Johnson said. “(World models) will allow you to not just get a picture or a clip, but a fully simulated, live and interactive 3D world.”

High hurdles

Although the concept is attractive, many technical problems stand in the way.

Training and running world models requires enormous computing power, even compared to the amount currently used by generative models. While some of the latest language models can run on a modern smartphone, Sora (arguably an early world model) would require thousands of GPUs to train and run, especially if its use becomes commonplace.

World models, like all AI models, also hallucinate and internalize biases in their training data. A world model trained largely on videos of sunny weather in European cities, for example, might struggle to comprehend or depict Korean cities in snowy conditions, or simply get them wrong.

A general lack of training data threatens to exacerbate these problems, Mashrabov says.

“We have seen models be really limited when generating people of a certain type or race,” he said. “The training data for a world model needs to be broad enough to cover a variety of scenarios, but also highly specific, so that the AI can deeply understand the nuances of those scenarios.”

In a recent post, Cristobal Valenzuela, CEO of AI startup Runway, says that data and engineering challenges prevent today’s models from accurately capturing the behavior of a world’s inhabitants (such as humans and animals). “Models will need to generate consistent maps of the environment,” he said, “and the ability to navigate and interact in those environments.”

A Sora-generated video. Image credits: OpenAI

However, if all the major hurdles are overcome, Mashrabov believes that world models could build a “stronger” bridge between artificial intelligence and the real world, leading to advances not only in virtual world creation but also in robotics and AI decision-making.

They could also lead to more capable robots.

Robots today are limited in what they can do because they have no awareness of the world around them (or their own bodies). According to Mashrabov, world models could give them that awareness, at least up to a point.

“With an advanced world model, AI can develop a personal understanding of whatever scenario it’s placed in,” he said, “and start thinking about possible solutions.”
