Fei-Fei Li, a pioneering artificial intelligence (AI) researcher often called the “godmother of AI,” has launched a new venture that aims to change how AI perceives and reasons about the three-dimensional physical world. Li and three colleagues have co-founded World Labs, a startup that recently raised $230 million in initial funding and has drawn attention in the tech community for its focus on “spatial intelligence.”
The Vision Behind World Labs: Understanding the 3D Physical World
World Labs is set to push the boundaries of AI beyond its current capabilities by developing models that can not only interpret but also reason about the three-dimensional world in a human-like way. While many existing AI models, such as OpenAI’s GPT-3 and DALL-E, have made headlines for generating impressive text and image outputs, World Labs is working on a different challenge: teaching AI to comprehend and navigate the complexities of our spatial reality.
Li and her team are focusing on what they term “spatial intelligence” — the ability for AI to understand the physical world’s structure, geometry, and dynamics. Spatial intelligence could revolutionize fields such as augmented and virtual reality (AR/VR), robotics, and numerous other domains where understanding the 3D world is crucial.
“The images and videos that you have seen so far coming out of generative AI models do not give you enough of the whole sense of how a 3D world is built,” Li said in a recent interview. This gap in current AI capabilities, she argues, limits the broader reasoning abilities of AI systems, often resulting in “hallucinations,” such as rendering hands with the wrong number of fingers or misrepresenting real-world physics.
The Founders and Their Backgrounds
World Labs is not a solo endeavor. Apart from Li, the co-founders are Justin Johnson, Christoph Lassner, and Ben Mildenhall, all of whom have distinguished themselves in AI, computer vision, and computer graphics.
- Justin Johnson: A leading computer vision researcher, Johnson has contributed extensively to deep learning and visual reasoning. His work has focused on building models that can understand visual inputs contextually and reason about them in human-like ways.
- Christoph Lassner: Lassner’s research has centered on machine learning, computer graphics, and interactive systems. His expertise in creating models that can understand and generate 3D visualizations plays a crucial role in World Labs’ mission.
- Ben Mildenhall: Known for his work in neural rendering, including neural radiance fields (NeRF), Mildenhall has helped pioneer techniques for rendering photorealistic views of 3D scenes reconstructed from ordinary photographs. His involvement brings deep knowledge of how to represent and render the visual complexity of 3D spaces.
The team’s combined expertise is well matched to World Labs’ goal of building “large world models,” or LWMs, that can simulate and reason about 3D environments.
The Funding and Investors
The $230 million raised in initial funding reflects the high level of confidence the investment community has in World Labs’ vision and leadership. The funding round was led by venture capital heavyweights Andreessen Horowitz, New Enterprise Associates, and Radical Ventures. Other notable investors included AMD Ventures, Intel Capital, and Nvidia’s NVentures.
Despite the substantial amount of capital raised, World Labs has chosen not to disclose its valuation at this time. The financial backing from major players in both venture capital and the tech industry underscores the potential impact of the startup’s mission.
What is Spatial Intelligence?
To understand World Labs’ approach, it’s essential to grasp what “spatial intelligence” entails. Spatial intelligence refers to the capability of understanding the three-dimensional aspects of objects and environments, including their shapes, sizes, distances, positions, and the relationships between them.
In humans, spatial intelligence is critical for navigation, movement, and understanding our surroundings. For AI, developing this kind of intelligence means creating models that can simulate and understand the 3D world much as humans do: reasoning about how objects move and interact, predicting trajectories, and understanding complex geometries and structures.
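To make these ideas concrete, here is a minimal Python sketch of the kind of quantities spatial reasoning involves: the distance between two objects, a simple "above" relation, and a predicted trajectory. It is purely illustrative and is not World Labs' method. The names `Vec3`, `distance`, `is_above`, and `predict_position`, the z-up convention, and the drag-free motion model are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec3:
    """A point or vector in 3D space (metres; z points up by assumption)."""
    x: float
    y: float
    z: float

    def __sub__(self, other: "Vec3") -> "Vec3":
        return Vec3(self.x - other.x, self.y - other.y, self.z - other.z)

    def norm(self) -> float:
        return math.sqrt(self.x ** 2 + self.y ** 2 + self.z ** 2)


GRAVITY = 9.81  # m/s^2, acting along -z (assumed world convention)


def distance(a: Vec3, b: Vec3) -> float:
    """Euclidean distance between two points."""
    return (a - b).norm()


def is_above(a: Vec3, b: Vec3) -> bool:
    """True when point a is higher than point b along the z axis."""
    return a.z > b.z


def predict_position(start: Vec3, velocity: Vec3, t: float) -> Vec3:
    """Position after t seconds of drag-free ballistic motion."""
    return Vec3(
        start.x + velocity.x * t,
        start.y + velocity.y * t,
        start.z + velocity.z * t - 0.5 * GRAVITY * t * t,
    )


# Hypothetical scene: a cup on a table, one metre above the floor.
cup = Vec3(0.0, 0.0, 1.0)
table_edge = Vec3(3.0, 4.0, 1.0)

print(distance(cup, table_edge))             # 5.0
print(is_above(cup, Vec3(0.0, 0.0, 0.0)))    # True
# Where the cup would be 0.2 s after being pushed sideways at 1 m/s.
print(predict_position(cup, Vec3(1.0, 0.0, 0.0), 0.2))
```

A hand-written physics formula like this is the easy part. The harder problem the article describes is getting a learned model to infer such positions, relations, and motion from raw sensory data.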
Fei-Fei Li explains that current AI models, even the most sophisticated ones, have limitations when it comes to interpreting the 3D world. While they can generate realistic images or videos, these outputs lack a deeper understanding of physical space, physics, and causality. “The way we understand the structure of the world, imagined or real, will fundamentally be a piece of this AI puzzle,” she said.
Applications of Spatial Intelligence
World Labs’ focus on spatial intelligence opens up vast possibilities across several fields:
- Augmented and Virtual Reality (AR/VR): By enabling AI to understand spatial environments better, AR/VR experiences can become far more immersive and realistic. Imagine virtual environments that respond accurately to real-world physics or AR applications that interact seamlessly with the physical objects around them.
- Robotics: Spatial intelligence is crucial for robotics, particularly in fields like autonomous driving, drones, and delivery robots. These machines need to navigate complex environments, avoid obstacles, and understand the dynamics of the world around them. Enhanced spatial intelligence would allow robots to perform these tasks more accurately and efficiently.
- Healthcare: AI models with spatial intelligence could revolutionize medical imaging by understanding the 3D structure of human anatomy, improving diagnostics, surgical planning, and treatment. They could also aid in designing prosthetics or creating detailed simulations for medical training.
- Urban Planning and Architecture: With spatial intelligence, AI could help in designing and visualizing urban environments, understanding traffic flow, optimizing building designs for safety and sustainability, and even predicting the impact of natural disasters.
- Gaming and Entertainment: AI that understands 3D spaces can create more realistic and engaging game environments, design characters that interact more naturally, and even generate complex narratives that adapt to a player’s actions within a three-dimensional space.
The Technology Behind World Labs
World Labs is developing AI models that its founders refer to as “large world models” or LWMs. These models will leverage the same transformer-based architecture that serves as the foundation for OpenAI’s ChatGPT. However, Li emphasized that while the transformer is a crucial component, it will not be the “be-all and end-all” of their models.
Instead, World Labs plans to incorporate additional elements that can better handle the complexities of 3D data. For instance, understanding 3D spaces requires not only recognizing objects but also comprehending their spatial relationships and interactions. This involves integrating multiple data sources, including synthetic and real-world data, to train models that are capable of reasoning about depth, texture, movement, and physical laws.
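One way to see why depth has to be reasoned about rather than read directly off an image is the standard pinhole camera model, sketched below in Python. This is a textbook illustration, not a description of World Labs' models. The function name `project` and the example points are assumptions chosen for the demonstration.

```python
def project(point, focal_length=1.0):
    """Project a 3D camera-space point (x, y, z) onto the 2D image plane.

    Uses the ideal pinhole model: image coordinates are x/z and y/z
    scaled by the focal length, so the depth z is divided out.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


# Two different 3D points that lie on the same ray from the camera.
near_point = (1.0, 0.5, 2.0)
far_point = (2.0, 1.0, 4.0)  # twice as far away, twice as far off-axis

print(project(near_point))  # (0.5, 0.25)
print(project(far_point))   # (0.5, 0.25)
```

Both points land on the same pixel, so a single 2D image cannot distinguish a small nearby object from a large distant one. Recovering the 3D structure requires extra evidence such as multiple views, motion, or learned priors about the world.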
The Role of Synthetic and Real-World Data
Li has revealed that World Labs will use a combination of synthetic and real-world data to train its models. Synthetic data, created using simulations or computer-generated imagery, offers the advantage of being easily scalable and customizable. It allows researchers to generate large datasets with specific characteristics or rare events that might be hard to capture in real-world data.
Real-world data, on the other hand, provides authenticity and complexity that synthetic data alone cannot achieve. It captures the nuances, imperfections, and unpredictability of the physical world. By combining both, World Labs aims to create models that not only perform well in controlled environments but also adapt seamlessly to real-world scenarios.
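As a rough illustration of what combining the two sources can look like in practice, the Python sketch below builds training batches with a fixed share of synthetic samples. World Labs has not published its data pipeline, so everything here is an assumption made for the example: the function name `mixed_batches`, the `synthetic_fraction` parameter, the 75/25 split, and the placeholder datasets.

```python
import random


def mixed_batches(synthetic, real, batch_size, synthetic_fraction, seed=0):
    """Yield endless batches mixing synthetic and real samples.

    Each batch holds round(batch_size * synthetic_fraction) synthetic
    samples, and the remainder comes from the real-world dataset.
    Samples are drawn with replacement and shuffled within the batch.
    """
    if not 0.0 <= synthetic_fraction <= 1.0:
        raise ValueError("synthetic_fraction must be between 0 and 1")
    rng = random.Random(seed)  # seeded for reproducible batches
    n_synthetic = round(batch_size * synthetic_fraction)
    n_real = batch_size - n_synthetic
    while True:
        batch = rng.choices(synthetic, k=n_synthetic) + rng.choices(real, k=n_real)
        rng.shuffle(batch)
        yield batch


# Placeholder datasets: each sample is tagged with its source.
synthetic = [("synthetic", i) for i in range(100)]
real = [("real", i) for i in range(20)]

batch = next(mixed_batches(synthetic, real, batch_size=8, synthetic_fraction=0.75))
n_from_simulation = sum(1 for source, _ in batch if source == "synthetic")
print(len(batch), n_from_simulation)  # 8 6
```

The mixing ratio is a tunable design choice. More synthetic data gives broader coverage of rare cases, while more real data keeps the model anchored to the imperfections of the physical world.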
Challenges in Building Spatial Intelligence Models
Building AI models capable of spatial intelligence presents several challenges:
- Data Complexity: Understanding 3D environments requires massive amounts of diverse data. Collecting, processing, and labeling this data accurately is a significant challenge, especially when it comes to maintaining consistency across different types of data.
- Computational Requirements: Training AI models for spatial intelligence is computationally intensive. It requires powerful hardware and advanced algorithms to handle the enormous amounts of data involved in 3D modeling.
- Integration of Multiple Modalities: To achieve spatial intelligence, models need to integrate data from various modalities, such as visual, auditory, tactile, and proprioceptive inputs. This integration is crucial for developing a comprehensive understanding of the 3D world, but it is also incredibly complex.
- Avoiding “Hallucinations”: One of the critical challenges, as Li pointed out, is avoiding the generation of unrealistic outputs or “hallucinations.” Current AI models often struggle with accurately representing objects in a 3D space, which can lead to errors like misinterpreting shapes, sizes, or the number of objects.
- Generalization: Developing models that can generalize from one environment to another is another major challenge. A model trained on a specific dataset may not perform well when exposed to new or varied scenarios. Ensuring robustness and adaptability is a key goal for World Labs.
Fei-Fei Li’s Journey: From Academia to Entrepreneurship
Fei-Fei Li’s journey into AI and entrepreneurship is both inspiring and unconventional. Widely regarded as one of the most influential figures in AI, Li has a track record of pioneering work that has shaped the field. Her most notable contribution is the creation of ImageNet, a large-scale labeled image dataset that was pivotal in the development of modern computer vision systems able to recognize objects reliably.
Her work at Stanford University, where she is a professor and co-director of the Human-Centered AI Institute, has focused on bridging the gap between AI development and ethical considerations, promoting AI that is fair, inclusive, and beneficial to humanity.
Before founding World Labs, Li led AI at Google Cloud from 2017 to 2018 and served on Twitter’s board of directors. She has also advised policymakers, including at the White House, underscoring her commitment to ensuring that AI is developed responsibly.
Li’s entrepreneurial journey began long before World Labs. As a student at Princeton University, she borrowed money to buy a dry-cleaning business for her parents and spent weekends working there to support her family. This experience gave her firsthand insight into the challenges of running a business and shaped her resilience and determination.
The Future of World Labs
World Labs is still in its early stages, but the potential applications of its technology are vast. The startup’s vision to create AI that truly understands the 3D world could revolutionize multiple industries, from robotics to healthcare to entertainment. By training models that go beyond generating flat, two-dimensional outputs and instead reason about the physical world’s intricacies, World Labs aims to unlock new levels of intelligence in AI systems.
With Fei-Fei Li at the helm, supported by a team of leading researchers and backed by significant financial resources, World Labs is well-positioned to make a substantial impact in the AI landscape. Li’s continued involvement with Stanford’s Human-Centered AI Institute also suggests that the startup’s work will be guided by ethical and human-centric principles.
As the company grows, it will undoubtedly face challenges, from technical hurdles to ethical considerations. However, with its unique focus on spatial intelligence and a visionary leader who has already transformed the AI field once before, World Labs has the potential to chart a new path in the quest to make AI truly understand our three-dimensional world.
Conclusion
Fei-Fei Li’s World Labs represents a new frontier in artificial intelligence: AI that can perceive, reason about, and interact with the world in three dimensions. By developing spatial intelligence, Li and her team aim to build models that go beyond the current capabilities of generative AI, opening up new possibilities in fields ranging from robotics to healthcare to AR/VR.
With significant funding and a talented team, World Labs is poised to tackle some of the most complex challenges in AI. As the startup progresses, it will be fascinating to see how its work influences the broader AI community and transforms how machines understand the world around them. World Labs is positioning itself at the forefront of that effort, aiming for machines that truly “see” and “understand” the world in all its three-dimensional complexity.