Fei-Fei Li's Latest Interview: AGI Is Incomplete Without Spatial Intelligence

Frontier Models · Published: Jul 03, 2025 · Priya Sharma · ~22 min read

Author Info

Enterprise AI & Governance Editor

JD (technology policy focus); CIPP/US; former in-house counsel at a cloud provider

Priya writes about regulation, enterprise procurement, and responsible deployment. She separates legal fact from commentary, flags jurisdictional limits, and works with external counsel on high-risk governance topics. Her articles emphasize what changed, who is accountable, and what practitioners should verify locally.

#AI Regulation #Enterprise Adoption #Risk & Compliance #Policy Analysis

Full author profile →

In my view, artificial general intelligence (AGI) is incomplete without spatial intelligence.

This was Li Fei-Fei’s assessment of AGI in her latest interview—yes, even the “Godmother of AI” is now discussing AGI.

However, she frames it differently. Since entering the field of artificial intelligence, she has defined her lifelong mission: to enable agents to tell the story of the world.

And this cannot be achieved without spatial intelligence.

Li Fei-Fei's latest interview: Without spatial intelligence, AGI is incomplete

As she herself stated:

I have spent my entire career chasing extremely difficult, almost crazy questions.

Li Fei-Fei now focuses on spatial intelligence—one of the most challenging areas in artificial intelligence.

She believes that 3D world modeling is crucial for achieving AGI and stated:

Understanding the 3D world, generating it, reasoning within it, and acting within it are fundamental problems for artificial intelligence.

Her goal is to create a world model that transcends flat pixels and language barriers, capable of truly capturing the structure of the three-dimensional world and spatial intelligence.

Li Fei-Fei's latest interview: Without spatial intelligence, AGI is incomplete

In this conversation, she traced the origins and impact of ImageNet, discussed paradigm shifts and key breakthroughs in AI, and addressed the challenges of 3D modeling as well as the lack of data for spatial intelligence.

This website has translated and summarized the full article. Let us learn from Li Fei-Fei’s latest insights and sharing.

ImageNet Built the Data Skeleton for Modern Computer Vision

Q: One of your earliest projects was ImageNet, created in 2009, which is now 16 years ago. That paper has been cited over 80,000 times and truly touched upon a critical issue in artificial intelligence: the data problem. Please tell us how that project came about—at the time, it was groundbreaking work.

Li Fei-Fei: Actually, we conceived of this (ImageNet) nearly 18 years ago. I was an assistant professor at Princeton University then. The world of AI and machine learning was completely different; data was very scarce. At least in computer vision, algorithms didn’t really work—there was no industry there. You know, to the general public, the term “AI” did not even exist.

Li Fei-Fei's latest interview: Without spatial intelligence, AGI is incomplete

But among us—starting with the founders of AI—and then people like John McCarthy and Geoffrey Hinton—I think we were just dreaming an artificial intelligence dream: we really, truly wanted machines to be able to think and act. My personal dream was to enable machines to see, because seeing is the cornerstone of intelligence.

Visual intelligence is not merely perception; its true meaning lies in understanding the world and acting within it. I became obsessed with the problem of enabling machines to see. When I was passionately developing machine learning algorithms back then, we tried neural networks, but they failed, so we turned to foundational networks like support vector machines.

But one issue always troubled me: generalization. If you work in machine learning, you must recognize that generalization is the core mathematical foundation or goal of machine learning. For generalization, these algorithms need data. But at that time, no one had data in computer vision. I was among the first graduate students to start working with data because I belonged to the generation that saw the early massive development of the internet and the Internet of Things.

Fast forward to the 21st century, around 2007, my students and I decided we had to make a bold bet: we had to bet that machine learning needed a paradigm shift, and this shift must be led by a data-driven approach, but there was simply no data at the time.

So we thought, well, let’s download a billion images from the internet—that was the largest amount we could access then—and create an entire visual taxonomy of the world to train and evaluate machine learning algorithms. That is how ImageNet was conceived and born.

The Fusion of Natural Language and Visual Signals Enables Agents to Tell the Story of the World

Q: It took some time for this process to develop promising algorithms until AlexNet emerged in 2012, which constituted the second key part toward artificial intelligence: gaining computing power and investing sufficient resources into algorithms. Please tell us when you began to realize this—specifically, that moment when you discovered that the method of “sowing data” started working, and the entire AI community achieved further breakthroughs based on it.

Li Fei-Fei: In 2009, we published a very small poster at CVPR. Between 2009 and 2012, those three years, we truly believed that data would drive artificial intelligence, but we had almost no signals as to whether it was effective. So we did some things; one of them was open-sourcing. From the beginning, we believed we must open-source this project to the entire research community so everyone could participate.

Another thing was that we launched a challenge, hoping that the smartest and best students and researchers from around the world would come together to solve it. This is what we called the ImageNet Challenge. We released a test dataset every year and publicly invited everyone to participate. In the first few years, we were essentially establishing baselines—the recognition error rate hovered around 30%, which was not quite random guessing levels, but certainly unsatisfactory.

Li Fei-Fei's latest interview: Without spatial intelligence, AGI is incomplete

However, in the third year, 2012 (I wrote about this in a book I published), I still remember it was near the end of summer. We were processing all the results from the ImageNet Challenge and running them on our servers. Then one late night, I received a message from my graduate student saying we had obtained an exceptionally outstanding result and that I should take a look. So we examined it closely; it was something like convolutional neural networks.

At that time, Geoffrey Hinton’s team did not yet use the name “Alex”; they called it “SuperVision,” which is a clever pun on “super vision” and “supervised learning.” So, “SuperVision.”

Let’s look at what they did—it was an old algorithm; convolutional neural networks had emerged in the 1980s, but they made some adjustments to the algorithm. Seeing such a leap initially surprised us. You know, we presented this at the ICCV challenge workshop in Florence, Italy, that year. Alex Krizhevsky and many researchers attended.

Li Fei-Fei's latest interview: Without spatial intelligence, AGI is incomplete

This moment has now been recorded in history as the “AlexNet Moment of the ImageNet Challenge.” It was not merely an application of a convolutional neural network; it was Alex and his team’s first feat of using two GPUs in parallel for deep learning computation. So this was actually the first time data, GPUs, and neural networks were combined.

Q: Now, following the trend of computer vision intelligence development, ImageNet truly became key to solving object recognition concepts, and subsequently, artificial intelligence reached a level where it could parse visual scenes. Because you and your…

Students, such as Andrej Karpathy, did significant work that enabled AI to achieve scene description capabilities for the first time. Please explain how this transition from objects to scenes occurred.

Fei-Fei Li: The core problem solved by ImageNet was: when a system receives an image, it can accurately identify the objects within it, such as “there is a cat here” or “that is a chair,” and so on. This is a fundamental issue in visual recognition.

Since I entered the field of artificial intelligence as a graduate student, I have had a dream—a dream that spans a century—which is to enable agents to tell stories about the world: when you open your eyes in this room, you do not just see people, chairs, and more chairs; you actually perceive an entire conference room with screens, a stage, people, an audience, and cameras… You can describe the whole scene you see. This is a foundational capability of human visual intelligence and is crucial to our daily lives.

So I truly believed this problem would haunt me for my entire life, literally. When I graduated as a graduate student, I told myself that if I could create an algorithm capable of telling stories about scenes before I died, then I would have succeeded.

Fei-Fei Li's Latest Interview: Without Spatial Intelligence, AGI Is Incomplete

With the arrival of the “AlexNet moment,” deep learning experienced explosive growth. When Andrej and later Justin Johnson joined my lab, we began to observe signs of natural language and visual signals merging. Subsequently, Andrej and I proposed the problem of captioning images or telling stories about them. In short, around 2015, Andrej and I published a series of papers, alongside several similar studies on creating computers that could caption images. At that time, my feeling was, “Oh my god, how will I spend the rest of my life?” It was my lifelong goal, and it was an incredible moment for both of us.

Last year, during a TED talk, I actually used a tweet Andrej had posted years ago when he completed his work on image captioning; it essentially summarized his paper. I also joked with him, saying, “Hey Andrej, why don’t we do the reverse? Take a sentence and generate an image.” Of course, he knew I was joking and replied, “Haha, I’m out~ The world isn’t ready yet.” Fast forward to today, and we all know about generative AI; now we can generate beautiful images from a single sentence. So the moral of this story is that AI has achieved incredible growth.

I personally believe I am the luckiest person in the world because my entire career began at the early stages after the end of the AI winter—the starting point where AI began to take off—and a significant portion of my own work and career contributed to or supported this transformation. Therefore, I feel very fortunate and proud.

Without Spatial Intelligence, General Intelligence Is Incomplete

Q: I think the craziest thing is that even if you achieved your lifelong dream—describing scenes and even generating them using diffusion models—you are still dreaming bigger because the entire development history of computer vision has moved from objects to scenes, and now the concept is the whole world. You decided to transition from academia as a professor to becoming the founder and CEO of World Labs. Please tell us, what is more difficult than seeing objects?

Fei-Fei Li: It is really hard to summarize the experiences of the past five or six years. We are living in such a civilizational moment of technological progress, right? As computer vision scientists, we have witnessed the astonishing development of computer vision from image recognition to image description and then to image generation using diffusion technology, all happening in a very exciting way.

Fei-Fei Li's Latest Interview: Without Spatial Intelligence, AGI Is Incomplete

We have another equally exciting field, which is language, specifically Large Language Models (LLMs). In November 2022, ChatGPT opened the door to true generative models that could pass the Turing test. For me, this progress was very inspiring; even for someone of my age, it encourages bold thinking about what the future holds.

As a computer vision scientist, I have a habit—many of my inspirations actually come from evolutionary theory and brain science. Throughout my career, I have often looked for the next target problem to solve by asking myself: What did evolution do? What did brain development do? One thing that is very noteworthy or appreciable is that, broadly speaking, the evolutionary development of human language took approximately 300 million to 500 million years, less than a billion years. Essentially, humans are the only animals with complex language. We can debate whether animals possess language, but in terms of language as a tool for communication, reasoning, and abstraction, it is truly humans who possess it, and this took less than 500 million years.

But consider vision—the ability to understand the three-dimensional world, figure out what to do in this 3D world, navigate it, interact with it, understand it, and communicate about it… This evolution lasted for 540 million years. 540 million years ago, the first trilobites evolved visual perception underwater, and it was precisely the emergence of vision that ignited this evolutionary arms race. For the five hundred million years before vision emerged, life on Earth was extremely simple. But in the subsequent 500 million years—specifically within these 540 million years—because organisms gained the ability to observe and understand the world, the evolutionary race officially began, and animal intelligence started competing with one another.

Fei-Fei Li's Latest Interview: Without Spatial Intelligence, AGI Is Incomplete

So for me, solving the problem of spatial intelligence—how to understand, generate, and reason about the 3D world, how to act within it—is a fundamental issue in artificial intelligence. In my view, without spatial intelligence, Artificial General Intelligence (AGI) is incomplete. I want to solve this problem: it involves creative world models that go beyond flat pixels and language, truly capturing the structure of the 3D world and spatial intelligence.

The luckiest thing in my life is that no matter how old I get, I always get to work with the best young people. So, you know, I co-founded a tech company with three amazing young but world-class technologists—Justin Johnson, Ben Mildenhall, and Christoph Lassner—and we are about to attempt to solve what I believe is currently the most difficult problem in the AI field.

Q: That is truly incredible talent. I mean, Chris is the creator of Pulsar, which was a precursor to Gosh and Splats, capable of highly distinguishable rendering; your former student Justin Johnson possesses strong systems engineering thinking, achieving real-time neural style transfer; and then there is Ben, the author of NeRF. So this is an elite super-team, and you need such a team. We briefly discussed earlier that visual tasks are harder than LLMs in some aspects—perhaps this statement is controversial, but after all, LLMs are essentially one-dimensional, whereas you are talking about understanding the structure of the 3D world. Why is this so difficult and lagging behind language research?

Fei-Fei Li: I appreciate that you realize how difficult our problem is, haha. Language is inherently one-dimensional, right? Those syllables are arranged in sequence, which is why sequence-to-sequence and sequence modeling are so classic. There are also aspects of language that people do not realize: language is purely generative. Language does not exist in nature; you cannot touch or see it. Language originates from everyone’s brain, and this is a pure generative signal—of course, if you write it on paper, it exists physically.

But the generation, construction, and utility of language are highly creative, whereas the real world is far more complex. The first

First, the real world is three-dimensional. If we add time, it becomes four-dimensional, but let’s stick to space and assume the world is essentially 3D; this itself presents a significantly higher level of combinatorial complexity.

Second, visual perception and reception of the world are projections. Whether through your eyes, retina, or camera, the process always converts 3D into 2D. You must understand how difficult this is: mathematically speaking, it is an error-prone reduction. This is precisely why humans and animals possess multiple sensors.

Third, the world is not entirely generative. We can generate virtual 3D worlds that still adhere to physical laws, but there is also a real world outside. Currently, virtual worlds are switching between generation and reconstruction in a very fluid manner, while user behavior, utility, and use cases differ vastly. If we fast-forward to this generation, we might discuss games or the metaverse; if we dial back into the real world, we find ourselves talking about embodied intelligence. Yet, all of these exist on a continuum of world modeling and spatial intelligence.

Li Feifei's Latest Interview: AGI Is Incomplete Without Spatial Intelligence

An obvious yet often avoided question is: The internet is flooded with language data, but where is the data for spatial intelligence? Of course, this information exists within the human brain, but it is not as easily accessible as language, which contributes to its difficulty. Honestly, this excites me because if it were easy, someone else would have solved it by now. My entire career has been dedicated to pursuing extremely difficult, almost crazy problems, and I believe this is that kind of crazy problem.

Q: Even thinking from first principles, the number of neurons in the human visual cortex processing data far exceeds those processing language. This architecture differs significantly from Large Language Models (LLMs), a distinction you have gradually recognized, correct?

Li Feifei: That is actually an excellent question. There are still many differing viewpoints today. Much of what we see in large language models is essentially writing—extending stories to perfect endings through writing skills, allowing for almost entirely self-supervised learning via brute force.

In contrast, constructive world models may be more complex. The world is more structured and may require signals to guide it, which can be viewed as a form of prior knowledge or data supervision.

I believe these are some of the open questions we must address. Moreover, we do not even fully understand all human perception; we have not yet solved how 3D operates in human vision. Although mechanically, we use our eyes to perform three-dimensional measurements on objects, where does the mathematical model come into play afterward?

Humans are not as inherently proficient at 3D processing as other animals, so there is still much to be answered. I am simply relying on one thing: I expect some of the brightest minds among us to solve this problem.

Li Feifei's Latest Interview: AGI Is Incomplete Without Spatial Intelligence

Q: Can we then say that World Labs is building a brand-new foundational model that outputs 3D worlds? What applications do you envision for it? You have listed everything from perception to generation, but there is always a tension between generative and discriminative models. So, what is the utility of these outputted 3D worlds?

Li Feifei: Regarding spatial intelligence, much like language models, the use cases are vast from a creative perspective. For example, it can serve as a designer, architect, industrial designer, or artist; it spans from creators to game developers, and extends to robotics and robotic learning. The utility of spatial intelligence models, or world models, is immense.

Actually, I am very interested in the metaverse. I know many people still feel it doesn’t work, and I acknowledge that it currently cannot function as intended. However, I believe a convergence of hardware and software is on the horizon, which represents another excellent use case for the future.

Q: Personally, I am thrilled by your efforts to solve metaverse-related issues, as I have tried addressing them at previous companies myself.

Li Feifei: I think hardware is part of the obstacle in the metaverse. The metaverse requires content generation, and content generation necessitates world models.

Fearlessness in Thought

Q: For some viewers, your transition from academia to becoming a founder and CEO might seem sudden. However, you have had an extraordinary life journey; this is not your first time building something from scratch. When you immigrated to the United States as a teenager who didn’t speak English, you even ran a laundry store for several years. How did these experiences shape who you are today?

Li Feifei: I was 19 years old and needed to study physics at Princeton University, so I had no way to support my family. Consequently, I opened a decent dry-cleaning business—in Silicon Valley terms, I started raising capital.

I served as a founder, CEO, and even worked as a cashier. Regardless of the roles, looking at you all makes me incredibly excited because your ages are roughly half mine, or perhaps only 30% of my age, yet you are so talented. Just go ahead and do what you want to do.

When I first became a professor, I went against many people’s advice to join institutions where I was the first computer vision professor—even though I knew that as a young academic, I should have gone somewhere with an established academic atmosphere and senior mentors. Of course, I would have appreciated having senior mentors there, but if they weren’t available, I would carve my own path and make my way without fear.

Later, I joined Google to learn about corporate culture within the tech giant, and then founded a startup at Stanford University. Around 2018, AI became a global issue for all humanity. While humans continuously drive technological progress, we must not lose our humanity. I care deeply about positive guidance in AI development and want AI to benefit humanity by being human-centric.

Thus, I returned to Stanford to establish the Institute for Human-Centered Artificial Intelligence (HAI) and ran it for five years. Some may not understand this choice, but I am very proud of it. In a way, I feel that I truly love being an entrepreneur.

I enjoy the feeling of starting from zero, standing at ground level, forgetting everything done in the past, disregarding others’ opinions, and simply focusing on hard work and building. That is my comfort zone.

Li Feifei's Latest Interview: AGI Is Incomplete Without Spatial Intelligence

Q: Another remarkable aspect of you, beyond all your impressive achievements, is that you have mentored many legendary researchers, such as Andrej Karpathy, Jim Fan from NVIDIA, and Jia Deng, who co-created ImageNet with you. They have all gone on to have extraordinary careers. What truly set them apart during their student years?

Li Feifei: First of all, I am a lucky person. I believe students mean more to me than anything else; they genuinely made me a better person, teacher, and researcher. As you said, having the honor of working with so many legendary students is truly one of the highlights of my life.

They are very different from one another. Some are pure scientists who bury themselves in solving scientific problems; others are industry leaders; and some are the greatest disseminators of AI knowledge. However, I believe there is one trait that unites them all.

I encourage every single one of them to reflect on this question, which also serves as advice I give to founders who are hiring, including my own recruitment criteria: I look for fearlessness in thought.

I believe it does not matter where you come from or what problem we are trying to solve. The courage and fearlessness to embrace difficult tasks, go all out, and find ways to resolve problems is the core trait of successful people. I learned this from them, and I am genuinely looking for individuals who possess this quality.

young people. As the CEO of World Labs, when hiring, I look for individuals who possess this trait.

Q: So you are also hiring extensively for World Labs.

Fei-Fei Li: Yes, we are recruiting engineering talent, product talent, 3D specialists, and generative model experts. So, if you consider yourself fearless and passionate about solving spatial intelligence problems, please reach out to me or visit our website.

Finding Life’s Optimal Solution via Gradient Descent

Audience Member 1: Hi, Fei-Fei, I’m your superfan. My question is: Over twenty years ago, you worked on visual recognition. If I want to start a PhD now, what direction should I choose to become a legend like you?

Fei-Fei Li: While I could say “do whatever excites you,” I’d rather give you a more thoughtful answer. First, I believe AI research has changed because academia no longer holds the majority of AI resources, which is very different from my era. Resources such as chips, compute power, and data are indeed scarce in academic settings.

As a PhD student, I suggest you seek out teams that can solve problems better without relying on superior computation or larger datasets. In academia, we can still identify fundamental issues where significant progress can be made regardless of how many chips you have.

Secondly, interdisciplinary AI is a very exciting field in academia, particularly in scientific discovery. There are too many disciplines to intersect with AI; I believe this is a promising area for theoretical advancement.

Fei-Fei Li's Latest Interview: AGI Is Incomplete Without Spatial Intelligence

Interestingly, AI capabilities have already surpassed theory: we don’t know how to do certain things, we lack interpretability, we don’t know how to identify causality, and there are too many things we don’t understand… so people can continue pushing forward.

And this list could go on: in computer vision, there are still representation problems we haven’t solved. Additionally, small data is another very interesting area; these represent possibilities.

Audience Member 2: Congratulations again on receiving the honorary doctorate from Yale University; I had the privilege of witnessing that moment there a month ago. My question is: In your view, is AGI more likely to emerge as unified models or as multi-agent systems in a single, unified form?

Fei-Fei Li: The way you phrased this question already reflects two different definitions. One definition is more theoretical: if there were an IQ test that defined AGI upon passing it. The other definition is more functional: if it is agent-based, does it possess functionality and what tasks can it perform?

Honestly, I am also confused by the definition of AGI. In 1956, AI pioneers like John McCarthy and Marvin Minsky gathered at Dartmouth with the goal of solving machine thinking. This was a question posed by Turing ten years earlier; in that statement, it wasn’t narrow AI but rather a formulation of intelligence itself.

So I am not quite sure how to distinguish between the definitions of AI and this new term, AGI. To me, they are one and the same. But I understand that today’s industry likes to refer to AGI as something beyond AI, which confuses me because I don’t know exactly what differentiates AGI from AI.

If we say that current AGI systems perform better than narrow AI systems from the 80s, 70s, or 90s, I believe this is simply a matter of progress in the field. Fundamentally, however, I think “scale is intelligence.” We aim to create machines capable of thinking and acting as intelligently as humans, or even more so.

I don’t know how to define AGI, nor do I know if it must be singular without defining it. If you view the brain as a whole, it indeed has different functions. There are specialized language areas, visual cortices, and motor cortices. So I really don’t know how to answer that question.

Audience Member 3: It is truly inspiring to see a woman playing a leading role in this field. I would like to ask: With the rapid rise of AI, as a researcher, educator, and entrepreneur, what kind of person should pursue a graduate degree?

Fei-Fei Li: That’s a great question. It’s even something parents ask me. I believe graduate school is those four to five years when you are filled with intense curiosity. You are led by that curiosity. The drive is so strong that there is no better time than this period to satisfy it.

Pursuing a graduate degree differs from entrepreneurship because you cannot lead a startup solely on curiosity; otherwise, your investors would be furious. A startup with clear business goals has curiosity as part of its motivation, but not the whole story.

For grassroots researchers, curiosity about solving problems or asking the right questions is crucial. I believe those who dive in with strong curiosity will enjoy their four to five years of graduate study. Even if the outside world develops at light speed, you will still feel happy because you are pursuing that curiosity there.

Audience Member 4: You mentioned that open source was a vital component of ImageNet’s development. Now, with the latest releases of large language models, we see organizations adopting different approaches to open source: some go fully closed-source, others release their entire research stack openly, and some fall in between by releasing weights or using restrictive licenses. So I’d like to ask: How do you view these different open-source methods, and what do you think is the correct way for an AI company to handle open sourcing?

Fei-Fei Li: I am not dogmatic about whether one must be open source or closed source. It depends on the company’s business strategy.

For example, Facebook/Meta’s reason for wanting to go open source is obvious: their current business model does not rely on selling models for profit. They are leveraging it to build an ecosystem so that people come to their platform. So, going open source makes sense for them.

Other companies make money through either open or closed sources. So I am quite open-minded about this issue. I believe open source should be protected; if both the public sector (such as academia) and the private sector have open-source initiatives, it is crucial for the startup ecosystem. I think it deserves technical protection.

Audience Member 4: I have a question about data: Since you are now working on world models, you pointed out the shift in machine learning toward data-driven methods represented by ImageNet, and you mentioned that such spatial data does not exist on the internet—it only exists in our minds. How do you solve this problem? Are you collecting this data from the real world? Is it synthetic data? Or do you rely on those ancient priors? Thank you.

Fei-Fei Li: You should join World Labs, and I’ll tell you then.

As a company, I can’t reveal too much, but I admit we are taking a hybrid approach. Having massive amounts of data is important, but having high-quality data is equally crucial. Ultimately, if you don’t pay attention to data quality, you will still get “garbage in, garbage out.”

Audience Member 5: In your book The Worlds I See, you discussed the challenges faced by immigrant girls and women in STEM. I am curious: Have you ever felt like a minority in the workplace? If so, how did you overcome this or persuade others?

Fei-Fei Li's Latest Interview: AGI Is Incomplete Without Spatial Intelligence

Fei-Fei Li: Thank you for asking this question. I want to answer very carefully and thoughtfully because we all come from different backgrounds, and everyone’s feelings are unique. Actually, it doesn’t matter who we are; we have all had moments where we felt like a minority.

Sometimes it depends on who I am, sometimes it is based on my ideas, and sometimes it is just about something trivial like the color of the shirt I’m wearing. But this is exactly what I want to encourage everyone to do: having come here from a young age, I have already tested…

Understanding the essence of this matter, I have almost cultivated an ability not to dwell on it excessively—as a female immigrant. Like each of you, I am here to learn, work, or create.

At the end of the interview, Fei-Fei Li offered her best wishes to all young people:

You are about to embark on a journey, or perhaps you are already in the midst of one. There will be moments of vulnerability, and strange things may happen. In entrepreneurship, I feel this way every day; sometimes I think, “Goodness, I have no idea what I’m doing.” But just focus on moving forward and find the optimal solution through gradient descent.

Interview link: https://www.youtube.com/watch?v=\_PioN-CpOP0

Fei-Fei Li's Latest Interview: AGI Is Incomplete Without Spatial Intelligence

Author Info

ImageNet Built the Data Skeleton for Modern Computer Vision

The Fusion of Natural Language and Visual Signals Enables Agents to Tell the Story of the World

Without Spatial Intelligence, General Intelligence Is Incomplete

Fearlessness in Thought

Finding Life’s Optimal Solution via Gradient Descent

Related News

Latest Headlines