Abstract
How could machines learn as efficiently as humans and animals? How could machines learn to reason and plan? How could machines learn representations of percepts and action plans at multiple levels of abstraction, enabling them to reason, predict, and plan at multiple time horizons? This position paper proposes an architecture and training paradigms with which to construct autonomous intelligent agents. It combines concepts such as configurable predictive world model, behavior driven through intrinsic motivation, and hierarchical joint embedding architectures trained with self-supervised learning.
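To unpack one piece of the abstract's terminology, here is a minimal, non-hierarchical sketch of a joint embedding predictive architecture (JEPA) trained with self-supervised learning. This is my own illustrative PyTorch code, not anything specified in the paper; the module shapes, names, and toy energy function are all assumptions.

```python
# Minimal, illustrative JEPA-style sketch (not from the paper).
# Two inputs x and y (e.g. consecutive video frames) are encoded into
# representations s_x and s_y; a predictor maps s_x (plus a latent z)
# to a prediction of s_y, and the energy is the distance between them.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyJEPA(nn.Module):
    def __init__(self, obs_dim=64, repr_dim=32, latent_dim=8):
        super().__init__()
        self.enc_x = nn.Sequential(nn.Linear(obs_dim, repr_dim), nn.ReLU(),
                                   nn.Linear(repr_dim, repr_dim))
        self.enc_y = nn.Sequential(nn.Linear(obs_dim, repr_dim), nn.ReLU(),
                                   nn.Linear(repr_dim, repr_dim))
        self.predictor = nn.Sequential(nn.Linear(repr_dim + latent_dim, repr_dim),
                                       nn.ReLU(),
                                       nn.Linear(repr_dim, repr_dim))

    def energy(self, x, y, z):
        s_x = self.enc_x(x)
        s_y = self.enc_y(y)
        s_y_pred = self.predictor(torch.cat([s_x, z], dim=-1))
        # Energy: how badly the representation of y is predicted from x.
        return F.mse_loss(s_y_pred, s_y)

model = ToyJEPA()
x = torch.randn(16, 64)       # batch of "current" observations
y = torch.randn(16, 64)       # batch of "next" observations
z = torch.zeros(16, 8)        # latent variable accounting for uncertainty
loss = model.energy(x, y, z)  # self-supervised training would minimize this
loss.backward()               # (with extra regularization to avoid collapse)
```

The hierarchical version in the paper stacks modules like this so that higher levels work with more abstract representations and longer prediction horizons.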
Meta’s Chief AI Scientist Yann LeCun lays out his vision for what an architecture for generally intelligent agents might look like.
For more discussion on this paper, see the comment thread for it on OpenReview: https://openreview.net/forum?id=BZ5a1r-kVsf
I just realized today that there are comments on that page. There are about a dozen so far.
(Note: I jotted down these thoughts on my phone while on a plane, as I finished reading LeCun’s paper. They are rough and underdeveloped, but they are still points I find interesting and that I think might spark some good discussion.)
A modular architecture like this would have interpretability benefits and alignment implications. The separate “hard-wired” cost module is very significant. If this were successfully built, it could effectively leapfrog us into best-case interpretability scenario 2.
How will these costs be specified? That looks like a big open question from this paper. It seems like the world model would need to be trained first and be interpretable so that the cost module can make use of its abstractions.
Is it possible to mix hard-wired components with trained models? “Risks from Learned Optimization” mentions hard-coded optimizers as a possibility for preventing the emergence of mesa-optimizers. Research on “tool-using AI” may be relevant too; interestingly, here the cost module would be the tool.
The agent is an optimizer by design!
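To make that concrete, here is a rough sketch of how a fixed, non-trainable intrinsic cost could sit alongside a trained world model, with actions chosen by gradient descent on predicted cost through the model, in the spirit of the paper's "Mode-2" planning. This is my own illustrative code under assumed names and shapes, not LeCun's specification; the paper's full architecture also includes a perception module, a trainable critic, short-term memory, and a configurator.

```python
# Illustrative sketch of mixing a hard-wired cost with trained modules
# (names and dimensions are my assumptions, not the paper's).
import torch
import torch.nn as nn

state_dim, action_dim = 16, 4

# Trainable world model: predicts the next latent state from (state, action).
world_model = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                            nn.Linear(64, state_dim))

def intrinsic_cost(state):
    # Hard-wired, non-trainable cost: a fixed function with no parameters
    # to update, so the objective itself cannot drift during training.
    return (state ** 2).sum(dim=-1)

def plan(state, horizon=5, steps=50, lr=0.1):
    # Optimize an action sequence by gradient descent on the predicted
    # cumulative cost under the world model. Only the action sequence is
    # updated here; the world model's weights are left untouched.
    actions = torch.zeros(horizon, action_dim, requires_grad=True)
    opt = torch.optim.SGD([actions], lr=lr)
    for _ in range(steps):
        s, total_cost = state, 0.0
        for a in actions:
            s = world_model(torch.cat([s, a], dim=-1))
            total_cost = total_cost + intrinsic_cost(s)
        opt.zero_grad()
        total_cost.backward()
        opt.step()
    return actions.detach()[0]  # execute only the first action (MPC-style)

first_action = plan(torch.randn(state_dim))
```

The point of the sketch is that the optimization target lives entirely in `intrinsic_cost`, which is exactly why how that module gets specified matters so much.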
If a solution is developed for specifying the cost module, then the agent may be inner aligned.
But many naive ways of specifying the cost module (e.g., making human pain a cost) seem to lead straight to catastrophe via outer alignment failures and instrumental convergence for a sufficiently advanced system.
Could this architecture be leveraged to implement a cost module that’s more likely to be outer aligned, like imitative HCH or some other myopic objective?
Are the trainable modules (critic, world model, actor) subject to mesa-optimization risk?
I’m quite surprised by the lack of discussion of this paper. It is probably one of the most significant papers on AGI I’ve seen, as it outlines a concrete, practical path to its implementation by one of the most important researchers in the field.
There is not a lot of discussion about the paper here on LessWrong yet, but there are a dozen or so comments about it on OpenReview: https://openreview.net/forum?id=BZ5a1r-kVsf
I agree. I shared the post in a couple of AI safety Slack and Discord channels just now to try to get it more visibility.
It probably would have gotten more engagement if someone else (e.g. Gwern) had posted it. I’m a low-karma/unpopular account, so few of my posts get seen unless people go looking for new posts.
I rarely read new posts (I read particular posts on things I’m interested in and have alerts for some posters). So, it’s not that surprising, I guess? I wouldn’t have read this post myself had someone analogous to me posted it.
Maybe there’s a way to increase discoverability of posts from low karma/unpopular users?
I see your post on the frontpage when I scroll down deep enough in Recent Discussion. We just have to write comments here often to keep it here!
Bumping is back!
Currently, there isn’t much modeling of video data comparable to what exists for ImageNet. Static image models are far more advanced (after years of supervised learning, e.g. from CAPTCHA labeling) than what we have right now for video classification.
I imagine a large part of the world model described in the paper will come from video classification.
If you look at some examples of video classification problems, they label an entity along with an action, drawing on the additional temporal information that videos provide. I’m not sure whether this temporal information will be used directly in the paper’s world model, or whether the world model will instead rely on classification models trained on static images and store action information separately from the entities classified in the videos.
Of course, this example alone just shows how much work needs to be done to even begin the preliminary phase of developing the models described in this paper, but I think the autonomous aspects presented in the paper can be investigated independently, before video classification models mature.
The autonomous parts seem to be largely based on emulating the human brain, at least at the architectural level. The current state of the art in ML autonomy has been unsupervised learning with deep neural networks, as in DeepMind’s work; that is like using a single model (or an ensemble) for one set of tasks. This paper is more like using an ensemble of modules for different types of tasks, each modeling a different part of the human brain, with their overall interactions producing the autonomous behavior and the context in which that autonomy operates.
There’s a recent project from Google AI called Pathways (blog post, paper) which also aspires to produce more general AI. From the blog post:
(Thanks to Michael Chen for making me aware of this.)
My model of Eliezer winces when a proposal for AGI design is published rather than kept secret. Part of me does too.
One upshot, though, is that it gives AI safety researchers and proponents a more tangible case to examine. Architecture-specific risks can be identified, and central concerns like inner alignment can be evaluated against the proposed architecture and (assuming they still apply) be made more concrete and convincing.
I’m still reading the LeCun paper (currently on page 9). One thing it’s reminding me of so far is Steve Byrnes’ writing on brain-like AGI (and related safety considerations): https://www.lesswrong.com/s/HzcM2dkCq7fwXBej8
My impression is that he’s trying to do GOFAI with fully differentiable neural networks. I’m also not sure he’s describing a GAI — I think he’s starting by aiming for parity with the capabilities of a typical mammal, not human-level, and that’s why he uses self-driving cars as an example.
Personally I think a move towards GOFAI-like ideas is a good intuition, but that insisting on keeping things fully differentiable is too constraining. I believe that at some level, we are going to need to move away from doing everything with gradient descent, and use something more like approximate Bayesianism, or at least RL.
I also think he’s underestimating the influence of genetics on mammalian mental capabilities. He talks about the step of babies learning that the world is 3D, not 2D; I think it’s very plausible that adaptations for processing sensory data from a 3D rather than 2D world are already encoded in our genome, brain structure, and physiology in many places.
If this is going to be a GAI architecture, then I think he’s massively underthinking alignment.
Great to see. As important as safety research is, if we don’t get capabilities in time, most of humanity is going to be lost. Long-termism requires aiming to preserve today’s archeology, or the long-term future we hoped to preserve will be lost anyway. Safety is also critical; differential acceleration of safe capabilities is important, so let’s use this to try to contribute to capable safety.
I just wish LeCun saw that Facebook is catastrophically misaligned.