This still seems like a fair way to evaluate what the alignment community thinks about, but I think it is going to overestimate how parochial the community is. For example, if you go by “what does Stuart Russell think is important”, I expect you get a very different view on the field, much of which won’t be in the Alignment Newsletter.
I agree. I intended to gesture a little bit at this when I mentioned that “Until more recently, It’s also been excluded and not taken very seriously within traditional academia”, because I think one source of greater diversity has been the uptake of AI alignment in traditional academia, leading to slightly more inter-disciplinary work, as well as a greater diversity of AI approaches. I happen to think that CHAI’s research publications page reflects more of the diversity of approaches I would like to see, and wish that more new researchers were aware of them (as opposed to the advice currently given by, e.g., 80K, which is to skill up in deep learning and deep RL).
Reward functions are typically allowed to depend on actions, and the alignment community is particularly likely to use reward functions on entire trajectories, which can express arbitrary views (though I agree that many views are not “naturally” expressed in this framework).
Yup, I think purely at the level of expressivity, reward functions on a sufficiently extended state space can express basically anything you want. That still doesn’t resolve several worries I have though:
Talking about all human motivation using “rewards” tends to promote certain (behaviorist / Humean) patterns of thought over others. In particular I think it tends to obscure the logical and hierarchical structure of many aspects of human motivation—e.g., that many of our goals are actually instrumental sub-goals in higher-level plans, and that we can cite reasons for believing, wanting, or planning to do a certain thing. I would prefer if people used terms like “reasons for action” and “motivational states”, rather than simply “reward functions”.
Even if reward functions can express everything you want them to, that doesn’t mean they’ll be able to learn everything you want them to, or generalize in the appropriate ways. For example, I think deep RL agents are unlikely to learn the concept of “promises” in a way that generalizes robustly, unless you give them some kind of inductive bias that leads them to favor structures like LTL formulas (this worry is related to Stuart Armstrong’s no-free-lunch theorem; a toy version is sketched below). At some point I intend to write a longer post about this worry.
Of course, you could just define reward functions over logical formulas and the like, and do something like reward modeling via program induction, but at that point you’re no longer using “reward” in the way it’s typically understood. (This is similar to the move, made by some Humeans, of saying that reason can only be motivating because we desire to follow reason. That’s fair enough, but it misses the point of calling certain kinds of motivations “reasons” at all.)
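To make both the expressivity point and my worry concrete, here is a minimal, purely illustrative sketch (the state keys, the penalty, and the function names are all made up) of a reward defined over whole trajectories, where a promise-keeping constraint is encoded as an explicit temporal predicate rather than as something the agent has to learn:

```python
from typing import Dict, List, Tuple

# A trajectory is a sequence of (state, action) pairs; states are plain dicts here.
State = Dict[str, float]
Action = str
Trajectory = List[Tuple[State, Action]]

def promise_kept(traj: Trajectory) -> bool:
    """Toy temporal property: whenever a promise is made, it is eventually fulfilled.

    This mimics an LTL-style formula G(promise_made -> F promise_fulfilled),
    checked directly against the whole trajectory.
    """
    for i, (state, _) in enumerate(traj):
        if state.get("promise_made"):
            if not any(later.get("promise_fulfilled") for later, _ in traj[i:]):
                return False
    return True

def trajectory_reward(traj: Trajectory) -> float:
    """Reward over a whole trajectory (non-Markovian in the raw state).

    The promise concept lives in the hand-written predicate above, not in
    anything the agent has to discover from reward signal alone.
    """
    task_reward = sum(state.get("task_progress", 0.0) for state, _ in traj)
    return task_reward if promise_kept(traj) else task_reward - 10.0
```

The trajectory-level formulation can express the constraint, but only because the predicate is supplied symbolically; an end-to-end deep RL agent would have to recover that structure from the reward signal alone, which is exactly where I expect generalization to fail.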
(I’d cite deep learning generally, not just deep RL.)
You’re right, that’s what I meant, and have updated the post accordingly.
If you start with an uninformative prior and no other evidence, it seems like you should be focusing a lot of attention on the paradigm that is most successful / popular. So why is this influence “undue”?
I agree that if you start with a very uninformative prior, focusing on the most recently successful paradigm makes sense. But once you take into account slightly more information, I think there’s reason to believe the AI alignment community is currently overly biased towards deep learning:
The trend-following behavior in most scientific & engineering fields, including AI, should make us skeptical that currently popular approaches are popular for the right reasons. In the 80s everyone was really excited about expert systems and the Fifth Generation project. About 10 years ago, Bayesian non-parametrics were really popular. Now deep learning is popular. Knowing this history suggests that we should be a little more careful about joining the bandwagon. Unfortunately, a lot of us joining the field now don’t really know this history, nor are we necessarily exposed to the richness and breadth of older approaches before diving headfirst into deep learning (I only recognized this after starting my PhD and learning more about symbolic AI planning and programming languages research).
We have extra reason to be cautious about deep learning being popular for the wrong reasons, given that many AI researchers say that we should be focusing less on machine learning while at the same time publishing heavily in machine learning. For example, at the AAAI 2019 informal debate, the majority of audience members voted against the proposition that “The AI community today should continue to focus mostly on ML methods”. At some point during the debate, it was noted that despite the opposition to ML, most papers at AAAI that year were about ML, and it was suggested, to some laughter, that people were publishing in ML simply because that’s what would get them published.
The diversity of expert opinion about whether deep learning will get us to AGI doesn’t feel adequately reflected in the current AI alignment community. Not everyone thinks the Bitter Lesson is quite the lesson we have to learn. A lot of prominent researchers like Stuart Russell, Gary Marcus, and Josh Tenenbaum all think that we need to re-invigorate symbolic and Bayesian approaches (perhaps through hybrid neuro-symbolic methods), and if you watch the 2019 Turing Award keynotes by Hinton and Bengio, both of them emphasize the importance of having structured generative models of the world (they just happen to think this can be achieved by building the right inductive biases into neural networks). In contrast, outside of MIRI, it feels like a lot of the alignment community anchors towards the work that’s coming out of OpenAI and DeepMind.
My own view is that the success of deep learning should be kept in perspective. It’s good for certain tasks and certain high-data training regimes, and will remain good for those use cases. But in a lot of other use cases, where we might care a lot about sample efficiency and rapid, robust generalization, most of the recent progress has, in my view, been made by cleverly integrating symbolic approaches with neural networks (even AlphaGo can be seen as a version of this, if one views MCTS as symbolic). I expect future AI advances to occur in a similar vein, and for me that lowers the relevance of ensuring that end-to-end DL approaches are safe and robust.
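To illustrate the kind of integration I mean, here is a minimal sketch of AlphaGo-style neural-guided tree search. The class and function names are my own and real implementations differ in many details, but the division of labor is the point: the tree expansion and backup are explicit, rule-governed operations, while the priors and values that score them come from a learned network.

```python
import math
from typing import Dict

class Node:
    """One node of an explicit search tree: symbolic state plus neural statistics."""
    def __init__(self, prior: float):
        self.prior = prior          # probability assigned by a learned policy network
        self.visit_count = 0
        self.value_sum = 0.0        # accumulated estimates from a learned value network
        self.children: Dict[str, "Node"] = {}   # action label -> child node

    def value(self) -> float:
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def select_child(node: Node, c_puct: float = 1.5):
    """PUCT-style selection step used during tree search.

    The tree itself (expansion, backup, picking the most-visited move at the
    root) is the explicit, rule-governed part; the prior and value terms below
    are the only places a neural network enters.
    """
    def score(child: Node) -> float:
        exploration = (
            c_puct * child.prior * math.sqrt(node.visit_count + 1) / (1 + child.visit_count)
        )
        return child.value() + exploration

    return max(node.children.items(), key=lambda item: score(item[1]))
```

The search procedure supplies the structure; the network only has to supply local evaluations.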
Re: worries about “reward”, I don’t feel like I have a great understanding of what your worry is, but I’d try to summarize it as “while the abstraction of reward is technically sufficiently expressive, 1) it may not have the right inductive biases, and so the framework might fail in practice, and 2) it is not a good framework for thought, because it doesn’t sufficiently emphasize many important concepts like logic and hierarchical planning”.
I think I broadly agree with those points if our plan is to explicitly learn human values, but it seems less relevant when we aren’t trying to do that and are instead trying to
provide a general method for creating AI systems that pursue some specific task, interpreted the way we meant it to be interpreted.
In this framework, “knowledge about what humans want” doesn’t come from a reward function, it comes from something like GPT-3 pretraining. The AI system can “invent” whatever concepts are best for representing its knowledge, which includes what humans want.
Here, reward functions should instead be thought of as akin to loss functions—they are ways of incentivizing particular kinds of outputs. I think it’s reasonable to think on priors that this wouldn’t be sufficient to get logical / hierarchical behavior, but I think GPT and AlphaStar and all the other recent successes should make you rethink that judgment.
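As a purely illustrative sketch of what I mean by “reward as a loss function” (the function and variable names here are mine, not from any particular system): the reward model only scores sampled outputs, and the training signal pushes probability mass toward highly scored outputs, exactly as a loss would.

```python
from typing import List

def reward_as_loss(logprobs: List[float], reward_scores: List[float]) -> float:
    """Toy REINFORCE-style surrogate loss: reward only weights sampled outputs.

    logprobs[i]      -- log-probability the policy assigned to sampled output i
    reward_scores[i] -- a learned reward model's score for that output

    Minimizing this pushes probability mass toward outputs the reward model
    scores highly, which is exactly the role a loss function plays.
    """
    baseline = sum(reward_scores) / len(reward_scores)   # simple variance reduction
    return -sum(lp * (r - baseline) for lp, r in zip(logprobs, reward_scores)) / len(logprobs)
```

All of the “knowledge about what humans want” lives in the pretrained model that generated the outputs; the reward enters only as a scalar weight on each sample.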
----
The trend-following behavior in most scientific & engineering fields, including AI, should make us skeptical that currently popular approaches are popular for the right reasons.
I agree that trend-following behavior exists. I agree that this means that work on deep learning is less promising than you might otherwise think. That doesn’t mean it’s the wrong decision; if there are a hundred other plausible directions, it can still be the case that it’s better to bet on deep learning rather than try your hand at guessing which paradigm will become dominant next. To quote Rodney Brooks:
Whatever [the “next big thing”] turns out to be, it will be something that someone is already working on, and there are already published papers about it. There will be many claims on this title earlier than 2023, but none of them will pan out.
He also predicts that the “next big thing” will happen by 2027 (though I get the sense that he might count new kinds of deep learning architectures as a “big thing” so he may not be predicting something as paradigm-shifting as you’re thinking).
Whether to diversify depends on the size of the field: if you have 1 million alignment researchers you definitely want to diversify, whereas at 5 researchers you almost certainly don’t. I’m claiming that we’re small enough now, and uninformed enough about alternatives to deep learning, that diversification is not a great approach.
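To see why field size matters, here’s a toy model (the numbers and the functional form are made up purely for illustration): suppose paradigm i turns out to be the relevant one with probability p_i, and that n researchers working on its alignment problems solve them with probability 1 - (1 - q)^n. Because that value function is concave in n, greedily assigning each researcher to the highest marginal value is optimal, and the resulting allocation depends heavily on how many researchers there are.

```python
from typing import List

def allocate(num_researchers: int, paradigm_probs: List[float], hit_rate: float = 0.3) -> List[int]:
    """Greedily assign researchers to paradigms under a toy value model.

    Assumed model (illustrative only): paradigm i matters with probability
    paradigm_probs[i], and n researchers on it solve its alignment problems
    with probability 1 - (1 - hit_rate) ** n.  Concavity makes greedy
    assignment by marginal value optimal.
    """
    counts = [0] * len(paradigm_probs)

    def marginal_value(i: int) -> float:
        n = counts[i]
        return paradigm_probs[i] * (
            (1 - (1 - hit_rate) ** (n + 1)) - (1 - (1 - hit_rate) ** n)
        )

    for _ in range(num_researchers):
        best = max(range(len(paradigm_probs)), key=marginal_value)
        counts[best] += 1
    return counts

# allocate(5, [0.7, 0.1, 0.1])   -> [5, 0, 0]: a tiny field concentrates
# allocate(300, [0.7, 0.1, 0.1]) -> spreads researchers across all three paradigms
```

Under these made-up numbers, all five researchers in a five-person field go to the leading paradigm, while a field of hundreds spreads out substantially.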
We have extra reason to be cautious about deep learning being popular for the wrong reasons, given that many AI researchers say that we should be focusing less on machine learning while at the same time publishing heavily in machine learning.
Just because AI research should diversify doesn’t mean alignment research should diversify—given their relative sizes, it seems correct for alignment researchers to focus on the dominant paradigm while letting AI researchers explore the space of possible ways to build AI. Alignment researchers should then be ready to switch paradigms if a new one is found.
A lot of prominent researchers like Stuart Russell, Gary Marcus, and Josh Tenenbaum all think that we need to re-invigorate symbolic and Bayesian approaches (perhaps through hybrid neuro-symbolic methods)
This feels like the most compelling argument, since it identifies particular other approaches (though still very large ones). Some objections from the outside view:
I think all three of the researchers you mentioned have long timelines; work is generally more useful on shorter timelines, so this should bias you towards what is currently popular. Some of these researchers don’t think we can get to AGI at all; as long as you aren’t confident that they are correct, you should ignore that position (if we’re in that world, then there isn’t any AI alignment x-risk, so it isn’t decision-relevant).
I find the arguments given by these researchers to be relatively weak and easily countered, and am more inclined to use inside-view arguments as a result. (Though I should note that I think that it is often correct to trust in an expert even when their arguments seem weak, so this is a relatively minor point.)
(Re: Hinton and Bengio, I feel like that’s in support of the work that’s currently being done; the work that comes out of those labs doesn’t seem that different from what comes out of OpenAI and DeepMind.)
Going to the inside view on neurosymbolic AI:
(even AlphaGo can be seen as a version of this, if one views MCTS as symbolic)
I feel like if you endorse this then you should also think of iterated amplification as neurosymbolic (though maybe you think if humans are involved that’s “neurohuman” rather than neurosymbolic and the distinction is relevant for some reason).
Overall, I do expect that neurosymbolic approaches will be helpful and used in many practical AI applications; they allow you to encode relevant domain knowledge without having to learn it all from scratch. I don’t currently see that it introduces new alignment problems, or changes how we should think about the existing problems that we work on, and that’s the main reason I don’t focus on it. But I certainly agree with that as a background model of what future AI systems will look like, and if someone identified a problem that happens with neurosymbolic AI that isn’t addressed by current work in AI alignment, I’d be pretty excited to see research solving that problem, and might do it myself.
----
Things I do agree with:
It would be significantly better if the average / median commenter on the Alignment Forum knew more about AI techniques. (I think this is also true of deep learning.)
There will probably be something in the future that radically changes our beliefs about AGI.