I trust past-me to have summarized CAIS much better than current-me; back when this post was written I had just finished reading CAIS for the third or fourth time, and I haven’t read it since. (This isn’t a compliment—I read it multiple times because I had a lot of trouble understanding it.)
I’ve put in two points of my own in the post. First:
(My opinion: I think this isn’t engaging with the worry about RL agents—typically, we’re worried about the setting where the RL agent is learning or planning at test time. This can happen in meta-learning and online learning settings, or even with vanilla RL if the learned policy has access to external memory and can implement a planning process separately from the training procedure.)
I agree even more with this two years later. CAIS makes an important point: learning is separate from competence. Nonetheless, just because an AI system must first learn about a domain before it can act in it does not mean that we will notice it doing so. The AI does not learn to take over the world by trying to take over the world and failing. Instead, it makes a plan to take over the world and then learns about the relevant domains (e.g. if it wants to engineer a pandemic, it learns about genetics by reading textbooks) until it is confident that its plan will succeed. This can be true even if the AI was trained using PPO.
(To connect with current discourse, this is basically saying “this doesn’t engage with mesa optimization.”)
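To make the point about test-time planning concrete, here is a toy sketch of my own (not from CAIS or the original post; all names like `FrozenPolicy` and `plan_route` are illustrative). It shows how a policy whose weights are frozen after training can still acquire new competence at test time, simply by filling in a world model from observation and running a planning loop whose output is cached in external memory:

```python
# Toy illustration (my construction): a frozen policy + external memory +
# a test-time planner. No gradient updates occur, yet behavior improves
# as the world model is filled in from observation.

from collections import deque


class FrozenPolicy:
    """Stands in for a trained network: parameters never change at test time."""

    def act(self, observation, memory):
        # The policy consults external memory (a cached plan) rather than
        # recomputing behavior from its weights alone.
        plan = memory.get("plan", [])
        return plan.pop(0) if plan else "explore"


def plan_route(model, start, goal):
    """Breadth-first search over a world model learned purely by observing."""
    frontier, parents = deque([start]), {start: None}
    while frontier:
        state = frontier.popleft()
        if state == goal:
            # Reconstruct the path from goal back to start.
            path = []
            while parents[state] is not None:
                path.append(state)
                state = parents[state]
            return list(reversed(path))
        for nxt in model.get(state, []):
            if nxt not in parents:
                parents[nxt] = state
                frontier.append(nxt)
    return []


# Test time: the world model (an adjacency map, standing in for knowledge
# gleaned from e.g. reading textbooks) is built without any training step.
world_model = {"A": ["B"], "B": ["C"], "C": ["D"]}
memory = {"plan": plan_route(world_model, "A", "D")}
policy = FrozenPolicy()
actions = [policy.act(None, memory) for _ in range(4)]
```

The first three actions follow the computed plan and the fourth falls back to exploration; the "learning" lived entirely in the memory contents, not in the policy's parameters.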
Second:
(My opinion: It seems like the lesson of deep learning is that if you can do something end-to-end, that will work better than a structured approach. This has happened with computer vision, natural language processing, and seems to be in the process of happening with robotics. So I don’t buy this—while it seems true that we will get CAIS before AGI since structured approaches tend to be available sooner and to work with less compute, I expect that a monolithic AGI agent would outperform CAIS at most tasks once we can make one.)
I still agree with this, though I’d phrase it differently now. Now I would say that there is some level of data, model capacity, and compute at which an end-to-end / monolithic approach outperforms a structured approach on the training distribution (this is related to, but not the same as, the bitter lesson). However, at low levels of these three, the structured approach will typically perform better. The levels required for the end-to-end approach to work better depend on the particular task, and increase with task difficulty.
Since we expect all three of these factors to grow over time, I then expect an expanding Pareto frontier: at any given point the most complex tasks are performed by structured approaches, but as time progresses these are replaced by end-to-end / monolithic systems (while new, even more complex tasks are found that can be done in a structured way).
(Really I expect this will be true up till human-level AI and a little past that, and after that who knows what happens.)
----
On CAIS itself:
- I think the “monolithic AGI” that Eric critiques is a bit of a strawman, but nonetheless it is important to argue against it.
- I really like the learning vs. competence distinction, and use it frequently.
- I think it is often hard to tell what exactly is being argued in CAIS, and have found it difficult to understand as a result.
- If I reread it now, I suspect there are many framings I would disagree with.
- There are other people with similar perspectives, e.g. Michael Jordan, though they don’t engage with AI safety arguments as much.
- CAIS is in my top 20 things produced in the field of AI alignment.