I found this post very useful! I went through the list and wrote down my thoughts on the points, posting them here in case they are of interest to others.
---
Some high-level comments first.
Disclaimer: I’m not senior enough to have consistent inside-views. I wrote up a similar list a few days ago in response to Yudkowsky’s post, and some of my opinions have changed.
In particular, I note that I have been biased to agree with Yudkowsky for reasons unrelated to the actual validity of his arguments, such as “I have read more texts by him than by any other single person”.
So despite my nitpicking about some points, the post was very useful, causing me to update my views on some issues towards Christiano’s.
---
I agree more or less with all of the points where Christiano agrees with Yudkowsky. Point 7 seems to put more weight on “humans let AI systems control killer robots and this is relevant” than I would.
On the disagreements:
1. I appreciate the point and agree that much of the issue here is institutional. Upon consideration I’ve updated towards “traditional R&D is a useful source of information”, though I put less weight on this than Christiano does. I believe this stems from my thinking that “we really need theoretical foundations and guarantees when entering the superhuman level”.
2. I realize that I have confused “a powerful enough AI system could build nanotechnology” with “in practice a major threat scenario is nanotechnology”. I have seen statements of the first type (though I am unable to evaluate this issue myself), and fewer of the second type. I agree with Christiano that this is far from the likeliest scenario in practice, and that it should rather be thought of only as an explanation for why strategies of the form “keep the AI isolated from the Internet/people/etc.” fail.
3. I am confused by “how impressive this looks” being used as a proxy for “how dangerous this is”. Certainly nanotechnology is both impressive and dangerous, but I am wary of generalizing from this argument.
4. Boils down to takeoff speeds. In a weak form “AI systems will get gradually better at improving themselves” is likely true.
5. The terminology “pivotal act” is quite suggestive indeed, pointing at a cluster of solutions that is not the whole set of solutions. It is not at all obvious to me that most of the worlds where we survive arise from paths which one associates with the phrase “pivotal act”.
6. The point made seems valuable. No opinion here, have to think about this more.
7. On “we are … approaching AI systems that can meaningfully accelerate progress by generating ideas, recognizing problems for those ideas...”, my feeling is that current systems lack a good world model, that this restricts alignment work, and that progress on world models is progress on capabilities. I don’t see the relevance to recursive self-improvement—in the worst cases self-improvement is faster than systems with humans in the loop.
8. No opinion here.
9. No opinion here.
10. No opinion here, should read the debates.
11. “based largely on his own experiences working on the problem” is a bit unfair—there are, of course, arguments for why alignment is hard, and one should focus on the validity of those arguments. (This post of course does that, which is good.)
12. No opinion here, but appreciate the point.
13. No opinion here (too meta for me to have an informed view), but appreciate the point.
14. Personal experience: it was not until I read the recent post on the interpretability tech tree that I understood how interpretability could lead to an actual change in existential risk.
15. I agree that, taken literally, point 11 in the list of lethalities does not hold up—I don’t see any particularly strong reason why we couldn’t build narrow systems aimed at building nanotechnology. I understood part of the point of 11 to be that general systems can do things which are far out of distribution, and you cannot hope that they are aligned there, which seems much more defensible.
16. Good to point out that you can study deceptive behavior in weaker systems as well (not to say that new problems couldn’t appear later on).
17. A fair point.
18. No opinion here, don’t have an informed view.
19. I agree. It seems that the argument given by Yudkowsky is “a sufficiently strong AGI could write down a ‘pivotal act’ which looks good to humans but which is actually bad”, which is true, but this doesn’t imply “it is not possible to build an AI that outputs a pivotal act which is good and which humans couldn’t have thought of”. (Namely, if you can make the AI “smart-but-not-too-smart” in some sense.)
20. No opinion here.
21. Fair point.
22. No opinion here, but again appreciate the point. (The thought about “AI learns a lot from humans because of the feedback loop” in particular was interesting and new to me.)
23. Agree with the first sentence.
24. No opinion here.
25. This is a great point: it is novel (to me), relevant, and seems correct. A slight update towards optimism for me.
26. I am likewise confused about what kind of plan we should have (in a way which is distinct from “we should have more progress on alignment”).
27. No opinion here.