I’ll remark very quickly that I think the competence gap is indeed the main issue. If we imagine an AI built to a level where it was as smart as all the mathematicians who could work on the problem in advance, but able to do the same work faster, which didn’t use any self-improvement along the way, and which was otherwise within a Friendliness framework that correctly settled its preferences over which decision framework would control whatever stability framework it invented, then clearly there’s no advantage to trying to do the work in advance. But I think the competence gap is much larger than that zero level.
Note that we care about the gap between {Ability to design powerful AI} and {Ability to design powerful AI that will do what the original AI wants}. I think the main difference between our views is that you see the second as a super-hard problem. I don’t see it as super-hard, especially if we have already successfully built one AI that does what we want. I tried to flesh out this disagreement in the post.
I do see a gap as plausible, since I expect capabilities to be uneven and who knows what will come first.
But it would be surprising if an AI was good at figuring out which other AIs would be effective, yet unable to understand why it itself was effective—since presumably these other AIs would be quite similar to itself, and would be leveraging the same insights. The concern seems to be the case where the AI understands why it is able to do so much cool stuff, but not why it is motivated to do the right cool stuff (and can’t figure it out, despite the motivation to do so and the availability of human explainers who do understand).
To me this scenario seems unlikely. I assume you have a different picture than I do.
I think the main disagreement is about whether it’s possible to get an initial system which is powerful in the ways needed for your proposal and which is knowably aligned with our goals. There’s some more about this in my reply to your post, which I’ve finally posted, though there I mostly discuss my own position rather than Eliezer’s.