The part where alignment is hard is precisely when the thing I’m trying to accomplish is hard. Because then I need a powerful plan, and it’s hard to specify a search for powerful plans that don’t kill everyone.
I agree that highly agentic versions of the system will complete the tasks better. My claim is just that they’re not necessary to complete the task very well, and so we shouldn’t be confident that selection for completing that task very well will end up producing the highly agentic versions.
That helps, thanks. Raemon says:
I now read you as pointing to chess as:
It is “hard to accomplish” from the perspective of human cognition.
It does not require a “powerful”/“agentic” plan.
It’s “easy” to specify a search for a good plan; we already did it.
So maybe alignment is like that.
Yepp. And clearly alignment is much harder than chess, but it seems like an open question whether it’s harder than “kill everyone” (and even if it is, there’s an open question of how much of an advantage we get from doing our best to point the system at the former rather than the latter).
“Kill everyone” seems like it should be “easy”, because there are so many ways to do it: humans only survive in environments with a specific range of temperatures, pressures, atmospheric contents, availability of human-digestible food, &c.