I agree it was a pretty weak point. I wonder if there is a longer form exploration of this topic from Eliezer or somebody else.
I think it's even contradictory. Eliezer says that AI alignment is solvable by humans and that verifying a solution is easier than finding one. But then he claims that humans wouldn't even be able to verify answers.
I think a charitable interpretation could be “it is not going to be as usable as you think”. But perhaps I misunderstand something?
Humans, presumably, won't have to deal with deception among themselves, so given sufficient time they can solve alignment. If pressed for time (as we are now), they will have to implement less-understood solutions, because that's the best they will have at the time.