I’m not sure that this implies that alignment is hard—if you’re trying to prove that your system is aligned by looking at the details of how it is constructed and showing that it all works together, then yes, alignment is harder than it would be otherwise. But you could imagine other versions of alignment, eg. taking intelligence as a black box and pointing it in the right direction. (For example, if I magically knew the true human utility function, and I put that in the black box, the outcomes would probably be good.)
Here when I say “aligned” I mean “trying to help”. It’s still possible that the AI is incompetent and fails because it doesn’t understand what the consequences of its actions are.
Cool, I think I mostly agree with you.