This notion of alignment is "value-free". It does not require solving thorny problems in moral philosophy, like what even are values anyway?
Three sentences earlier:
By alignment, I mean us being able to get an AI to do what we want it to do, without it trying to do things basically nobody would want
So, I'm confused by this idea that we're going to build an AI that does what we want and doesn't do bad things that nobody would want, without making any progress on the thorny philosophical problem of what counts as a bad thing nobody would want.
My mainline expectation is that if it seems like we've produced an AI that does all that good stuff without having solved those hard problems, and that AI causes large changes to the world, then bad things that nobody would want will happen anyway, despite superficial reassurances that this is unlikely.