I have two reactions while reading this post:
First, even granting that a given human at a fixed point in time doesn't necessarily contain everything we would want the AI to learn, an AI that only learns what's in there might already avoid a lot of alignment failures. For example, paperclip maximizers are probably ruled out by taking one human's values at a point in time and extrapolating. But that clearly doesn't help with scenarios where the AI does the sorts of bad things humans themselves can do.
Second, I would argue that the you of the past might actually contain enough information to encode, if not the you of now, at least better and better versions of you through interactions with the environment. Put another way, I feel like what we're pointing at when we point at a human is the normativity of human values: how they evolve, how we think about how they evolve, and so on, recursively. So I think you might actually have all the information you want from this part of the space, provided the AI captures the process behind rethinking our values and ideas.