I don’t think a person can be described very precisely as having values; you need to do some work to get out something value-shaped. The easiest way is to combine a person with a deliberative process, and then make some assumption about the reflective equilibrium (e.g. that it’s rational). You will get different values depending on the choice of deliberative process: e.g. if I deliberate by writing, I will generally arrive at somewhat different values than if I deliberate by talking to myself. This path-dependence is starkest at the beginning, and I expect it to decay towards 0. I don’t think the differences between various forms of deliberation are likely to be too important, though prima facie they certainly could be.
Similarly for a government, there are lots of extrapolation procedures you can use and they will generally result in different values. I think we should be skeptical of forms of value learning that look like they make sense for people but not for groups of people. (That said, groups of people seem likely to have more path-dependence, so e.g. the choice of deliberative process may be more important for groups than individuals, and more generally individuals and groups can differ in degree if not in kind.)
On this perspective: (a) a human or government is not yet the kind of thing you can be aligned with. In my definition this was hidden in the word “wants,” which was maybe bad form, but I was OK with it because most people who think about this topic already appreciate the complexity of “wants.” (b) A human is unlikely to be aligned with anything, in the same sense that a pair of people with different values aren’t aligned with anything until they are sufficiently well-coordinated.
I don’t think that you would need to describe agency in order to build a corrigible AI. As an analogy: if you want to build an object that will be pushed in the direction of the wind, you don’t need to give the object a definition of “wind,” and you don’t even need to have a complete definition of wind yourself. It’s sufficient for the person designing/analyzing the object to know enough facts about the wind that they can design/analyze sails.