I’m roughly in agreement, though I would caution that the exploration/exploitation model is a problematic one to use in this context, for two reasons:
1) It implies a relatively clear map/territory split: there are our real values, and our conscious model of them, and errors in our conscious model do not influence the actual values. But to some extent, our conscious models of our values do shape our unconscious values in that direction—if someone switches to an exploitation phase “too early”, then over time, their values may actually shift over to what the person thought they were.
2) Exploration/exploitation also assumes that our true values correspond to something akin to an external reward function: if our model is mistaken, then the objectively correct thing to do would be to correct it. In other words, if we realize that our conscious values don’t match our unconscious ones, we should revise our conscious values. And sometimes this does happen. But on other occasions, what happens is that our conscious model has become installed as a separate and contradictory set of values, and we need to choose which of the values to endorse (in which situations). This happening is a bad thing if you tend to primarily endorse your unconscious values or a lack of internal conflict, but arguably a good thing if you tend to primarily endorse your conscious values.
The process of arriving at our ultimate values seems to be both an act of discovering them and an act of creating them, and we probably shouldn’t use terminology like exploration/exploitation that implies that it would be just one of those.
But to some extent, our conscious models of our values do shape our unconscious values in that direction
This is value drift. At any given time, you should fix (i.e. notice, as a concept) the implicit idealized values at that time and pursue them even if your hardware later changes and starts implying different values. (This is in the sense in which your dog, your computer, or an alien also should, normatively, pursue them forever; they are just, descriptively, unlikely to, but all else equal you should plot to make that more likely.) As an analogy, if you are interested in solving different puzzles on different days, then the fact that you are no longer interested in solving yesterday’s puzzle doesn’t address the problem of solving yesterday’s puzzle. Idealized values don’t describe a valuation of you (the abstract personal identity) or of your actions, behavior, and desires. They describe a valuation of the whole world, in which future you, with value drift, is a particular case that is not fundamentally special. The problem doesn’t change, even if the tendency to be interested in a particular problem does. The problem doesn’t get solved because you are no longer interested in it, and solving a new, different problem does not address the original one.
Exploration/exploitation also assumes that our true values correspond to something akin to an external reward function: if our model is mistaken, then the objectively correct thing to do would be to correct it
The nature of idealized values is irrelevant to this point: whatever they are, they are that thing that they are, so any “correction” discards the original problem statement and replaces it with a new one. What you can and should correct are intermediate conclusions. (Alternatively, we are arguing about definitions, and you read into my use of the term “values” what I would call intermediate conclusions; but then again, I’m interested in you noticing the particular idea that I refer to with this term.)
if we realize that our conscious values don’t match our unconscious ones
I don’t think “unconscious values” is a good proxy for an abstract implicit valuation of the universe; consciously inaccessible processes in the brain are at a vastly different level of abstraction from the idealization I’m talking about.
The process of arriving at our ultimate values seems to be both an act of discovering them and an act of creating them
This might be true in the sense that humans probably underdetermine the valuation of the world, so that there are some situations that our implicit preferences can’t compare even in principle. The choice between such situations is arbitrary according to our values. Or our values might just recursively determine the correct choice in every single definable distinction. Any other kind of “creation” will contradict the implicit answer, and so even if it is the correct thing to do given the information available at the time, later reflection can show it to be suboptimal.
(More constructively, the proper place for creativity is in solving problems, not in choosing a supergoal. The intuition is confused on this point because humans never saw a supergoal: all sane goals that we formulate for ourselves are, in one way or another, motivated by other considerations, and are themselves solutions to different problems. Thus, creativity is helpful in solving those different problems in order to recognize which new goals are motivated. But this is experience with subgoals, not with idealized supergoals.)
I think that the concept of idealized value is obviously important in an FAI context, since we need some way of formalizing “what we want” in order to have any way of ensuring that an AI will further the things we want. I do not understand why the concept would be relevant to our personal lives, however.
I think that the concept of idealized value is obviously important in an FAI context, since we need some way of formalizing “what we want” in order to have any way of ensuring that an AI will further the things we want.
The question of what is normatively the right thing to do (given the resources available) is the same for an FAI and in our personal lives. My understanding is that “implicit idealized value” is the shape of the correct answer to it, not just a tool restricted to the context of FAI. It might be hard for a human to proceed from this concept to concrete decisions, but this is a practical difficulty, not a restriction on the scope of applicability of the idea. (And to see how much of a practical difficulty it is, it is necessary to actually attempt to resolve it.)
I do not understand why the concept would be relevant to our personal lives, however.
If idealized value indicates the correct shape of normativity, the question should instead be: how are our personal lives relevant to idealized value? One way was discussed a couple of steps above in this conversation: the exploration/exploitation tradeoff. In pursuit of idealized values, if in our personal lives we can’t get much information about them, a salient action is to perform or support research into idealized values (or relevant subproblems, such as preventing or evading global catastrophes).
What does this mean? It sounds like you’re talking about some kind of objective morality?