I don’t think you need value lock-in to get the desirable properties you want here. Avoiding tiling through complexity/exploration gets you most of the same stuff.
The meta-values thing gets at the same thing that HRH is getting at. Also, I feel that wireheading is fundamentally a problem of embeddedness, and has a completely different causal story from the problem of reflective processes changing our values to be “zombified”, though the two feel vaguely similar. The way I would look at it: if you are a non-embedded algorithm running in an embedded world, you are potentially susceptible to wireheading; only if you are an embedded algorithm could you possibly have a preference that implies wanting zombification, or preferences guided by meta-values that avoid this, etc.
I do not bite this bullet. I am wise enough to know that I am not wise enough to be Eternal Dictator of the Future Light-Cone. My preferences about the entire future light-cone are several levels of meta removed from trivialities like suffering and satisfaction: there should never be a singleton, regardless of its values. That’s about it.
“fundamentally, we want the future to satisfy the values of us now”
What if we don’t want this after the fact? Is it not the case that our values have changed pretty radically in the last 100 years, much less the last 800-1000? If we do create some kind of value-enforcing smatter/“sovereign”/what-have-you, is that not a kind of horror, the social ethic being frozen in time?
I hope for a future that vastly exceeds the conceptual & ethical values of the time we find ourselves in.
it is true that past society failed to align current society to its values; we’re glad because we like our values, and they’d be upset because they’d prefer theirs. we are ourselves now, and so we want to align the future to our own values.
there’s also the matter that people in the past might agree with our values more under reflection, the same way we’d probably not want the meat industry under reflection.
we should want many of our non-terminal values to change over time, of course! we just want to make sure our terminal values are in charge of how that actually happens. there is a sufficiently high meta level at which we do want to keep our values forever, rather than some other way-our-instrumental-values-could-change which we wouldn’t like as much.
As a thought experiment, imagine that human values change cyclically. For 1000 years we value freedom and human well-being, for 1000 years we value slavery and the joy of hurting other people, and again, and again, forever… that is, unless we create a superhuman AI who can enforce a specific set of values.
Would you want the AI to promote the values of freedom and well-being, or the slavery and hurting, or something balanced in the middle, or to keep changing its mind in a cycle that mirrors the natural cycle of human values?
(It is easy to talk about “enforcing values other than our own” in the abstract, but it becomes less pleasant when you actually imagine some specific values other than your own.)
Is it not the case that our values have changed pretty radically in the last 100 years, much less the last 800-1000?
To be a little pedantic, the people alive 100 years ago and those alive today are for practical purposes disjoint. No individual’s values need have changed much. The old die, the young replace them, and the middle-aged blow with the wind between.