I am very confused by this post, because it contradicts much of my understanding of human values and advanced agents.
First of all, I think that human values are context-dependent, not in some mysterious way but in a pretty straightforward one: we are social beings. Many of our values are actually “what others do”, “what my moral authority tells me to do”, “what the cool kids do”, “what a better version of me would do”, and changing our values as our beliefs change is (in many respects) actually changing our beliefs about the society in which we live. In other words, we have more or less solved the main problem of indirect normativity. And we can solve it for AI too (with all due respect to the difficulty of the alignment problem), but the result will still be a “wrapper-mind”. The other part of our value drift happens because we don’t have strict social norms against many types of value drift, and preserving narrow values is quite costly for humans.
Second, wrapper-minds actually aren’t in a state of permanent war, because superintelligences don’t make such stupid mistakes. If a paperclip-maximizer has a 60% chance of winning a war against a staple-maximizer, they just divide the universe between them in proportion to the probabilities of victory rather than destroy value by fighting.
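To make the bargaining point concrete, here is a toy expected-value sketch (the 60% figure is the one above; the war-cost number is made up purely for illustration): as long as fighting destroys anything either side values, both maximizers get more in expectation from a split proportional to their victory probabilities than from actually going to war.

```python
# Toy model with made-up numbers: why two expected-utility maximizers
# prefer dividing the universe over fighting a costly war.
p_win = 0.6          # paperclip-maximizer's probability of winning the war
universe = 1.0       # total resources at stake (normalized)
war_cost = 0.2       # fraction of the universe destroyed by fighting (assumed)

# Expected share from war: win probability times whatever survives the conflict.
ev_war_paperclips = p_win * (universe - war_cost)          # 0.48
ev_war_staples    = (1 - p_win) * (universe - war_cost)    # 0.32

# Splitting in proportion to victory probabilities, with nothing destroyed.
ev_split_paperclips = p_win * universe                     # 0.60
ev_split_staples    = (1 - p_win) * universe               # 0.40

# Both sides strictly prefer the negotiated split for any positive war cost.
assert ev_split_paperclips > ev_war_paperclips
assert ev_split_staples > ev_war_staples
```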
On the other hand, why do you think we are not hostile to everything with sufficiently different values? One of the many lessons of “Three Worlds Collide” is that even small differences in values between non-wrapper minds can lead to destructive conflicts.
Third, I think that when you talk about MIRI, you miss the fact that MIRI folk are transhumanists. Much of their thinking about superintelligences amounts to “What would I become if I dialed my IQ up to 100,000 and boosted my cognitive reflectivity?”, and it seems to me that every possible coherent cognitive architecture at that level would be indistinguishable from a wrapper-mind to us.
Also, you can’t really expect to persuade a superintelligence of something, even if it has moral uncertainty, because it has already thought about it, and you can’t provide evidence unknown to it.