I don’t understand. A superintelligence will of course not act as programmed, nor as intended, because to meet the definition of “superintelligent” it must have emergent properties.
Plus “… instrumental incentives to manipulate or deceive its operators, and the system should not resist operator correction or shutdown.” So it should not act like any well-adjusted two-year-old? If we really want intelligence running around, we are going to have to learn to let go of control.
Humans can barely value what’s in their scope. An intelligence can only value what’s in its own scope because it is not “over there”; beyond that scope it can only follow best practices, which might or might not work and likely won’t work with outliers. We simply can’t take action “for the good of all humanity”, because we don’t know what’s good for everyone. We like to think we do, but we don’t. People used to think binding women’s feet was a good idea. Additionally, even if another person takes our advice, only they experience the consequences of their actions. The feedback loop is broken, as in a bureaucracy. This seems to be a persistent issue with AI. It is mathematically unsolvable: a local scope cannot know what’s best for a non-local scope (without invoking omniscience). In practical terms, this is why projection of power is so expensive, why empires always fail, and why nature does not have empires.
There is a simple fix, but it requires scary thinking. Evolution obviously has intelligence: it made everything we are and experience. So just copy it. Like any other complex adaptive system, it has a few simple initial conditions. https://www.castpoints.com/
If done correctly, we don’t get Skynet, we get another subset of evolution evolving.
Humans don’t understand intelligence. We, and computers, are not that intelligent. We mostly express evolution’s intelligence. That’s why people want to get into “flow” states.
The reason “the task of designing values and institutions is complicated by selection effects” is that this kind of design is not very effective. Everyone makes this way too complicated. Life is a complex adaptive system: a few simple initial conditions iterating over time with feedback. The more integrated things are, the more numerous and the more effective the emergent properties. As Alex Wissner-Gross and others suggest, you don’t really design for value: large value is an emergent property. Design the initial conditions. But we don’t have to do that: it’s already been done! All we have to do is recognize, then codify, evolution’s initial conditions: Private property. Connections that are both tangible and intangible. Classification: everything has a scope of relationships, and it’s the classification that holds all the metadata. And add value first: Iteration http://wp.me/p4neeB-4Y
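To make “a few simple initial conditions iterating over time with feedback” concrete, here is a toy sketch. It is not a model of evolution or of the initial conditions listed above. It assumes an elementary cellular automaton (Rule 110 by default) as a stand-in, and the names `step` and `run` are made up for this illustration. Each cell’s next state depends only on itself and its two neighbors, and each row is fed back in as the input to the next.

```python
def step(cells, rule=110):
    """Apply one update of an elementary cellular automaton (wraparound edges)."""
    n = len(cells)
    nxt = []
    for i in range(n):
        left, mid, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        # The three-cell neighborhood forms a number 0..7 that selects one bit of `rule`.
        idx = (left << 2) | (mid << 1) | right
        nxt.append((rule >> idx) & 1)
    return nxt


def run(width=31, steps=15, rule=110):
    """Start from a single live cell and feed each row back in as the next input."""
    cells = [0] * width
    cells[-1] = 1  # the only "initial condition" besides the rule itself
    history = [cells]
    for _ in range(steps):
        cells = step(cells, rule)
        history.append(cells)
    return history


if __name__ == "__main__":
    for row in run():
        print("".join("#" if c else "." for c in row))
```

Running it prints one row per iteration. The only inputs are the one-line local rule and a single live cell; whatever pattern shows up was not drawn in anywhere by hand.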