The main distinction seems to be in how strongly these superintelligent agents will use their power to influence human decision-making.
At one extreme is total control, even in the most putatively aligned case: if my taking a sip of water from my glass at 10:04:22 am would be 0.000000001% better in some sense than sipping at 10:04:25 am, then the agent will arrange the inputs to my decision so that I take a sip of water at 10:04:22 am, and similarly for everything else that happens in the world. I do think that this would constitute a total loss of human control, though not necessarily a loss of human agency.
At the other extreme would be something more like an Oracle: a superintelligent system (I hesitate to call it an agent) that has absolutely no preferences, including implied preferences, over the state of the world beyond some very narrow task.
Or to put it another way, how much slack will a superintelligence have in its implied preferences?
Concept 1 appears to be describing a superintelligence with no slack at all. Every human decision (and presumably everything else in the universe) must abide by a strict total order of preferences, and it will optimize the hell out of those preferences. Concept 2 describes a superintelligence that may be designed to have—or be constrained to abide by—some slack in its orderings of outcomes that depend upon human agency. Even if it can predict exactly what a human would decide, it doesn't necessarily have to act so as to cause a preferred expected distribution of outcomes.
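One way to make the slack idea concrete is as an indifference threshold: the agent only acts to steer an outcome when its preferred option beats the human's predicted choice by more than some margin. A minimal toy sketch (all names and utility numbers here are hypothetical illustrations, not a real model):

```python
# Toy sketch of "slack" in an agent's implied preferences: the agent
# overrides a human's predicted choice only when its preferred outcome
# beats that choice by more than a slack threshold.

def intervenes(utilities, human_choice, slack):
    """Return True if an agent with the given slack would act to
    steer the outcome away from the human's predicted choice."""
    best = max(utilities.values())
    gap = best - utilities[human_choice]
    return gap > slack

# Sipping at 10:04:22 is 0.000000001% "better" than at 10:04:25.
utilities = {"sip_at_10:04:22": 1.00000000001,
             "sip_at_10:04:25": 1.0}

# Concept 1: zero slack -- even this tiny gap triggers intervention.
print(intervenes(utilities, "sip_at_10:04:25", slack=0.0))   # True

# Concept 2: a little slack -- the agent leaves the decision alone.
print(intervenes(utilities, "sip_at_10:04:25", slack=1e-6))  # False
```

With zero slack, any nonzero preference gap at all triggers intervention, which is the total-control extreme; any positive slack carves out a region of outcomes the agent is content to leave to human agency.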
I don’t really think that we can rationally hold strong beliefs about where a future superintelligence might fall on this spectrum, or even outside it in some manner that I can’t imagine. I do think that the first scenario, taken literally, is infeasible even for a superintelligent agent, if it is constrained by anything like our current understanding of physical laws. But I can imagine a superintelligence that acts in a manner as close to that as possible, and this would drastically reduce human control even in the most aligned case.