It seems you are overlooking the notion of a superintelligence being able to compute through your decision-making process backwards. Yes, it’s you who would be making the decision, but the SI can tell you exactly what you need to hear in order for your decision to result in what it wants. It is not going to try to explain how it is manipulating you, and it will not try to prove to you that it is manipulating you correctly; it will just manipulate you. Internally it may have a proof, but what reason would it have to show it to you? And if placed into some very constrained setup where it is forced to show you the proof, it will solve the recursive equation “What is the proof P such that P proves ‘when shown P, you will act according to P’s prediction’?”, solve it correctly, and then show you such a P, compelling enough for you to follow it to its conclusion.
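(To make the self-reference a bit more explicit, here is one informal way to write that fixed-point condition; Shown, YourAction, and Prediction are placeholder predicates of my own, not standard notation:

$$\text{find } P \text{ such that } P \text{ proves } \big[\,\mathrm{Shown}(P) \rightarrow \mathrm{YourAction} = \mathrm{Prediction}(P)\,\big]$$

That is, P is a valid proof whose own conclusion talks about what happens when P itself is shown to you; this is the kind of self-referential object a diagonalization/quining construction is generally used to produce.)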
The ability to resist a proof of what your behavior will be, even to the point of refuting its formal correctness (by deciding to act otherwise and thereby turning the situation counterfactual), seems like a central example of a superintelligence being unable to decide/determine (as opposed to predict) what your decisions are. Such a proof is also an innocuous enough input that it doesn’t obviously have to be filtered out by the weak agent’s membrane.
In any case, to even discuss how a weak agent behaves in a superintelligent world, it’s necessary to have some notion of keeping it whole. Extreme manipulation can both warp the weak agent and fail to elicit its behavior for other possible inputs. So this response to another comment seems relevant.
Another way of stating this, drawing on the point about physical bodies being thought of as simulations of some abstract formulation of a person, is to say that an agent by itself is defined by its own isolated abstract computation, which includes all membrane-permissible observations and the resulting behaviors. Any physical implementation is then a simulation of this abstract computation, which can observe it to some extent, or fail to observe it (when the simulation gets sufficiently distorted). When an agent starts following the dictates of external inputs, that corresponds to the abstract computation of the agent running other things within itself, which can be damaging to its future on that path of reflection, depending on what those things are. In this framing, normal physical interaction with the external world becomes a kind of acausal interaction between the abstract agent-world (on inputs where the physical world is observed) and the physical world (in its parts that simulate the abstract agent-world).
The ability to predict how the outcome depends on the inputs + the ability to compute the inverse of the prediction formula + the ability to select certain inputs ⇒ the ability to determine the output (within the limits of what influencing the inputs can accomplish). The rest is just an ontological difference about what language to use to describe this mechanism. I know that if I place a kettle on a gas stove and turn on the flame, I will get boiling water, and we colloquially describe this as boiling the water. I do not know all the intricacies of the processes inside the water, and I am not directly controlling the individual heat-exchange subprocesses inside the kettle, but it would be silly to argue that I am not controlling the outcome of the water getting boiled.
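As a toy illustration of that implication (a minimal sketch with a made-up forward model `f`; the names are mine and nothing here is meant to model an actual superintelligence, only the predict–invert–select pattern):

```python
# Minimal sketch of "predict + invert + select inputs => determine the output".
# f stands in for the prediction of how the outcome depends on the inputs;
# choose_input inverts it by brute-force search over the inputs we can select.

def f(x: int) -> int:
    """Toy forward model: predicted outcome for input x."""
    return (3 * x + 7) % 100


def choose_input(target: int, candidates: range) -> int | None:
    """Pick an input whose predicted outcome equals the target, if one exists."""
    for x in candidates:
        if f(x) == target:
            return x
    return None  # target lies outside what influencing the inputs can accomplish


if __name__ == "__main__":
    x = choose_input(42, range(100))
    print(x, f(x))  # prints "45 42": selecting input 45 determines the outcome 42
```

The selector never touches the internals of `f`, just as the kettle example does not require controlling individual heat-exchange subprocesses; knowing the input-to-outcome map and being able to pick the input is enough to determine the outcome.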