Ability to resist a proof of what your behavior will be, even to the point of refuting its formal correctness (by determining its incorrectness with your own decisions and turning the situation counterfactual), seems like a central example of a superintelligence being unable to decide/determine (as opposed to predict) what your decisions are. It’s also an innocuous enough input that it doesn’t obviously have to be filtered by the weak agent’s membrane.
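As a concrete (and purely illustrative) sketch of this kind of diagonalization: an agent that, whenever handed a purported proof that it will take some particular action, takes a different one. The action names and the shape of the observation below are my own assumptions, not anything from the discussion above:

```python
# Minimal sketch of an agent that refutes any claimed proof of its own behavior
# by diagonalizing against it. The observation format and action names are
# illustrative assumptions.

ACTIONS = ["cooperate", "defect"]

def agent(observation: dict) -> str:
    claimed = observation.get("proof_of_my_action")  # e.g. "cooperate"
    if claimed in ACTIONS:
        # Do something else, so the "proof" was never correct about this agent
        # and the situation it describes becomes counterfactual.
        return next(a for a in ACTIONS if a != claimed)
    return "cooperate"  # default behavior on all other inputs

# No input of the form "proof that you will do X" can be a correct proof here.
assert agent({"proof_of_my_action": "cooperate"}) == "defect"
assert agent({"proof_of_my_action": "defect"}) == "cooperate"
```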
In any case, to even discuss how a weak agent behaves in a superintelligent world, it’s necessary to have some notion of keeping it whole. Extreme manipulation can both warp the weak agent and fail to elicit its behavior for other possible inputs. So this response to another comment seems relevant.
Another way of stating this, drawing on the point about physical bodies being thought of as simulations of some abstract formulation of a person, is to say that an agent by itself is defined by its own isolated abstract computation, which includes all membrane-permissible possible observations and the resulting behaviors. Any physical implementation is then a simulation of this abstract computation, one that can observe it to some extent, or fail to observe it (when the simulation gets sufficiently distorted). When an agent starts following the dictates of external inputs, that corresponds to the abstract computation of the agent running other things within itself, which can be damaging to its future on that path of reflection, depending on what those things are. In this framing, normal physical interaction with the external world becomes a kind of acausal interaction between the abstract agent-world (on inputs where the physical world is observed) and the physical world (for its parts that simulate the abstract agent-world).
Ability to predict how the outcome depends on the inputs + ability to compute the inverse of the prediction formula + ability to select certain inputs ⇒ ability to determine the output (within the limits of what influencing the inputs can accomplish). The rest is just an ontological difference about what language to use to describe this mechanism. I know that if I place a kettle on a gas stove and turn on the flame, I will get boiling water, and we colloquially describe this as boiling the water. I do not know all the intricacies of the processes inside the water, and I am not directly controlling individual heat exchange subprocesses inside the kettle, but it would be silly to argue that I am not controlling the outcome of the water getting boiled.
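To make the predict-invert-select chain concrete, here is a minimal sketch under assumptions of my own (a toy linear forward model for the kettle, with made-up numbers): knowing the forward prediction and its inverse is enough to pick the input that determines the outcome, without modelling any of the internal subprocesses.

```python
# Toy illustration of "predict + invert + select inputs => determine the output".
# The forward model and its coefficients are illustrative assumptions, not physics.

def predict_temperature(heat_input: float) -> float:
    """Forward model: predicted water temperature (°C) for a given heat input."""
    return min(100.0, 20.0 + 0.8 * heat_input)  # temperature caps at boiling

def required_heat_input(target_temp: float) -> float:
    """Inverse of the prediction formula (valid up to the boiling cap)."""
    return (target_temp - 20.0) / 0.8

# Selecting the input given by the inverse determines the outcome,
# even though no individual heat-exchange subprocess is modelled.
chosen = required_heat_input(100.0)
assert abs(predict_temperature(chosen) - 100.0) < 1e-9  # the water boils
```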