An interesting point about the agency-as-retargetable-optimisation idea is that it seems you can apply the retargeting perturbation at various places upstream of the agent's decision-making, but not downstream, i.e. you can retarget an agent by perturbing its sensors more easily than its actuators.
For example, to change a thermostat-controlled heating system to optimise for a higher temperature, the most natural perturbation might be to turn the temperature dial up, but you could also tamper with its thermistor so that it reads low, i.e. reports a lower temperature than the actual one, causing the system to keep heating past its nominal setpoint. On the other hand, making its heating element more powerful wouldn't affect the final temperature; the room would just reach the setpoint faster.
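Here's a minimal sketch of the thermostat example as a toy simulation (the controller, dynamics, and parameter values are my own illustrative assumptions, not anything from the original setup): turning the dial up or biasing the thermistor downward both shift the temperature the room settles at, while making the heating element more powerful leaves it where the dial says.

```python
# A toy bang-bang thermostat regulating a leaky room. Everything here
# (names, dynamics, parameter values) is an illustrative assumption,
# chosen only to show which perturbations move the equilibrium temperature.

def settle(setpoint=20.0, sensor_bias=0.0, heater_power=2.0,
           outside=10.0, leak=0.1, steps=5000, dt=0.1):
    """Simulate the room and return the temperature it settles near."""
    temp = outside
    history = []
    for _ in range(steps):
        reading = temp + sensor_bias                      # what the thermistor reports
        heating = heater_power if reading < setpoint else 0.0  # bang-bang control rule
        temp += dt * (heating - leak * (temp - outside))  # heat in minus heat leaking out
        history.append(temp)
    return sum(history[-1000:]) / 1000                    # average over the tail

print(f"baseline        ≈ {settle():.1f}")                  # ~20: whatever the dial says
print(f"dial turned up  ≈ {settle(setpoint=25.0):.1f}")      # ~25: target shifted
print(f"biased sensor   ≈ {settle(sensor_bias=-5.0):.1f}")   # ~25: target shifted
print(f"stronger heater ≈ {settle(heater_power=4.0):.1f}")   # ~20: target unchanged
```

In the toy model the asymmetry is easy to see: the setpoint and the sensor sit upstream of the comparison the controller makes, so perturbing them changes what the feedback loop regulates toward, while the heating element sits downstream of that comparison, so perturbing it only changes how quickly the loop gets there.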
I wonder if this suggests that an agent's goal lives at the last point in the causal chain where a perturbation still changes the system's set of target states.