I don’t suppose you could clarify exactly how this twitching agent is defined? In particular, how does its utility accumulate over time? Do you get 1 utility for each point in time at which you twitch, with your total utility being the undiscounted sum of these utilities?
I am not defining this agent using a utility function. It turns out that because of coherence arguments and the particular construction I gave, I can view the agent as maximizing some expected utility.
I like Gurkenglas’s suggestion of a random number generator hooked up to motor controls, let’s go with that.
An agent that constantly twitches could still be a threat if it were trying to maximise the probability that it would actually twitch in the future. For example, if it were to break down, it wouldn’t be able to twitch, so it might want to gain control of resources.
Yeah, but it’s not trying to maximize that probability. I agree that a superintelligent agent that is trying to maximize the amount of twitching it does would be a threat, possibly by acquiring resources. But motor controls hooked up to random numbers certainly won’t do that.
If your robot powered by random numbers breaks down, it indeed will not twitch in the future. That’s fine: clearly it must have been maximizing a utility function that assigned utility 1 to it breaking down at that exact moment in time. Jessica’s construction below would also work, but it’s specific to the case where you take the same action across all histories.
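To make that move concrete, here is a minimal sketch (in Python, with hypothetical names; not necessarily the exact construction referenced above) of how any fixed behaviour, including RNG-driven twitching, can be rationalized after the fact by a utility function over complete histories that assigns 1 to whatever history actually occurred and 0 to everything else:

```python
# A minimal sketch of the trivial "rationalization" move: take any policy --
# here, motor outputs driven by a random number generator -- and construct a
# utility function over complete histories that the policy happens to maximize.
import random

ACTIONS = ["twitch_left", "twitch_right", "stay_still", "break_down"]
HORIZON = 5  # hypothetical finite horizon, for illustration only

def twitch_policy(rng):
    """The 'agent': it just emits whatever the RNG says, step after step."""
    return [rng.choice(ACTIONS) for _ in range(HORIZON)]

def make_utility(observed_history):
    """Assign utility 1 to the exact history the agent produced (including
    the step where it breaks down, if it does) and 0 to every other history.
    Trivially, the agent's behaviour maximized this utility function."""
    def utility(history):
        return 1.0 if history == observed_history else 0.0
    return utility

rng = random.Random(0)
history = twitch_policy(rng)
u = make_utility(history)

print(history)                       # whatever the RNG happened to produce
print(u(history))                    # 1.0 -- the realized history gets max utility
print(u(["stay_still"] * HORIZON))   # 0.0 -- any other history
```

The point of the sketch is that no optimization pressure is exerted anywhere: the utility function is read off from the behaviour, not the other way around, which is why the twitching robot poses no acquisitive threat.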