not all choices correspond to maximizing such a function—any time choices go in a circle, for instance, you’re not maximizing a function. We could imagine a very simple machine with a 3-state memory. It wants to go from A to B, and from B to C, and from C to A. Its choices are always a function of its internal state. But its choices don’t maximize a function of its internal state.
Here’s the corresponding utility function—assuming that state transitions are tied to actions.
If IAM(A) { U(A) = 0, U(B) = 1, U(C) = 0; }
If IAM(B) { U(A) = 0, U(B) = 0, U(C) = 1; }
If IAM(C) { U(A) = 1, U(B) = 0, U(C) = 0; }
Using simple maximisation algorithms (e.g. gradient descent) on that utility landscape will produce the behaviour in question. More sophisticated algorithms will do no better.
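As a rough sketch (the Python names and the greedy update loop below are mine, not anything from the comment above), here is what maximising that state-dependent utility looks like: the argmax choice in each state just walks the A → B → C → A cycle forever.

```python
# Hypothetical sketch: greedy choice under the state-dependent tables above.
UTILITY = {
    "A": {"A": 0, "B": 1, "C": 0},   # in A: moving to B scores highest
    "B": {"A": 0, "B": 0, "C": 1},   # in B: moving to C scores highest
    "C": {"A": 1, "B": 0, "C": 0},   # in C: moving to A scores highest
}

def step(state):
    """Pick the successor state with maximal utility under the current state's table."""
    table = UTILITY[state]
    return max(table, key=table.get)

state = "A"
trajectory = [state]
for _ in range(6):
    state = step(state)
    trajectory.append(state)

print(" -> ".join(trajectory))  # A -> B -> C -> A -> B -> C -> A
```

The table being climbed changes with the agent’s own state, which is why no single fixed function of the state is being maximised.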
For one thing, the agent may not believe what you say.
Okay. Replace “offer it a choice” with “offer it a choice, and provide sufficient Bayesian evidence that this is the choice it faces.” This objection doesn’t lead anywhere anyhow.
Your “BartlebeyBot” agent totally ignored Bayesian evidence. By what rule does “my” example agent have to listen and respond to such evidence, while “yours” does not? Again, I don’t think your proposed counter example is remotely convincing.
Any function of the internal state can be expressed with a number of entries equal to the number of possible internal states.
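Concretely (the values here are arbitrary placeholders, just to illustrate the point), for the 3-state machine above such a function is nothing more than a 3-entry lookup table:

```python
# Illustration: a function of a 3-state internal state is a 3-entry lookup table.
f = {"A": 0.0, "B": 1.0, "C": 0.5}

assert len(f) == 3   # one entry per possible internal state
print(f["B"])        # "evaluating the function" is just a table lookup
```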
You’ve given me something that’s still interesting, which is all the expected utilities.
By what rule does “my” example agent have to listen and respond to such evidence, while “yours” does not? Again, I don’t think your proposed counter example is remotely convincing.
Because one maximizes a utility function, and the other just says “no” all the time.
Why do you think there’s a counter-example? Did you read the referenced Dewey paper about O-Maximisers?
Thank you for linking that again. Hm, I guess I did assume that agents could have different utilities at different timesteps. Just putting “1” for everything resolves how an O-maximizer can refuse the offer to raise its utility. But then, they assume that the tape of a Turing machine is infinite, so the cycle above is still a problem.
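To spell out the “1 for everything” point (the function and option names below are my own illustration, not anything from Dewey’s paper): with a constant utility function every action ties for the maximum, so an agent that refuses an offered utility increase is still, vacuously, maximising.

```python
# Hypothetical sketch: under a constant utility function, every action ties,
# so refusing an offered "utility boost" is still consistent with maximisation.

def constant_utility(outcome):
    return 1  # the same value for every outcome

def maximal_actions(actions, utility):
    """Return every action tied for the highest utility."""
    scores = {a: utility(a) for a in actions}
    best = max(scores.values())
    return [a for a, score in scores.items() if score == best]

options = ["accept the offer", "refuse the offer"]
print(maximal_actions(options, constant_utility))  # both actions tie; refusing is permitted
```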