I don’t think I make this assumption. The biggest flaw in this post is that some of the definitions don’t quite make sense, and I don’t think assuming infinite compute helps this.
You don’t explicitly; it’s implicit in the following:
“It is well known that a utility function over behaviors/policies is sufficient to describe any policy.”
The VNM axioms do not necessarily apply for bounded agents. A bounded agent can rationally have preferences of the form A ~[1] B and B ~ C but A ≻[2] C, for instance[3]. You cannot describe this with a straight utility function: any real-valued utility would force u(A) = u(B) = u(C), contradicting A ≻ C.
[1] is indifferent to
[2] is preferred over
[3] See https://www.lesswrong.com/posts/AYSmTsRBchTdXFacS/on-expected-utility-part-3-vnm-separability-and-more?commentId=5DgQhNfzivzSdMf9o, which is similar but which does not cover this particular case. That being said, the same technique should ‘work’ here.
I agree that a bounded agent can be VNM-incoherent and not have a utility function over bettable outcomes. Here I’m saying you can infer a utility function over behaviors for *any* agent with *any* behavior. You can trivially do this by setting the utility of every action the agent actually takes to 1, and the utility of every action the agent doesn’t take to 0. For example, for twitch-bot, the utility at each step is 1 if it twitches and 0 if it doesn’t.
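A minimal sketch of this construction (hypothetical names; assuming a finite observed trajectory): the inferred utility is 1 exactly on the actions the agent actually took, so the observed behavior is utility-maximizing by construction.

```python
# Trivial "utility over behaviors": 1 for the action actually taken at each step, 0 otherwise.
def make_trivial_utility(observed_actions):
    """observed_actions[t] is the action the agent actually took at step t."""
    def utility(step, action):
        return 1 if action == observed_actions[step] else 0
    return utility

# Twitch-bot: it twitches at every step, so twitching always has utility 1.
u = make_trivial_utility(["twitch", "twitch", "twitch"])
assert u(0, "twitch") == 1       # the action actually taken
assert u(0, "stay_still") == 0   # any other action
```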
That’s a very different definition of utility function than I am used to. Interesting.
What would the utility function over behaviors for an agent that chose randomly at every timestep look like?
My guess is that if the randomness is pseudorandom, the utility is 1 for the behavior it actually chose and 0 for everything else; if the randomness is true randomness and we use Boltzmann rationality, then all behaviors have equal utility; and if the randomness is true and the agent is actually maximizing, then the abstraction breaks down?
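To illustrate the Boltzmann case: a Boltzmann-rational policy is a softmax over utilities, so giving every behavior the same utility reproduces a uniform random agent. A minimal sketch (two actions and the temperature are illustrative assumptions):

```python
import math

def boltzmann_policy(utilities, beta=1.0):
    """P(action) proportional to exp(beta * utility)."""
    weights = [math.exp(beta * u) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

print(boltzmann_policy([0.0, 0.0]))  # equal utilities -> [0.5, 0.5], i.e. uniform play
print(boltzmann_policy([1.0, 0.0]))  # unequal utilities -> roughly [0.73, 0.27]
```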
I want to clarify that this is not a particularly useful type of utility function, and the post was a mostly-failed attempt to make it useful.
Fair! Here’s another[1] issue, I think, now that I’ve realized you were talking about utility functions over behaviours, at least if you allow ‘true’ randomness.
Consider a slight variant of matching pennies: if an agent doesn’t make a choice, their choice is made randomly for them.
Now consider the following agents:
Twitchbot.
An agent that always plays (truly) randomly.
An agent that always plays the best Nash equilibrium, tiebroken by the choice that results in them making the most decisions. (And then tiebroken arbitrarily from there, not that it matters in this case.)
These all end up with infinite random sequences of plays, ~50% heads and ~50% tails[2][3][4]. And any infinite random (50%) sequence of plays could be a plausible sequence of plays for any of these agents. And yet these agents ‘should’ have different decompositions into w and g (illustrated in the sketch below).
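A minimal simulation sketch of this setup (assumptions: the environment flips a fair coin for any agent that abstains, and the Nash agent mixes 50/50 itself, per the footnotes below). All three agents produce statistically identical ~50/50 sequences of plays, so no record of behavior distinguishes them:

```python
import random

def resolve(agent_choice):
    # In this matching-pennies variant, a non-choice is replaced by a fair coin flip.
    return agent_choice if agent_choice is not None else random.choice(["heads", "tails"])

def twitchbot():
    return None  # never decides; the environment decides for it

def random_agent():
    return random.choice(["heads", "tails"])  # decides by a fair coin flip

def nash_agent():
    # Plays the 50/50 mixed equilibrium itself (tiebreak toward making its own decisions).
    return random.choice(["heads", "tails"])

for name, agent in [("twitchbot", twitchbot), ("random", random_agent), ("nash", nash_agent)]:
    plays = [resolve(agent()) for _ in range(100_000)]
    print(name, plays.count("heads") / len(plays))  # all approximately 0.5
```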
Maybe. Or maybe I was misconstruing what you meant by ‘if the randomness is true and the agent is actually maximizing, then the abstraction breaks down’ and this is the same issue you recognized.
[2] Twitchbot doesn’t decide, so its decision is made randomly for it, so it’s 50⁄50.
[3] The random agent decides randomly, so it’s 50⁄50.
[4] ‘The’ best Nash equilibrium is any combination of choosing 50⁄50 randomly, and/or not playing. The tiebreak means the best combination is playing 50⁄50.