Thanks for the detailed reply!

I want to go a bit deeper into the fine points, but my general reaction is “I wanted that in the post”. You make a pretty good case for a way of arriving at this definition that makes it particularly exciting. On the other hand, I don’t think that stating a definition and proving a single theorem with an “obvious” quality (whether or not it is actually obvious, mind you) is all that convincing.
The best way to describe my impression is that you two went for the “scientific paper” style, but the current state of the research, as well as the argument for its value, fits better with a “here’s-a-cool-formal-idea” blog post or workshop paper. And that’s independent of the importance of the result. To put it differently: I’m ready to accept the importance of a formalism without much explanation of why I should care if it comes with a lot of cool results, but when the results are few, I need a more detailed story of why I should care.
About your specific story now:
Coming off of Optimal Policies Tend to Seek Power last summer, I felt like I understood single-agent Power reasonably well (at that point in time, I had already dropped the assumption of optimality). Last summer, “understand multi-agent power” was actually the project I intended to work on under Andrew Critch. I ended up understanding defection instead (and how it wasn’t necessarily related to Power-seeking), and corrigibility-like properties, and further expanding the single-agent results. But I was still pretty confused about the multi-agent situation.
Nothing to say here, except that you have the frustrating (for me) ability to make me want to read 5 of your posts in detail while explaining something completely different. I’m also supposed to do my own research, you know? (Related: I’d be excited to review one of your posts as part of the review project we’re doing with a bunch of other researchers. Not sure which of your posts would be most appropriate, though. If you have an idea, you can post it here. ;) )
The crux was, in an MDP, you’ve got a state, and it’s pretty clear what an agent can do. But in the multi-agent case, now you’ve got other reasoners, and now you have to account for their influence. So at first I thought,
maybe Power is about being able to enforce your will even against the best efforts of the other players
which would correspond to everyone else minmax-ing you on any goal you chose. But this wasn’t quite right. I thought about this for a while, and I didn’t make much progress, and somehow I didn’t come up with the formalism in this post until this winter when I started working with Jacob. In hindsight, maybe it’s obvious:
in an MDP, the relevant “situation” is the current state; measure the agent’s average optimal value at that state.
in a non-iterated multi-agent game, the relevant “situation” is just the other players’ strategy profile; measure your average maximum reward, assuming everyone else follows the strategy profile.
This should extend naturally into Bayesian stochastic games, to account for sequential decision-making and truly generalize the MDP results.
When phrased that way, I think my “issue” is that the subtlety you add is mostly hidden within the additional parameter of the strategy profile. That is, with the original intuition, you don’t have to find out what the other players will actually do; here you kind of have to. That’s a good thing, in that I agree with you that it makes the intuition subtler, but it also creates a whole new complex problem of inferring strategies.
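To make the quoted definition, and the strategy-profile parameter I’m pointing at, concrete: here is a minimal sketch of both readings for a two-player normal-form game. The names (`power`, `minmax_power`), the uniform draw over payoff matrices as the “goal distribution”, and restricting the minmax opponent to pure strategies are all my own illustrative assumptions, not necessarily the post’s exact construction:

```python
import numpy as np

rng = np.random.default_rng(0)

def power(payoff_samples, opponent_profile):
    """POWER-style value of the row player against a *fixed* opponent profile.

    For each sampled payoff matrix (one draw of the player's goal), take the
    best-response value against the fixed mixed strategy, then average over
    the goal distribution.
    """
    return float(np.mean([np.max(M @ opponent_profile) for M in payoff_samples]))

def minmax_power(payoff_samples):
    """The rejected intuition: the opponent minmaxes you on every goal.

    For each sampled goal, the opponent picks the (pure, for simplicity)
    action that minimizes your best-response value; average over goals.
    """
    values = [min(np.max(M[:, j]) for j in range(M.shape[1]))
              for M in payoff_samples]
    return float(np.mean(values))

# Illustration: 2x2 games with payoffs drawn uniformly from [-1, 1].
samples = [rng.uniform(-1, 1, size=(2, 2)) for _ in range(10_000)]
print(power(samples, np.array([0.5, 0.5])))  # against a fixed 50/50 opponent
print(minmax_power(samples))                 # against an adversarial opponent
```

The difference is exactly the parameter: `power` needs the opponent’s actual profile as an input, while `minmax_power` doesn’t, which is where the inference problem I mention comes from.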
At this point, I went back and reread the last sections, and realized that you’re partially dealing with my problem by linking power to well-known strategy profiles (the Nash equilibria).
But for me, I was excited about the Power formalism when (IIRC) I proposed to Jacob that we prove results about that formalism. Jacob was the one who formulated the theorem, and I actually didn’t buy it at first; my naive intuition was that Power should always be constant when summed over players who have their types drawn from the constant-sum distribution. This was wrong, so I was pretty surprised.
This part pushed me to reread the statements in detail. If I get it correctly, you had the intuition that Power behaved like “will this player win”, whereas it actually works like “keeping everything else fixed, how well can this player end up”. The trick that makes the theorem true (and the sum of Powers bigger than the constant) is that, for a strategy profile that isn’t a Nash equilibrium, multiple players might each gain a lot by changing their own action in turn while everything else stays fixed.
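A minimal worked instance of that trick (my own illustration, assuming plain matching pennies rather than any example from the post): fix a non-equilibrium profile and let each player deviate in turn.

```python
import numpy as np

# Matching pennies: zero-sum; the row player wants to match, the column player to mismatch.
A = np.array([[1, -1],
              [-1, 1]])   # row player's payoffs
B = -A                    # column player's payoffs (constant sum: 0)

def best_response_value(payoff, other_profile):
    # Maximum expected payoff against a fixed opponent mixed strategy.
    return np.max(payoff @ other_profile)

# Non-equilibrium profile: both players deterministically play their first action.
row_plays = np.array([1.0, 0.0])
col_plays = np.array([1.0, 0.0])

v_row = best_response_value(A, col_plays)    # row deviates, column held fixed
v_col = best_response_value(B.T, row_plays)  # column deviates, row held fixed
print(v_row + v_col)  # 2.0 > 0: each deviation is evaluated with the other side fixed

# At the Nash equilibrium (both mix 50/50), the slack disappears:
half = np.array([0.5, 0.5])
print(best_response_value(A, half) + best_response_value(B.T, half))  # 0.0
```

Both players get to exploit the same fixed, exploitable profile, so the “Powers” sum to 2 even though realized payoffs always sum to 0; only at equilibrium does the sum collapse to the constant.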
I’m a bit ashamed, because that’s actually explained in the intuition for the proof, but I didn’t get it on a first reading. I also see now that it was the point of the discussion before the theorem, but that part flew over my head. So my advice would be to explain the initial intuition, and why it is wrong, in even more detail, including where in the maths this happens (the fixing of σ_{-i}).
My updated take after getting this point is that I’m a bit more excited about your formalism.
But the thing I’m most excited about is how I had this intuitive picture of “if your goals are unaligned, then in worlds like ours, one person gaining power means other people must lose power, after ‘some point’.”
Intuitively this seems obvious, just like the community knew about instrumental convergence before my formal results. But I’m excited now that we can prove the intuitively correct conclusion, using a notion of Power that mirrors the one used in the single-agent case for the existing power-seeking results. And this wasn’t obvious to me, at least.
I agree that this is exciting, but this is only mentioned in the last line of the post, as one perspective among others. Notably, it wasn’t clear at all that this was the main application of this work.