Formerly alignment and governance researcher at DeepMind and OpenAI. Now independent.
Richard_Ngo
I was a bit lazy in how I phrased this. I agree with all your points; the thing I’m trying to get at is that this approach falls apart quickly if we make the bargaining even slightly less idealized. E.g. your suggestion “Form an EUM which is totally indifferent about the cake allocation between them and thus gives 100% of the cake to whichever agent is cheaper/easier to provide cake for”:
Strongly incentivizes deception (including self-deception) during bargaining (e.g. each agent wants to overstate the difficulty of providing cake for it).
Strongly incentivizes defection from the deal once one of the agents realizes that they’ll get no cake going forward.
Is non-robust to multi-agent dynamics (e.g. what if one of Alice’s allies later decides “actually I’m going to sell cakes to the Alice+Bob coalition more cheaply if Alice gets to eat them”? Does that then divert Bob’s resources towards buying cakes for Alice?)
EUM treats these as messy details. Coalitional agency treats them as hints that EUM is missing something.
EDIT: another thing I glossed over is that IIUC Harsanyi’s theorem says the aggregation of EUMs should maximize a fixed weighted average of their utilities, NOT a probability distribution over weighted averages of utilities. So even flipping a coin isn’t technically kosher. This may seem nitpicky, but I think it’s yet another illustration of the underlying non-robustness of EUM.
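To make the Harsanyi point concrete, here’s a minimal numeric sketch (the toy cake utilities are my own illustrative assumption, not part of the original discussion). It shows that no fixed weighting of Alice’s and Bob’s utilities can strictly prefer the 50/50 coin flip to the better of the two pure allocations, which is why a probability distribution over weighted averages isn’t itself a valid Harsanyi aggregate:

```python
# Minimal sketch with hypothetical toy utilities:
# outcome "A" = Alice gets all the cake, outcome "B" = Bob gets all the cake.
u_alice = {"A": 1.0, "B": 0.0}
u_bob   = {"A": 0.0, "B": 1.0}

def harsanyi_aggregate(w):
    """A Harsanyi-style aggregate: one fixed weight w on Alice, (1 - w) on Bob."""
    return {o: w * u_alice[o] + (1 - w) * u_bob[o] for o in ("A", "B")}

def expected_utility(u, lottery):
    return sum(p * u[o] for o, p in lottery.items())

coin_flip = {"A": 0.5, "B": 0.5}  # the "fair" mixture the two agents might want

for w in (0.0, 0.25, 0.5, 0.75, 1.0):
    u = harsanyi_aggregate(w)
    eu_flip = expected_utility(u, coin_flip)
    eu_best_pure = max(u["A"], u["B"])
    # For every fixed weight, the coin flip is at best tied with the better pure
    # option, so no single weighted-average EUM strictly prefers the fair mixture.
    print(f"w={w:.2f}: EU(flip)={eu_flip:.2f}, best pure option={eu_best_pure:.2f}")
```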
Towards a scale-free theory of intelligent agency
On a meta level, I have a narrative that goes something like: LessWrong tried to be truth-seeking, but was scared of discussing the culture war, so blocked that off. But then the culture war ate the world, and various harms have come about from not having thought clearly about that (e.g. AI governance being by default a left-wing enterprise that tried to make common cause with AI ethics). Now cancel culture is over and there are very few political risks to thinking about culture wars, but people are still scared to. (You can see Scott gradually dipping his toe into the race + IQ stuff over the past few months, but in a pretty frightened way. E.g. at one point he stated what I think is basically his position, then appended something along the lines of “And I’m literally Hitler and should be shunned.”)
Thanks for the well-written and good-faith reply. I feel a bit confused by how to relate to it on a meta level, so let me think out loud for a while.
I’m not surprised that I’m reinventing a bunch of ideas from the humanities, given that I don’t have much of a humanities background and didn’t dig very far through the literature.
But I have some sense that even if I had dug for these humanities concepts, they wouldn’t give me what I want.
What do I want?
1. Concepts that are applied to explaining current cultural and political phenomena around me (because those are the ones I’m most aware of and interested in). It seems like the humanities are currently incapable of analyzing their own behavior using (their versions of) these ideas, because of their level of ideological conformity. But maybe it’s there and I just don’t know about it?
2. Concepts that are informed by game theory and other formal models (as the work I covered in my three-book review was). I get the sense that the most natural thinkers from the humanities to read on these topics (Foucault? Habermas?) don’t do this.
3. Concepts that slot naturally into my understanding of how intelligence works, letting me link my thinking about sociology to my thinking about AI. This is more subjective, but e.g. the distinction between centralized and distributed agents has been very useful for me. This part is more about me writing for myself rather than other people.
So I’d be interested in pointers to sources that can give me #1 and #2 in particular.
EDIT: actually I think there’s another meta-level gap between us. Something like: you characterize Yarvin as just being annoyed that the consensus disagrees with him. But in the 15 years since he was originally writing, the consensus did kinda go insane. So it’s a bit odd to not give him at least some credit for getting something important right in advance.
Elite Coordination via the Consensus of Power
I have thought about this on and off for several years and finally decided that you’re right and have changed it. Thanks for pushing on this.
Nice, that’s almost exactly how I intended it. Except that I wasn’t thinking of the “stars” as satellites looking for individual humans to send propaganda at (which IMO is pretty close to “communicating”), but rather as a network of satellites forming a single “screen” across the sky that plays a video infecting any baseline humans who look at it.
In my headcanon the original negotiators specified that sunlight would still reach the earth unimpeded, but didn’t specify that no AI satellites would be visible from the Earth. I don’t have headcanon explanations for exactly how the adversanimals arose or how the earth became desolate though.
(Oh, also, I think of the attack as being inefficient less because of lack of data, since AIs can just spin up humans to experiment on, and more because of the inherent difficulty of overwriting someone’s cognition via only a brief visual stimulus. Though now that I think about it more, presumably once someone has been captured the next thing you’d get them to do is spend a lot of time staring at a region of the sky that will reprogram them in more sophisticated ways. So maybe the normal glitchers in my story are unrealistically incompetent.)
In general I think people should explain stuff like this. “I might as well not help” is a very weak argument compared with the benefits of people understanding the world better.
Oh, I see what you mean now. In that case, no, I disagree. Right now this notion of robustness is pre-theoretic. I suspect that we can characterize robustness as “acting like a belief/goal agent” in the limit, but part of my point is that we don’t even know what it means to act “approximately like belief/goal agents” in realistic regimes, because e.g. belief/goal agents as we currently characterize them can’t learn new concepts.
Relatedly, see the dialogue in this post.
I appreciated this comment! Especially:
dude, how the hell do you come up with this stuff.
This quote from my comment above addresses this:
And so I’d say that AIXI-like agents and coalitional agents converge in the limit of optimality, but before that coalitional agency will be a much better framework for understanding realistic agents, including significantly superhuman agents.
Thank you Cole for the comment! Some quick thoughts in response (though I’ve skipped commenting on the biology examples and the ML examples since I think our intuitions here are a bit too different to usefully resolve via text):
Ngo’s view seems to be that after some level of decomposition, the recursion bottoms out at agents that can be seen as expected utility maximizers (though it’s not totally clear to me where this point occurs on his model; he seems to think that these irreducible agents are more like rough heuristics than sophisticated planners, so that nothing AIXI-like ever really appears in his hierarchy)
Yepp, this is a good rephrasing. I’d clarify a bit by saying: after some level of decomposition, the recursion reaches agents which are limited to simple enough domains (like recognizing shapes in your visual field) that they aren’t strongly bottlenecked on forming new concepts (like all higher-level agents are). In domains that simple, the difference between heuristics and planners is much less well-defined (e.g. a “pick up a cup” subagent has a scope of maybe 1 second, so there’s just not much planning to do). So I’m open to describing such subagents as utility-maximizers with bounded scope (e.g. utility 1 if they pick up the cup in the next second, 0 if they don’t, −10 if they knock it over). This is still different from “utility-maximizers” in the classic LessWrong sense (which are usually understood as not being bounded in terms of time or scope).
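To spell that out in code, here’s a trivially small sketch of such a bounded-scope utility function (the outcome labels and payoffs are just the illustrative ones from the paragraph above, not anything canonical):

```python
# A bounded-scope "pick up the cup" subagent: it only scores outcomes inside its
# narrow domain and roughly one-second horizon; everything else is out of scope.
def pick_up_cup_utility(outcome: str) -> float:
    payoffs = {
        "picked_up_within_1s": 1.0,   # utility 1 if it picks up the cup in the next second
        "not_picked_up": 0.0,         # 0 if it doesn't
        "knocked_over": -10.0,        # -10 if it knocks the cup over
    }
    return payoffs[outcome]
```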
attempting to integrate all of them into one framework (ontology) is costly and unnatural
This feels crucial to me. There’s a level of optimality at which you no longer care about robustness, because you’re so good at planning that you can account for every consideration. Stockfish, for example, is willing to play moves that go against any standard chess intuition, because it has calculated out so many lines that it’s confident the move works in that specific position. (Though even then, note that this leaves it vulnerable to neural chess engines!)
But for anything short of that, you want to be able to integrate subagents with non-overlapping ontologies into the same decision procedure. E.g. if your internal planning subagent has come up with some clever and convoluted plan, you want some other subagent to be able to say “I can’t critique this plan in its own ontology, but my heuristics say it’s going to fail”. More generally, attempted unifications of ontologies have the same problem as attempted unifications of political factions: typically the unified entity will ignore some aspects which the old entities considered important.
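Here is one minimal sketch of the kind of decision procedure this paragraph gestures at (all the names, features, and scoring rules below are hypothetical illustrations, not a claim about how such a system should actually be built): a planner proposes plans in its own ontology, while a heuristic subagent that can’t parse that ontology still registers a scalar objection based on coarse features it can read.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Plan:
    description: str      # written in the planner's own ontology (opaque to other subagents)
    planner_score: float  # how promising the planner thinks the plan is
    features: Dict[str, float] = field(default_factory=dict)  # coarse features any subagent can read

def heuristic_critic(plan: Plan) -> float:
    """A subagent that can't parse the plan's ontology but can still object."""
    penalty = 0.0
    if plan.features.get("n_steps", 0) > 10:
        penalty -= 2.0   # "too convoluted; my heuristics say it's going to fail"
    if plan.features.get("novelty", 0.0) > 0.8:
        penalty -= 1.0   # unusually far from anything that has worked before
    return penalty

def choose(plans: List[Plan], critics: List[Callable[[Plan], float]]) -> Plan:
    # Aggregate the planner's own score with every critic's score, rather than
    # forcing all considerations into the planner's ontology first.
    return max(plans, key=lambda p: p.planner_score + sum(c(p) for c in critics))

# Example: a clever 14-step plan loses to a duller 3-step one once the critic weighs in.
plans = [
    Plan("convoluted master plan", planner_score=5.0, features={"n_steps": 14, "novelty": 0.9}),
    Plan("boring straightforward plan", planner_score=3.5, features={"n_steps": 3, "novelty": 0.1}),
]
print(choose(plans, [heuristic_critic]).description)  # -> "boring straightforward plan"
```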
And so I’d say that AIXI-like agents and coalitional agents converge in the limit of optimality, but before that coalitional agency will be a much better framework for understanding realistic agents, including significantly superhuman agents. The point at which AIXI is relevant again in my mind is the point at which we have agents who can plan about the real world as precisely as Stockfish can plan in chess games, which IMO is well past what I’d call “superintelligence”.
What would a truly coalitional agent actually look like? Perhaps Nash bargaining between subagents as in Scott Garrabrant’s geometric rationality sequence. This sort of coalition really is not VNM rational (rejecting the independence axiom), so can’t generally be viewed as an EU maximizer. But it also seems to be inherently unstable—subagents may simply choose to randomly select a leader, collapsing into the form of VNM rationality.
Randomly choosing a leader is very far from the Pareto frontier! I do agree that there’s a form of instability in any other arrangement (in the same way that countries often fall into dictatorship), but I’d say there’s also a form of meta-stability which dictatorship lacks (the countries that fall into dictatorship tend to be overtaken by countries that don’t).
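As a toy numerical illustration of the Pareto-frontier point (assuming, purely for illustration, that both agents have concave utility in their share of the cake): a guaranteed even split ex-ante Pareto-dominates randomly choosing a leader who takes everything, and Nash bargaining over expected utilities also selects the split.

```python
import math

def u(share: float) -> float:
    return math.sqrt(share)  # concave: each agent dislikes gambling over its cake share

# Random dictatorship: each agent gets all the cake with probability 0.5, none otherwise.
eu_random_leader = 0.5 * u(1.0) + 0.5 * u(0.0)   # = 0.50 for each agent

# Guaranteed even split.
eu_even_split = u(0.5)                           # ≈ 0.71 for each agent

print(eu_random_leader < eu_even_split)          # True: both agents prefer the split

# Nash bargaining (maximize the product of expected-utility gains over a disagreement
# point of 0) also picks the deterministic split over the random-leader lottery here.
print(eu_random_leader ** 2 < eu_even_split ** 2)  # 0.25 < 0.5
```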
But I do like the rest of this paragraph. Ultimately coalitional agency is an interesting philosophical hypothesis, but for it to be a meaningful theory of intelligent agency it needs much more mathematical structure and insight. “If subagents bargain over what coalitional structure to form, what would they converge to under which conditions?” feels like the sort of question that might lead to that type of insight.
Trojan Sky
I found this tricky to parse because of two phrasing issues:
The post depends a lot on what you mean by “school” (high school versus undergrad).
I feel confused about what claim you’re making about the waiting room strategy: you say that some people shouldn’t use it, but you don’t actually claim that anyone in particular should use it. So are you just mentioning that it’s a possible strategy? Or are you implying that it should be the default strategy?
Power Lies Trembling: a three-book review
Something that’s fascinating about this art of yours is that I can’t tell if you’re coherently in favor of this, or purposefully invoking thinking errors in the audience, or just riffing, or what.
Thanks for the fascinating comment.
I am a romantic in the sense that I believe that you can achieve arbitrarily large gains from symbiosis if you’re careful and skillful enough.
Right now very few people are careful and skillful enough. Part of what I’m trying to convey with this story is what it looks like for AI to provide most of the requisite skill.
Another way of putting this: are trees strangling each other because that’s just the nature of symbiosis? Or are they strangling each other because they’re not intelligent or capable enough to productively cooperate? I think the latter.
FWIW I think of “OpenAI leadership being untrustworthy” (a significant factor in me leaving) as different from “OpenAI having bad safety policies” (not a significant factor in me leaving). Not sure if it matters; I expect that Scott was using “safety policies” more expansively than I do. But just for the sake of clarity:
I am generally pretty sympathetic to the idea that it’s really hard to know what safety policies to put in place right now. Many policies pushed by safety people (including me, in the past) have been mostly kayfabe (e.g. being valuable as costly signals, not on the object level). There are a few object-level safety policies that I really wish OpenAI would adopt right now (most clearly, implementing better security measures), but I didn’t leave because of that (if I had, I would have tried harder to check what security measures OpenAI did have, made specific objections about them internally before I left, etc.).
This may just be a semantic disagreement; it seems very reasonable to define “don’t make employees sign non-disparagements” as a safety policy. But in my mind at least, stuff like that is more of a lab governance policy (or maybe a meta-level safety policy).
Oh huh, I had the opposite impression from when I published Tinker with you. Thanks for clarifying!
Ty! You’re right about the Asimov deal, though I do have some leeway. But I think the opening of this story is a little slow, so I’m not very excited about that being the only thing people see by default.
Unrelatedly, my last story is the only one of my stories that was left as a personal blog post (aside from the one about parties). Change of policy or oversight?
I’ve now edited that section. Old version and new version here for posterity.
Old version:
None of these is very satisfactory! Intuitively speaking, Alice and Bob want to come to an agreement where respect for both of their interests is built in. For example, they might want the EUM they form to value fairness between their two original sets of interests. But adding this new value is not possible if they’re limited to weighted averages. The best they can do is to agree on a probabilistic mixture of EUMs—e.g. tossing a coin to decide between option 1 and option 2—which is still very inflexible, since it locks in one of them having priority indefinitely.
Based on similar reasoning, Scott Garrabrant rejects the independence axiom. He argues that the axiom is unjustified because rational agents should be able to follow through on commitments they made about which decision procedure to follow (or even hypothetical commitments).
New version:
These are all very unsatisfactory. Bob wouldn’t want #1, Alice wouldn’t want #2, and #3 is extremely non-robust. Alice and Bob could toss a coin to decide between options #1 and #2, but then they wouldn’t be acting as an EUM (since EUMs can’t prefer a probabilistic mixture of two options to either option individually). And even if they do, whoever loses the coin toss will have a strong incentive to renege on the deal.
We could see these issues merely as the type of frictions that plague any idealized theory. But we could also see them as hints about what EUM is getting wrong on a more fundamental level. Intuitively speaking, the problem here is that there’s no mechanism for separately respecting the interests of Alice and Bob after they’ve aggregated into a single agent. For example, they might want the EUM they form to value fairness between their two original sets of interests. But adding this new value is not possible if they’re limited to (a probability distribution over) weighted averages of their utilities. This makes aggregation very risky when Alice and Bob can’t consider all possibilities in advance (i.e. in all realistic settings).
Based on similar reasoning, Scott Garrabrant rejects the independence axiom. He argues that the axiom is unjustified because rational agents should be able to lock in values like fairness based on prior agreements (or even hypothetical agreements).
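For reference, the claim in the new version above that an EUM can’t strictly prefer a probabilistic mixture of two options to both of them follows directly from linearity of expected utility:

$$
EU\bigl(pA + (1-p)B\bigr) \;=\; p\,EU(A) + (1-p)\,EU(B) \;\le\; \max\bigl(EU(A),\,EU(B)\bigr) \quad \text{for all } p \in [0,1].
$$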