> Both [cultures A and B] are aiming to preserve human values, but within A, a subculture A’ develops to favor more efficient business practices (nihilistic power-maximizing) over preserving human values.
I was asking you why you thought A’ would effectively outcompete B (sorry for being unclear). For example, why do people with intrinsic interest in power-maximization outcompete people who are interested in human flourishing but still invest their money to have more influence in the future?
Ah! Yes, this is really getting to the crux of things. The short answer is that I’m worried about the following failure mode:
Failure mode: When B-cultured entities invest in “having more influence”, often the easiest way to do this will be for them to invest in or copy A’-cultured-entities/processes. This increases the total presence of A’-like processes in the world, which have many opportunities to coordinate because of their shared (power-maximizing) values. Moreover, the A’ culture has an incentive to trick the B culture(s) into thinking A’ will not take over the world, but eventually, A’ wins.
(Here, I’m using the word “culture” to encode a mix of information subsuming utility functions, beliefs, decision theory, cognitive capacities, and other features determining the general tendencies of an agent or collective.)
Of course, an easy antidote to this failure mode is to have A or B win instead of A’, because A and B both have some human values other than power-maximizing. The problem is that this whole situation is premised on a conflict between A and B over which culture should win, and then the following observation applies:
**Wei Dai has suggested that groups with unified values might outcompete groups with heterogeneous values since homogeneous values allow for better coordination, and that AI may make this phenomenon more important.**
In other words, the humans and human-aligned institutions not collectively being good enough at cooperation/bargaining risks a slow slipping-away of hard-to-express values and an easy takeover of simple-to-express values (e.g., power-maximization). This observation is slightly different from observations that “simple values dominate engineering efforts” as seen in stories about singleton paperclip maximizers. A key feature of the Production Web dynamic is not just that it’s easy to build production maximizers, but that it’s easy to accidentally cooperate on building production-maximizing systems that destroy both you and your competitors.
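As a minimal sketch of that slipping-away dynamic (a toy model with made-up numbers, not something specified in the story): suppose the human-aligned economy routes a small fraction of each year’s growth through faster A’-style processes in order to stay competitive. Each individual delegation looks small and locally reasonable, but the faster-compounding A’-style share of total resources eventually dominates.

```python
# Toy model (illustrative only, made-up parameters): each year the human-aligned
# economy re-delegates a small slice of its growth to faster, power-maximizing
# (A'-style) processes in order to stay competitive. Track what share of total
# resources ends up managed by A'-style processes.

GROWTH_HUMAN = 1.05   # growth rate of resources managed under human values
GROWTH_APRIME = 1.20  # growth rate of resources managed by A'-style processes
DELEGATED = 0.10      # fraction of human-managed growth re-delegated each year

def a_prime_share(years: int) -> float:
    human, a_prime = 1.0, 0.0
    for _ in range(years):
        grown = human * GROWTH_HUMAN
        delegated = grown * DELEGATED          # the "locally reasonable" slice
        human = grown - delegated
        a_prime = a_prime * GROWTH_APRIME + delegated
    return a_prime / (human + a_prime)

for years in (10, 30, 50):
    # the A'-managed share climbs toward 100% even though each yearly delegation is small
    print(f"after {years:2d} years, A'-managed share of resources = {a_prime_share(years):.0%}")
```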
> This feels inconsistent with many of the things you are saying in your story, but
Thanks for noticing whatever you think are the inconsistencies; if you have time, I’d love for you to point them out.
> I might be misunderstanding what you are saying and it could be that some argument like Wei Dai’s is the best way to translate your concerns into my language.
This seems pretty likely to me. The bolded attribution to Dai above is a pretty important RAAP in my opinion, and it’s definitely a theme in the Production Web story as I intend it. Specifically, the subprocesses of each culture that are in charge of production-maximization end up cooperating really well with each other in a way that ends up collectively overwhelming the original (human) cultures. Throughout this, each cultural subprocess is doing what its “host culture” wants it to do from a unilateral perspective (work faster / keep up with the competitor cultures), but the overall effect is destruction of the host cultures (a la Prisoner’s Dilemma) by the cultural subprocesses.
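To make the Prisoner’s Dilemma analogy concrete, here is a minimal payoff sketch (my illustration, with made-up numbers): each host culture either restrains its production subprocess to keep it tightly tied to its values, or delegates to it fully in order to keep up with the competitor culture. Delegating is the dominant unilateral move, yet mutual delegation leaves both host cultures worse off than mutual restraint.

```python
# Illustrative payoff matrix (made-up numbers) for two host cultures deciding
# whether to restrain their production subprocess or delegate to it fully.
# Each entry is (my payoff, other's payoff); higher is better for that host.
PAYOFFS = {
    ("restrain", "restrain"): (3, 3),   # both keep their values; modest growth
    ("restrain", "delegate"): (0, 4),   # the restrained host falls behind
    ("delegate", "restrain"): (4, 0),
    ("delegate", "delegate"): (1, 1),   # the production web overwhelms both hosts
}

def best_response(other_move: str) -> str:
    # the move that maximizes my payoff, holding the other culture's move fixed
    return max(("restrain", "delegate"),
               key=lambda mine: PAYOFFS[(mine, other_move)][0])

for other in ("restrain", "delegate"):
    print(f"if the other culture plays {other!r}, my best response is {best_response(other)!r}")
# 'delegate' dominates for each host, yet (delegate, delegate) leaves both
# worse off than (restrain, restrain) -- the Prisoner's Dilemma structure above.
```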
If I had to use alignment language, I’d say “the production web overall is misaligned with human culture, while each part of the web is sufficiently well-aligned with the human entit(ies) who interact with it that it is allowed to continue operating”. Too low of a bar for “allowed to continue operating” is key to the failure mode, of course, and you and I might have different predictions about what bar humanity will actually end up using at roll-out time. I would agree, though, that conditional on a given roll-out date, improving E[alignment_tech_quality] on that date is good and complementary to improving E[cooperation_tech_quality] on that date.
Did this get us any closer to agreement around the Production Web story? Or if not, would it help to focus on the aforementioned inconsistencies with homogeneous-coordination-advantage?
> Failure mode: When B-cultured entities invest in “having more influence”, often the easiest way to do this will be for them to invest in or copy A’-cultured-entities/processes. This increases the total presence of A’-like processes in the world, which have many opportunities to coordinate because of their shared (power-maximizing) values. Moreover, the A’ culture has an incentive to trick the B culture(s) into thinking A’ will not take over the world, but eventually, A’ wins.
I’m wondering why the easiest way is to copy A’—why was A’ better at acquiring influence in the first place, so that copying them or investing in them is a dominant strategy? I think I agree that once you’re at that point, A’ has an advantage.
> In other words, the humans and human-aligned institutions not collectively being good enough at cooperation/bargaining risks a slow slipping-away of hard-to-express values and an easy takeover of simple-to-express values (e.g., power-maximization).
This doesn’t feel like other words to me, it feels like a totally different claim.
> Thanks for noticing whatever you think are the inconsistencies; if you have time, I’d love for you to point them out.
In the production web story it sounds like the web is made out of different firms competing for profit and influence with each other, rather than a set of firms that are willing to leave profit on the table to benefit one another since they all share the value of maximizing production. For example, you talk about how selection drives this dynamic, but the firms that succeed are those that maximize their own profits and influence (not those that are willing to leave profit on the table to benefit other firms).
So none of the concrete examples of Wei Dai’s economies of scale actually seem to apply to give an advantage for the profit-maximizers in the production web. For example, natural monopolies in the production web wouldn’t charge each other marginal costs, they would charge profit-maximizing prices. And they won’t share infrastructure investments except by solving exactly the same bargaining problem as any other agents (since a firm that indiscriminately shared its infrastructure would get outcompeted). And so on.
> Specifically, the subprocesses of each culture that are in charge of production-maximization end up cooperating really well with each other in a way that ends up collectively overwhelming the original (human) cultures.
This seems like a core claim (certainly if you are envisioning a scenario like the one Wei Dai describes), but I don’t yet understand why this happens.
Suppose that the US and China both have productive widget-industries. You seem to be saying that their widget-industries can coordinate with each other to create lots of widgets, and they will do this more effectively than the US and China can coordinate with each other.
Could you give some concrete example of how the US and Chinese widget industries coordinate with each other to make more widgets, and why this behavior is selected?
For example, you might think that the Chinese and US widget industries share their insights into how to make widgets (as the aligned actors do in Wei Dai’s story), and that this will cause widget-making to do better than other non-widget sectors where such coordination is not possible. But I don’t see why they would do that: the US firms that share their insights freely with Chinese firms do worse, and would be selected against in every relevant sense, relative to firms that attempt to effectively monetize their insights. But effectively monetizing their insights is exactly what the US widget industry should do in order to benefit the US. So I see no reason why the widget industry would be more prone to sharing its insights.
So I don’t think that particular example works. I’m looking for an example of that form, though: some concrete form of cooperation that the production-maximization subprocesses might engage in that allows them to overwhelm the original cultures, to give some indication of why you think this will happen in general.
> Failure mode: When B-cultured entities invest in “having more influence”, often the easiest way to do this will be for them to invest in or copy A’-cultured-entities/processes. This increases the total presence of A’-like processes in the world, which have many opportunities to coordinate because of their shared (power-maximizing) values. Moreover, the A’ culture has an incentive to trick the B culture(s) into thinking A’ will not take over the world, but eventually, A’ wins.
> In other words, the humans and human-aligned institutions not collectively being good enough at cooperation/bargaining risks a slow slipping-away of hard-to-express values and an easy takeover of simple-to-express values (e.g., power-maximization).
> This doesn’t feel like other words to me, it feels like a totally different claim.
Hmm, perhaps this is indicative of a key misunderstanding.
> For example, natural monopolies in the production web wouldn’t charge each other marginal costs, they would charge profit-maximizing prices.
Why not? The third paragraph of the story indicates: “Companies closer to becoming fully automated achieve faster turnaround times, deal bandwidth, and creativity of negotiations.” In other words, at that point it could certainly happen that two monopolies would agree to charge each other lower prices if it benefitted both of them. (Unless you’d count that as an instance of “charging profit-maximizing prices”?) The concern is that the subprocesses of each company/institution that get good at (or succeed at) bargaining with other institutions are subprocesses that (by virtue of being selected for speed and simplicity) are less aligned with human existence than the original overall company/institution, and that the less-aligned subprocess grows to take over the institution, while always taking actions that are “good” for the host institution when viewed as a unilateral move in an uncoordinated game (hence passing as “aligned”).
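One concrete mechanism of that kind (my illustration; the story itself doesn’t spell out pricing details) is double marginalization: when one monopoly supplies an input to another, each independently charging its own profit-maximizing markup leaves less total profit than agreeing on a lower internal price and splitting the gains. A minimal numeric sketch, assuming linear consumer demand q = 100 - p and zero production costs:

```python
# Illustrative double-marginalization sketch (my numbers, not from the story).
# Upstream monopoly U sells an input at wholesale price w to downstream
# monopoly D, which sells to consumers facing linear demand q = 100 - p.
# Production costs are zero. Compare independent pricing with an agreement
# to transfer the input at cost and split the larger joint profit.

def downstream_choice(w: float) -> tuple[float, float]:
    # D maximizes (p - w) * (100 - p), giving p = (100 + w) / 2
    p = (100 + w) / 2
    return p, 100 - p  # retail price, quantity sold

def profits(w: float) -> tuple[float, float]:
    p, q = downstream_choice(w)
    return w * q, (p - w) * q  # (upstream profit, downstream profit)

# Case 1: U independently picks its profit-maximizing wholesale price (w = 50,
# given D's response above), then D marks up again on top of it.
up, down = profits(50)
print(f"independent pricing:  joint profit = {up + down:.0f}")   # 1875

# Case 2: the two monopolies agree to transfer the input at cost (w = 0) and
# split the integrated monopoly profit according to whatever bargain they strike.
p, q = downstream_choice(0)
print(f"cooperative pricing:  joint profit = {p * q:.0f}")       # 2500
```

Since the cooperative joint profit (2500) exceeds the sum of the independent profits (1875), there is some split of the cooperative surplus that leaves both monopolies strictly better off, which is the sense in which they would agree to charge each other lower prices.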
At this point, my plan is to try to consolidate what I think are the main confusions in the comments of this post into one or more new concepts to form the topic of a new post.
> At this point, my plan is to try to consolidate what I think are the main confusions in the comments of this post into one or more new concepts to form the topic of a new post.
Sounds great! I was myself thinking about setting aside some time to write a summary of this comment section (as I see it).