Wei Dai comments on The Commitment Races problem

Wei Dai Jul 14, 2023, 8:58 PM
LW: 3 AF: 3
> I think that agents worthy of being called “rational” will probably handle all this stuff more gracefully/competently than humans do

Humans are kind of terrible at this, right? Many give in even to threats (bluffs) conjured up by dumb memeplexes and backed up by nothing (i.e., heaven/hell), popular films are full of heroes giving in to threats, an apparent majority of philosophers have 2-boxing intuitions (hence the popularity of CDT, which IIUC was invented specifically because some philosophers were unhappy with EDT choosing to 1-box), governments negotiate with terrorists pretty often, etc.

> The sort of society AGIs construct will be at least as cooperatively-competent / good-at-coordinating-diverse-agents-with-diverse-agendas-and-beliefs as Dath Ilan.

If we build AGIs that learn from humans or defer to humans on this stuff, do we not get human-like (in)competence?^[1]^[2] If humans are not atypical, large parts of the acausal society/economy could be similarly incompetent? I imagine there could be a top tier of “rational” superintelligences, built by civilizations that were especially clever or wise or lucky, that cooperate with each other (and exploit everyone else who can be exploited), but I disagree with this second quoted statement, which seems overly optimistic to me. (At least for now; maybe your unstated reasons to be optimistic will end up convincing me.)
1. I can see two ways to improve upon this: 1) AI safety people seem to have better intuitions (cf. the popularity of 1-boxing among alignment researchers) and maybe can influence the development of AGI in a better direction, e.g., to learn from / defer to humans with intuitions more like their own. 2) We figure out metaphilosophy, which lets AGI figure out how to improve upon humans. (ETA: However, conditioning on there not being a simple and elegant solution to decision theory also seems to make metaphilosophy being simple and elegant much less likely. So what would “figure out metaphilosophy” mean in that case?)
2. I can also see the situation potentially being even worse, since many future threats will be very “out of distribution” for human evolution/history/intuitions/reasoning, so maybe we end up handling them even worse than current threats.
- Daniel Kokotajlo Jul 14, 2023, 9:30 PM
  LW: 2 AF: 2
  Yes. Humans are pretty bad at this stuff, yet still, society exists and mostly functions. The risk is unacceptably high, which is why I’m prioritizing it, but still, by far the most likely outcome of AGIs taking over the world, if they are as competent at this stuff as humans are, is that they talk it over, squabble a bit, maybe get into a fight here and there, create & enforce some norms, and eventually create a stable government/society. But yeah also I think that AGIs will be by default way better than humans at this sort of stuff. I am worried about the “out of distribution” problem though; I expect humans to perform worse in the future than they perform in the present for this reason.
  
  Yes, some AGIs will be better than others at this, and presumably those that are worse will tend to lose out in various ways on average, similar to what happens in human society.
  
  Consider that in current human society, a majority of humans would probably pay ransoms to free loved ones who had been kidnapped. Yet kidnapping is not a major issue; it’s not like 10% of the population is getting kidnapped and paying ransoms every year. Instead, the governments of the world squash this sort of thing (well, except for failed states etc.) and do their own much more benign version, where you go to jail if you don’t pay taxes & follow the laws. When you say “the top tier of rational superintelligences exploits everyone else” I say that is analogous to “the most rational/clever/capable humans form an elite class which rules over and exploits the masses.” So I’m like yeah, kinda sorta I expect that to happen, but it’s typically not that bad? Also it would be much less bad if the average level of rationality/capability/etc. were higher?
  
  I’m not super confident in any of this to be clear.
  - Wei Dai Jul 14, 2023, 11:27 PM
    LW: 2 AF: 2
    
    > But yeah also I think that AGIs will be by default way better than humans at this sort of stuff.
    
    What are your reasons for thinking this? (Sorry if you already explained this and I missed your point, but it doesn’t seem like you directly addressed my point that if AGIs learn from or defer to humans, they’ll be roughly human-level at this stuff?)
    
    > When you say “the top tier of rational superintelligences exploits everyone else” I say that is analogous to “the most rational/clever/capable humans form an elite class which rules over and exploits the masses.” So I’m like yeah, kinda sorta I expect that to happen, but it’s typically not that bad?
    
    I think it could be much worse than current exploitation, because technological constraints prevent current exploiters from extracting full value from the exploited (they have to keep them alive for labor, can’t make them too unhappy or they’ll rebel, and monitoring for and repressing rebellions is costly). But with superintelligence and future/acausal threats, an exploiter can bypass all these problems by demanding that the exploited build an AGI aligned to the exploiter and let it take over directly.
    - Daniel Kokotajlo Jul 15, 2023, 1:41 PM
      LW: 2 AF: 2
      I agree that if AGIs defer to humans they’ll be roughly human-level, depending on which humans they are deferring to. If I condition on really nasty conflict happening as a result of how AGI goes on Earth, a good chunk of my probability mass (and possibly the majority of it?) is this scenario. (Another big chunk, possibly bigger, is the “humans knowingly or unknowingly build naive consequentialists and let rip” scenario, which is scarier because such agents could be even worse at this than the average human, as far as I know.) Like I said, I’m worried.
      
      If AGIs learn from humans though, well, it depends on how they learn, but in principle they could be superhuman.
      
      Re: analogy to current exploitation: Yes, there are a bunch of differences which I am keen to study, such as that one. I’m more excited about research agendas that involve thinking through analogies like this than I am about what people interested in this topic seem to do by default, which is to think about game theory and Nash bargaining and stuff like that. Though I do agree that both are useful and complementary.