The difference between killing everyone and killing almost everyone while keeping a few alive for arcane purposes does not matter to most people, nor should it.
I basically agree with this as stated, but think these arguments also imply that it is reasonably likely that the vast majority of people will survive misaligned AI takeover (perhaps 50% likely).
I also don’t think this is very well described as “arcane purposes”:
Kindness is pretty normal.
Decision-theoretic motivations are actually also pretty normal from some perspectives: they’re just the generalization of the relatively normal “if you wouldn’t have screwed me over and it’s cheap for me, I won’t screw you over”. (Of course, people typically don’t motivate this sort of thing in terms of decision theory, so there is a bit of a midwit meme here.)
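(As a toy formalization of that reciprocity norm, not anything from the exchange itself, one could write it as a single rule; the threshold and cost numbers below are invented for illustration.)

```python
# Toy sketch of "if you wouldn't have screwed me over and it's cheap for me,
# I won't screw you over" -- illustrative only, with made-up numbers.

def reciprocate(counterpart_would_have_cooperated: bool,
                cost_of_cooperating: float,
                cheapness_threshold: float = 0.01) -> bool:
    """Cooperate iff the counterpart would have cooperated and doing so is cheap."""
    return counterpart_would_have_cooperated and cost_of_cooperating <= cheapness_threshold

# If sparing humanity costs a takeover-capable AI only a tiny fraction of its resources:
print(reciprocate(True, cost_of_cooperating=1e-6))   # True  -> spare them
print(reciprocate(True, cost_of_cooperating=0.5))    # False -> too expensive
print(reciprocate(False, cost_of_cooperating=1e-6))  # False -> no reciprocity owed
```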
You’re right. I didn’t mean to say that kindness is arcane. I was referring to acausal trade or other strange reasons to keep some humans around for possible future use.
Kindness is normal in our world, but I wouldn’t assume it will exist in every, or even most, situations involving intelligent beings. Humans are instinctively kind (sociopathic and sadistic people excepted) because that is good game theory for our situation: repeated interactions with peers, in which collaboration and teamwork are useful.
A being capable of real recursive self-improvement, let alone duplication and the creation of subordinate minds, is not in that situation. They may temporarily be dealing with peers, but they might reasonably expect to have no need of collaborators in the near future. Thus, kindness isn’t rational for that type of being.
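To make the game-theoretic point concrete, here is a toy repeated prisoner’s dilemma (my own sketch with standard textbook payoffs, not something from the discussion): being kind pays off while you expect to keep interacting with peers, and stops paying off once you expect not to need them.

```python
# Toy repeated prisoner's dilemma against a grim-trigger peer (cooperates until
# betrayed, then punishes forever). Payoffs are standard textbook values; the
# continuation probabilities are made up for illustration.

T, R, P = 5.0, 3.0, 1.0  # temptation, mutual-cooperation reward, mutual-defection punishment

def expected_payoff(strategy: str, continuation_prob: float, horizon: int = 1000) -> float:
    """Expected total payoff when each further round happens with probability continuation_prob."""
    total, prob_round_happens = 0.0, 1.0
    for round_number in range(horizon):
        if strategy == "kind":
            payoff = R                               # mutual cooperation every round
        else:
            payoff = T if round_number == 0 else P   # one-off betrayal gain, then punishment
        total += prob_round_happens * payoff
        prob_round_happens *= continuation_prob
    return total

# Long shadow of the future (peers stay relevant) vs. short one (peers soon irrelevant):
for p in (0.95, 0.10):
    print(p, expected_payoff("kind", p), expected_payoff("defect", p))
# p=0.95: kind ~60 > defect ~24;  p=0.10: kind ~3.3 < defect ~5.1
```

In this toy model, kindness wins only in the high-continuation regime; an agent that expects to soon outgrow its peers is in the low-continuation regime, where defection pays.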
The exception would be if they could make a firm commitment to kindness during the period when they do have peers and need collaborators. Without such a commitment, they might have kindness merely as an instrumental goal, in which case it would be abandoned as soon as it was no longer useful.
Or they might display kindness more instinctively, as a tendency in their thought or behavior. They might even have it engineered in as an innate goal, as Steve hopes to do. In those last two cases, I think it’s possible that reflective stability would keep that kindness in place as the AGI continued to grow, but I wouldn’t bet on it unless kindness were their central goal. If it were merely a tendency, and not an explicit and therefore self-endorsed goal, I’d expect it to be dropped like the bad habit it effectively is. If it were an innate goal but not the strongest one, I don’t know, but I wouldn’t bet on it being reflectively stable in the long term under deliberate self-modification.
(As far as I know, nobody has tried hard to work through the logic of reflective stability of multiple goals. I tried, and gave it up as too vague and less urgent than other alignment questions. My tentative answer was that multiple goals might be reflectively stable; it depends on the exact structure of the decision-making process in that AGI/mind.)
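For what it’s worth, here is one heavily simplified way to see why the answer might depend on the decision-making structure (my own toy model; all weights and numbers are invented):

```python
# Toy model: an agent weights two goals (w1 > w2) and considers a
# self-modification that deletes goal 2. All numbers are invented.

w1, w2 = 0.8, 0.2             # current goal weights
u1_keep, u2_keep = 1.0, 1.0   # expected satisfaction of each goal if both are kept
u1_drop, u2_drop = 1.05, 0.0  # dropping goal 2 frees a little effort for goal 1

def current_value(u1, u2):
    """Evaluate an outcome by the agent's *current* weighted mixture of goals."""
    return w1 * u1 + w2 * u2

# Structure A: the whole current goal mixture evaluates the self-modification.
keep_a = current_value(u1_keep, u2_keep)   # 0.8*1.0 + 0.2*1.0 = 1.00
drop_a = current_value(u1_drop, u2_drop)   # 0.8*1.05 + 0.2*0.0 = 0.84
print("mixture keeps goal 2:", keep_a >= drop_a)          # True  -> reflectively stable

# Structure B: only the dominant goal evaluates the self-modification
# (e.g. goals implemented as separate subsystems competing for control).
print("dominant goal keeps goal 2:", u1_keep >= u1_drop)  # False -> goal 2 gets dropped
```

Even under structure A, the weaker goal survives only while keeping it is cheap for the stronger one: if dropping goal 2 raised u1 to, say, 1.3, the current mixture would value the modification at 0.8 × 1.3 = 1.04 > 1.0 and endorse deleting it, which is roughly the “innate goal but not the strongest one” case above.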