“Lethal” here means “lethal enough to kill every living human”. For example, later in the article Eliezer writes this:
When I say that alignment is difficult, I mean that in practice, using the techniques we actually have, “please don’t disassemble literally everyone with probability roughly 1”
...
According to who?
From context, “has to” means “if we humans don’t solve this problem, we will be killed by an unaligned AI”. There’s no person or authority out there threatening us to solve this problem “or else”; that’s just the way reality seems to be. If you’re asking why building a Strong unaligned AI results in everyone being killed, I suggest reading the posts about orthogonality and instrumental convergence linked at the top of this post.
And why a binary choice?
“One way or another” is an English idiom which you can take to mean “somehow”. It doesn’t necessarily imply a binary choice.
Multi-multi scenarios existing in equilibrium have not been disproven yet,
This is addressed by #34: Just because multiple AI factions can coexist and compromise with each other doesn’t mean that any of those factions is likely to want to keep humans around. It doesn’t seem likely that any AIs will think humans are cute and likeable in the same way that we think dogs are cute and likeable.
You need to disprove, or point to evidence that shows, why all ‘easier modes’ proposals are incorrect
This is mostly addressed in #6 and #7, and the evidence given is that “nobody in this community has successfully named a ‘pivotal weak act’”. You could win this part of the argument by naming something that could be done with an AI weak enough not to be a threat, and that would prevent all the AI research groups out there in the world from building a Strong AI.
Why is it ‘fatal’?
Because we expect a Strong AI that hasn’t been aligned to kill everyone. Once again, see the posts about orthogonality and instrumental convergence.
And who determines what counts as the ‘first really dangerous try’?
I’m not quite sure what you’re asking here. I guess Eliezer determines what he meant by writing those words. I don’t think anyone at any of these AI research groups is looking over proposals for models and saying “oh, this model is moderately dangerous” or “oh, this model is really dangerous, you shouldn’t build it”. I think most of those groups only worry about the cost of training the model rather than how dangerous it will be.
If you were unaware, every example of other types of ‘lethal’ in the parent has the possibility of eliminating all human life. And not in a hand-wavy sense either: truly 100%, the same death rate as the worst-case AGI outcomes.
That means that to a knowledgeable reader the wording is unpersuasive, since the point is made before it’s been established that there’s potential for an even worse outcome than 100% extinction.
This shouldn’t be too hard to do, since this topic was regularly discussed on LW: dust specks, simulated tortures, etc.
I don’t know why neither you nor Eliezer includes the obvious supporting points, or links to someone who does, up front, or at least doesn’t bury them way past the assertion, since it seems you’re trying to reinforce his points and Eliezer ostensibly set out to write a summary for the non-expert reader.
If there’s some new essay style I didn’t get the memo about, where the weak arguments go at the beginning and the stronger ones near the end, then I could see why it was written this way.
For the rest of your points I see the same mistake: strong assertions without equally strong evidence to back them up.
For example, none of the posts from the regulars I’ve seen on LW assert, without any hedging, that there’s a 100% chance of human extinction due to any arbitrary Strong AI.
I’ve seen a few claims that there’s a 100% chance Clippy would do so if Clippy arose first, though even those are somewhat iffy. And definitely none saying there’s a 100% chance that Clippy, and only Clippy, would arise and reach an equilibrium end state.
If you know of any such please provide the link.