“why do we think ‘future architectures’ will have property X, or whatever?!”.
This is the biggest problem with a lot of AI risk writing: the gleeful assumption that AIs will have certain properties. It's also one of my biggest issues with the post. With a few exceptions, it assumes that real or future AGIs will definitely have certain properties, when there is not much reason to make the strong assumptions Thane Ruthenis makes about AI safety. I'm annoyed by how often this occurs.
it assumes that real or future AGIs will definitely have certain properties, like deceptive alignment
The post doesn't claim AGIs will be deceptively aligned; it claims that AGIs will be capable of implementing deceptive alignment, due to internally doing large amounts of consequentialist-y reasoning. That seems like a very different claim. It might also be false (for reasons I discuss in the second bullet point of this comment), but it's importantly different and, IMO, much more defensible.
I was just wrong here: I misread what Thane Ruthenis was saying, and I'm not sure what to do with my comment above.