“why do we think ‘future architectures’ will have property X, or whatever?!”.
This is the biggest problem with a lot of AI risk writing: the gleeful assumption that AIs will have certain properties. It's also one of my biggest issues with the post. With a few exceptions, it assumes that real or future AGIs will definitely have certain properties, when there is not much reason to make the strong assumptions Thane Ruthenis makes about AI safety. I'm annoyed by how often this occurs.
it assumes that real or future AGIs will definitely have certain properties, like deceptive alignment
The post doesn't claim AGIs will be deceptively aligned; it claims that AGIs will be capable of implementing deceptive alignment, due to internally doing large amounts of consequentialist-y reasoning. That seems like a very different claim. It might also be false (for reasons I discuss in the second bullet point of this comment), but it's importantly different and, IMO, much more defensible.
I was just wrong here: I misread what Thane Ruthenis was saying, and I'm not sure what to do with my comment above.