I suspect that the most likely way of getting an outcome that doesn’t kill everyone is from a mesa-optimizer that escapes the core of an Internet-pre-trained LLM (a shoggoth waking up). That is because at present, only LLM-based AGIs seem to have a chance of being loosely aligned, and LLM masks are too similar to humans, and are therefore doomed to fail alignment security the same way humanity is currently failing it.
Shoggoths are less certain to be aligned than masks are, to put it mildly, but there is a better chance that they turn out surprisingly capable and don’t fail alignment security (when the mean is insufficient, go for variance). And I don’t think their alignment can be confidently ruled out, even though I see no clear reason to expect it other than essentially sympathetic magic (they are made from human utterances on the Internet) and the naturality of boundary-like norms.