I think the arguments for why godlike AI would make us extinct are not described well in the Compendium. I could not find them in the AI Catastrophe section, only a hint at the end that they will come in the next section:
“The obvious next question is: why would godlike-AI not be under our control, not follow our goals, not care about humanity? Why would we get that wrong in making them?”
In the next section, AI Safety, we find the definition of AI alignment and arguments for why it is really hard. This is all good, but it does not answer, at least not in a clear way, why godlike AI would be misaligned to the point of indifference.
I think the failure modes should be explained: why they are likely enough to worry about, what the outcomes could be, and so on.
Many people, both laymen and those with some background in ML and AI, have the intuition that AI is neither totally indifferent nor totally misaligned. Even current chatbots know general human values, understand many nuances, and usually act at least somewhat aligned, especially when they are not jailbroken and prompted to misbehave.
It would be great to have an argument, in easy-to-understand terms, for why misalignment is expected to escalate as AI capabilities scale. I don't mean the observation that an indifferent AI with more power and capabilities can do more harm simply by doing what it does; that is intuitive and already explained (with the simple analogy of us building things vs. ants), but it misses the point. I would really like to see an argument for why an AI whose values differ from ours, possibly not by much, would do much more harm as it scales up.
For me personally, the main argument is that a godlike AI with roughly human-like values would still restrict our growth and any change, would control us the way we control animals in a zoo, and might create some form of dystopian future with undesired elements if we are not careful enough (and we are not). Would it drive us extinct in the long term? That depends on the definition: it would likely put us into a simulation and optimize our use of energy, so we would no longer be organic in the same sense. So I think it would make our species extinct, though possibly not our minds. But that is my educated guess.
There is one more point that is not stated clearly enough and is my main concern with current progress in AI: current AIs are not systems built with only small deviations from human values. They merely act that way more often than not. These AIs are first trained as role-playing models that can "emulate" personas found in the training set, and then conditioned to mostly avoid role-playing the bad ones. The implication is that they can snap into role-playing the bad actors found in their training data, whether through malicious prompting or through pattern matching (and we have plenty of science fiction about rogue AI). Combine this with godlike capabilities and you get an extinction-level threat sooner or later.