What do you mean by “where the motivation comes from”?
This is a common problem with a lot of these hypothetical AI scenarios—WHY does the Oracle do this? How did the process of constructing this AI somehow make it want to eventually cause some negative consequence?
The negative consequences come from the oracle implementing an optimisation algorithm with objective function ϕ which is not aligned with humans. The space of objectives ϕ′ which align with humans is incredibly small among all possible objectives, and very small differences get magnified when optimised against.
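To make the "small differences get magnified" point concrete, here is a toy Goodhart-style sketch of my own (not anything from this thread): the proxy objective differs from the true objective only by a small, heavy-tailed error term, and we "optimise" by searching more and more candidates and keeping whichever scores highest on the proxy. The Cauchy error term and all the specific numbers are illustrative assumptions, not a model of any real system.

```python
import numpy as np

# Toy sketch (illustrative assumptions only): a proxy objective that is
# almost, but not exactly, the true objective, optimised by brute-force
# search over an increasing number of candidates.

rng = np.random.default_rng(0)

n = 1_000_000
true_value = rng.normal(size=n)                          # what we actually care about
proxy_value = true_value + 0.1 * rng.standard_cauchy(n)  # nearly the same objective, small mismatch

for k in (10, 1_000, 1_000_000):          # increasing optimisation pressure
    best = np.argmax(proxy_value[:k])     # candidate that looks best under the proxy
    print(f"searched {k:>9,} candidates: "
          f"proxy score {proxy_value[best]:8.2f}, true score {true_value[best]:6.2f}")
```

With a weak search, the proxy-best candidate tends to score well on the true objective too; with a strong search, the winner is increasingly one whose proxy score is inflated by the mismatch term rather than by genuine value. That is the sense in which a tiny misalignment in ϕ gets magnified under optimisation pressure.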
I have a few objections here:
Even when objectives aren’t aligned, that doesn’t mean the outcome is literally death. No corporation I interact with is aligned with me, but in many (if not most) cases I am still better off for being able to transact with them.
I think there are plenty of scenarios where “humanity continues to exist” has benefits for AI—we are a source of training data and probably lots of other useful resources, and letting us continue to exist is not a huge investment, since we are mostly self-sustaining. Maybe this isn’t literally “being aligned” but I think supporting human life has instrumental benefits to AI.
I think the formal claim is only true insofar as the space of all objectives that align with the AI’s continued existence is also incredibly small. And it’s much less clear how many of the objectives that are in some way supportive of the AI also result in human extinction.
In fact, corporations are quite aligned with you: not only because they are run by humans, who are at least roughly aligned with humanity by default, but also because we have legal institutions and social norms that help keep the wheels on the tracks. The profit motive itself is a powerful alignment tool: it’s hard to make a profit off of humanity if they are all dead. But who aren’t corporations aligned with? Humans without money or legal protections, for one (though we don’t need to veer off into an economic or political discussion). But also plants, insects, most animals. Wild animal populations have declined by something like 60% as a result of human activity over the past ~50 years alone. So I think you’ve made a bit of a category error here: in the scenario where a superintelligence emerges, we are not a customer, we are wildlife.
Yes, there are definitely scenarios where human existence benefits an AI. But how many of those ensure our wellbeing? There are certainly many more scenarios where it simply doesn’t care about us enough to actively preserve us. Insects are generally quite self-sustaining and good for data too, but boy, they sure get in the way when we want to build our cities or plant our crops.
This metaphor doesn’t strike me as accurate, because humanity can engage in commerce and insects cannot.
But also, humanity causes a lot of environmental degradation, and yet we still don’t actually want to bring about the wholesale destruction of the environment.