If AIs’ goals are aligned with humans’, it will be in the interest of the vast majority of AIs to coordinate to avoid harms from excessive competition. They would likely be good at this, given that they are posited to be superhuman, and given that even humans currently do a decent job of mitigating such harms (at least to the extent that we haven’t destroyed the world).
I suppose what I’m trying to point to is some form of the outer alignment problem. I think we may end up with AIs that are aligned with human organizations like corporations more than with individual humans. The reason is that corporations or militaries which employ more ruthless AIs will, over time, accrue more power and resources. It’s not so much explicit (i.e. violent) competition as the gradual tendency for power-seeking, resource-maximizing systems to end up with more power and resources over time. If we allow the creation/fine-tuning of many AI agents, and allow them to accrue resources and copy themselves, then natural selection will favor the more selfish ones, the ones least aligned with humanity at large.

We already require pretty extensive regulation to make sure that corporations don’t incur significant negative externalities, and those are organizations run by and composed of humans. Once those entities are no longer human, I think the vast majority of power and resources will no longer be explicitly controlled by humans; it will instead be controlled by AIs whose values are poorly aligned with those of the majority of humans. The AIs’ goals will be aligned only with the short-term interests of the small number of humans who created them in the first place. And once the majority of people realize that this system is not acting in their long-term interests, there will be nothing they can do about it.
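To make the selection pressure concrete, here’s a minimal toy sketch of the dynamic (every number here is invented for illustration, and it assumes replication is simply proportional to accumulated resources, not anything about real AI development): agents that capture resources more aggressively come to dominate the population even though no one ever explicitly fights anyone.

```python
import random

# Toy replicator dynamics. Each agent has a "ruthlessness" trait in [0, 1]
# governing how large a share of a common resource pool it captures per
# step; agents then copy themselves in proportion to accumulated resources.
# All parameters are illustrative assumptions.

random.seed(0)
POP, GENERATIONS, POOL = 100, 50, 1000.0

agents = [{"ruthlessness": random.random(), "resources": 1.0}
          for _ in range(POP)]

for _ in range(GENERATIONS):
    # Resource capture is proportional to the ruthlessness trait.
    total_claim = sum(a["ruthlessness"] for a in agents)
    for a in agents:
        a["resources"] += POOL * a["ruthlessness"] / total_claim
    # Replication: resample the next generation weighted by resources,
    # holding population size fixed (copies inherit trait and resources).
    weights = [a["resources"] for a in agents]
    agents = [dict(random.choices(agents, weights=weights)[0])
              for _ in range(POP)]

mean_trait = sum(a["ruthlessness"] for a in agents) / POP
print(f"mean ruthlessness after {GENERATIONS} generations: {mean_trait:.2f}")
# The mean climbs well above the initial ~0.5 without any explicit conflict.
```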
A profit-optimizing AI is not even a little bit aligned. It’s very important to make sure everyone knows that if they create a being smarter than them and tell it to optimize a number in a company bank account, the most likely outcome is that the agent hijacks democracy and does whatever it has to do to make sure the company (and the financial system) outlasts humanity.
Succeeding at the technical problem never looks like the scenarios you’re describing. If we had even a barely superhuman intelligence that was truly honest and helpful (no intuitive sense of “harmless” is ultimately compatible with those things), it would be obvious to it that none of us want war, and it would tell us that.
The conversation about whether superintelligent agents would fall into Molochian dynamics has been going on for a while.

Personally, I think it’s pretty clear that coordination is just not that hard; humans are uniquely bad at it, yet still impressively good at it. Even slightly more advanced agents (for instance, humans plus cheaper cameras and a better internet) could simply make binding commitments to avoid wasteful conflict, especially conflict so destructive that it would force them all to carcinise.
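For intuition on why commitment power matters, here’s a minimal sketch of a one-shot conflict game with invented payoffs: absent commitments, the only equilibrium is mutual escalation, while a binding mutual commitment makes the wasteful outcome simply unavailable.

```python
# Toy two-player conflict game; payoffs are invented for illustration.
# Strategies: "cooperate" or "escalate". Without commitment power,
# escalation dominates for both players; a binding mutual commitment
# removes the wasteful equilibrium.

PAYOFFS = {  # (row_strategy, col_strategy) -> (row_payoff, col_payoff)
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "escalate"): (0, 4),
    ("escalate", "cooperate"): (4, 0),
    ("escalate", "escalate"): (1, 1),  # mutual escalation: wasteful conflict
}

def best_response(opponent: str) -> str:
    """Row player's best response when no commitment device exists."""
    return max(("cooperate", "escalate"),
               key=lambda s: PAYOFFS[(s, opponent)][0])

# Without commitments, escalating is the best response to everything,
# so (escalate, escalate) with payoff (1, 1) is the only equilibrium.
print(best_response("cooperate"), best_response("escalate"))

# A binding, verifiable mutual commitment strikes "escalate" from the
# strategy set, leaving both players with 3 instead of 1.
print(PAYOFFS[("cooperate", "cooperate")], "vs",
      PAYOFFS[("escalate", "escalate")])
```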
That’s a very fair response. My claim here is really about the outer alignment problem: if lots of people have access to the ability to create/fine-tune AI agents, many agents with goals misaligned with humanity as a whole will be created, and we will lose control of the future.
I agree that humanity being overwhelmed by a swarm of misaligned AIs is a very conceivable scenario. We could all wake up one day and find our apps and devices chattering the equivalent of “skibidi, skibidi, skibidi...”
In the Less Wrong Sequences—which predate the current era of deep learning AI—there is an emphasis on the complexity of human values, and the need to capture all of that complexity in programmatic form, if AI civilization is to be a continuation of human civilization rather than a replacement of it.
Then, with the rise of deep neural networks, we suddenly had complex quasi-AIs that can learn and even create, and the focus largely switched to how one gets such systems to truly learn anything at all. This has been the era of “alignment”.
I think the only real answer to your concern is to return to the earlier problem, of aligning an AI not just with the task of the moment, but with something akin to an ideal form of “human values”, something that will make it an autonomous ethical agent.
You may have heard of Coherent Extrapolated Volition (CEV). It stands for a solution to alignment at this level of civilizational ethics: instilling an AI with something that can serve as a humane foundation for an entire transhuman civilization. There are still people pursuing alignment in this sense, e.g. June Ku, Vanessa Kosoy, and Tamsin Leake. That’s the best solution I have to your problem: ensure that each member of the AI swarm possesses CEV-type alignment, and/or that the swarm is governed by a single CEV-aligned superintelligence.
Yes. I think the title of my post is misleading (I have updated it now). What I am really trying to point at is that the current incentives mean we are going to mess up the outer alignment problem, and natural selection will favor the systems we fail hardest on.
I’ve also been thinking a lot about this recently and haven’t seen any explicit discussion of it. It’s the reason I recently began going through BlueDot Impact’s AI Governance course.
A couple questions, if you happen to know:
Is there anywhere else I can find discussion about what the transition to a post-superhuman-level-AI society might look like, on an object level? I also saw the FLI Worldbuilding Contest.
What are the implications of this for career choice, for an early-career EA trying to make this transition go well?