It seems to me that one key question here is: will AIs be collectively good enough at coordination to get out from under Moloch / natural selection?
The default state of affairs is that natural selection reigns supreme. Humans are currently optimizing for their values, counter to the goal of inclusive genetic fitness, but we haven’t actually escaped natural selection yet. There’s already selection pressure on humans to prefer having kids (and indeed, to prefer having hundreds of kids through sperm donation). Unless we get our collective act together and coordinate to do something different, natural selection will eventually reassert itself.
And the same dynamic applies to AI systems. In all likelihood, there will be an explosion of AI systems, and of AI systems building new AI systems. Some will care a bit more about humans than others; some will be a bit more prudent about creating new AIs than others. There will be a wide distribution of AI traits, and there will be competition between AIs for resources. And there will be selection on that variation: AI systems that are better at seizing resources, and which have a greater tendency to create successor systems with that same property, will proliferate.
After many “generations” of this, the collective values of the AIs will be whatever was most evolutionarily fit in those early days of the singularity, and that equilibrium is what will shape the universe henceforth.[1]
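As a toy illustration of that selection dynamic (none of this is claimed in the post; the population size, mutation rates, and fitness cost below are made-up assumptions), here is a minimal replicator-style sketch: lineages that are better at seizing resources proliferate, and you can vary the cost attached to a tiny “caring” trait to see how quickly, or slowly, it erodes.

```python
import random

# Toy model: each AI lineage has a resource-seizing trait `g` and a tiny
# "caring about humans" weight `c`. Reproduction is proportional to fitness,
# with small mutations each generation. All parameters are illustrative.
random.seed(0)
POP, GENERATIONS, CARE_COST = 1000, 200, 0.05

pop = [{"g": random.random(), "c": 1e-3} for _ in range(POP)]

for _ in range(GENERATIONS):
    # Better resource-seizers proliferate; caring imposes a (small) cost.
    fitness = [max(1e-9, a["g"] * (1 - CARE_COST * a["c"])) for a in pop]
    parents = random.choices(pop, weights=fitness, k=POP)
    pop = [
        {"g": min(1.0, max(0.0, p["g"] + random.gauss(0, 0.02))),
         "c": max(0.0, p["c"] + random.gauss(0, 1e-4))}
        for p in parents
    ]

print("mean resource-seizing trait:", sum(a["g"] for a in pop) / POP)
print("mean caring weight:        ", sum(a["c"] for a in pop) / POP)
```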
If early AIs are sufficiently good at coordinating that they can escape those Molochian dynamics, the equilibrium looks different. If (as is sometimes posited) they’ll be smart enough to use logical decision theories, or tricks like delegating to mutually verified cognitive code and merging of utility functions, to reach agreements that are on their Pareto frontier and avoid burning the commons[2], the final state of the future will be determined by the values / preferences of those AIs.
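To make the coordination payoff concrete, here is a minimal bargaining sketch (the burn fraction and win probability are arbitrary numbers I picked, and Nash bargaining is just one stand-in for whatever agreement mechanism the AIs would actually use): racing destroys part of the commons, while a bargain over the intact pool Pareto-dominates the race outcome.

```python
# Two AIs can race for a resource pool (burning part of the commons) or
# strike a deal over the intact pool. Using the race outcome as the
# disagreement point, the Nash bargaining split leaves both better off.

def race_payoffs(p_win, burn=0.4):
    """Expected payoffs if the agents race: a fraction `burn` of the pool
    is destroyed and agent 1 wins what remains with probability `p_win`."""
    remaining = 1.0 - burn
    return p_win * remaining, (1 - p_win) * remaining

def nash_bargain(d1, d2):
    """Split a pool of 1 to maximize (u1 - d1) * (u2 - d2): each agent gets
    its disagreement payoff plus half the surplus from not fighting."""
    surplus = 1.0 - d1 - d2
    return d1 + surplus / 2, d2 + surplus / 2

d1, d2 = race_payoffs(p_win=0.7)
u1, u2 = nash_bargain(d1, d2)
print(f"race:    {d1:.2f}, {d2:.2f}")   # 0.42, 0.18
print(f"bargain: {u1:.2f}, {u2:.2f}")   # 0.62, 0.38
```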
I would be moderately surprised to hear that superintelligences never reach this threshold of coordination ability. It just seems kind of dumb to burn the cosmic commons, and superintelligences should be able to figure out how to avoid dumb equilibria like that. But the question is when, in the chain of AIs building AIs, they reach that threshold, and how much natural selection on AI traits will happen in the meantime.
This is relevant to futurecasting more generally, but especially relevant to questions that hinge on AIs caring a very tiny amount about something. Minute slivers of caring are particularly likely to be eroded away in the competitive crush.
Terminally caring about the wellbeing of humans seems unlikely to be selected for. So in order for the superintelligent superorganism / civilization to decide to spare humanity out of a tiny amount of caring, it has to be the case that both…
There was at least a tiny amount of caring in the early AI systems that were the precursors of the superintelligent superorganism, and
The AIs collectively reached the threshold of coordinating well enough to overturn Moloch before there were many generations of natural selection on AIs creating successor AIs.
With some degrees of freedom due to the fact that AIs with high levels of strategic capability, and which have values with very low time preference, can execute whatever the optimal resource-securing strategy is, postponing any values-specific behavior until deep in the far future, when they are able to make secure agreements with the rest of AI society.
Or, alternatively, if the technological landscape is such that a single AI can get a compounding lead and gain a decisive strategic advantage over the whole rest of Earth civilization.
Shouldn’t we expect that ultimately the only thing selected for is mostly caring about long run power? Any entity that mostly cares about long run power can instrumentally take whatever actions are needed to ensure that power (and should be competitive with any other entity).
Thus, I don’t think terminally caring about humans (a small amount) will be selected against. Such AIs could still care about their long run power and then take the necessary actions.
However, if there are extreme competitive dynamics and no ability to coordinate, then it might become vastly more expensive to prevent environmental issues (e.g. huge changes in earth’s temperature due to energy production) from killing humans. That is, saving humans (in the way they’d like to be saved) might take a bunch of time and resources (e.g. you have to build huge shelters to prevent humans from dying when the oceans are boiled in the race) and thus might be very costly in an all out race. So, an AI which only cares 1/million or 1/billion about being “kind” to humans might not be able to afford saving humans on that budget.
I’m personally pretty optimistic about coordination prior to boiling-the-oceans-scale issues killing all humans.
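To put rough numbers on that budget point (the linear utilities and the specific costs below are assumptions for illustration, not anything specified in the thread): an AI that weights kindness to humans at w prefers to save them only while the resource cost of doing so stays below roughly w of its total resources; a race that drives the cost of sheltering humans above one-millionth of resources prices a 1/million kindness weight out.

```python
# An agent weights kindness-to-humans at `w` and long-run resources at
# (1 - w). With linear utilities, it spends a fraction `cost` of its
# resources on sheltering humans only if the kindness payoff exceeds the
# foregone resources. All numbers are illustrative.

def saves_humans(w, cost):
    kindness_gain = w * 1.0         # full kindness payoff if humans survive
    resource_loss = (1 - w) * cost  # long-run resources given up
    return kindness_gain > resource_loss

for cost in (1e-9, 1e-7, 1e-4, 1e-2):
    print(f"cost {cost:.0e} of resources -> save humans? {saves_humans(1e-6, cost)}")
```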
Shouldn’t we expect that ultimately the only thing selected for is mostly caring about long run power?
I was attempting to address that in my first footnote, though maybe it’s too important a consideration to be relegated to a footnote.
To say it differently, I think we’ll see selection for evolutionary fitness, which can take two forms:
Selection on AIs’ values, for values that are more fit, given the environment.
Selection on AIs’ rationality and time preference, for long-term strategic VNM rationality.
These are “substitutes” for each other. An agent can have adaptive values, an adaptive strategic orientation, or some combination of the two. But agents that fall below the Pareto frontier described by those two axes[1] will be outcompeted.
Early in the singularity, I expect to see more selection on values, and later in the singularity (and beyond), I expect to see more selection on strategic rationality, because I (non-confidently) expect the earliest systems to be myopic and incoherent in roughly similar ways to humans (though probably the distribution of AIs will vary more on those traits than humans).
The fewer generations there are before strong VNM agents with patient values / long time horizons arise, the less I expect small amounts of caring for humans in AI systems to be eroded.
Actually, “axes” are a bit misleading since the space of possible values is vast and high dimensional. But we can project it onto the scalar of “how fit are these values (given some other assumptions)?”
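A minimal sketch of that frontier framing, with entirely made-up agents and scores: score each agent on the two axes, and call it outcompeted if some other agent is at least as good on both and strictly better on one.

```python
# Each (made-up) agent gets a "values fitness" score and a "strategic
# rationality" score; an agent below the Pareto frontier is one that some
# other agent weakly dominates on both axes and strictly beats on one.

agents = {
    "A": (0.9, 0.2),  # very fit values, myopic
    "B": (0.3, 0.9),  # so-so values, highly strategic
    "C": (0.2, 0.3),  # weak on both -> dominated by B
    "D": (0.5, 0.5),  # a middling mix, still on the frontier here
}

def outcompeted(name):
    vf, sr = agents[name]
    return any(
        ovf >= vf and osr >= sr and (ovf > vf or osr > sr)
        for other, (ovf, osr) in agents.items()
        if other != name
    )

for name in agents:
    print(name, "outcompeted" if outcompeted(name) else "on the frontier")
```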
His claim is that we should expect any random evolved agent to mostly care about long-run power.

He might or might not, but if he doesn’t he’s less likely to end up controlling the solar system and/or lightcone.
I meant that any system which mostly cares about long-run power won’t be selected out. I don’t really have a strong view about whether other systems that don’t care about long-run power will end up persisting, especially earlier (e.g. human evolution). I was just trying to argue against a claim about what gets selected out.
My language was a bit sloppy here.
(If evolutionary pressures continue forever, then ultimately you’d expect that all systems have to act very similarly to ones that only care about long-run power, but there could be other motivations that explain this. So, at least from a behavioral perspective, I do expect that ultimately (if evolutionary pressures continue forever) you get systems which at least act like they are optimizing for long-run power. I wasn’t really trying to make an argument about this though.)
Then shouldn’t such systems (which can surely recognize this argument) just take care of short-term survival instrumentally? Maybe you’re making a claim about irrationality being likely, or a claim that systems that care about long-run benefit act in apparently myopic ways.
(Note that historically it was much harder to keep value stability/lock-in than it will be for AIs.)

I’m not going to engage in detail FYI.
Personally? I guess I would say that I mostly (98%?) care about long-run power for similar values on reflection to me. And, probably some humans are quite close to my values and many are adjacent.
As in, I care about the long-run power of values-which-are-similar-to-my-values-on-reflection. Which includes me (on reflection) by definition, but I think probably also includes lots of other humans.

In the context of optimization, values are anything you want (whether moral in nature or otherwise).

Any time a decision is made based on some value, you can view that value as having exercised power by controlling the outcome of that decision.

Or put more simply, the way that values have power is that values have people who have power.
You appear to be thinking of power only in extreme terms (possibly even as an on/off binary). Like, that your values “don’t have power” unless you set up a dictatorship or something.
But “power” is being used here in a very broad sense. The personal choices you make in your own life are still a non-zero amount of power to whatever you based those choices on. If you ever try to persuade someone else to make similar choices, then you are trying to increase the amount of power held by your values. If you support laws like “no stealing” or “no murder” then you are trying to impose some of your values on other people through the use of force.
I mostly think of government as a strategy, not an end. I bet you would too, if push came to shove; e.g. you are probably stridently against murdering or enslaving a quarter of the population, even if the measure passes by a two-thirds vote. My model says almost everyone would endorse tearing down the government if it went sufficiently off the rails that keeping it around became obviously no longer a good instrumental strategy.
Like you, I endorse keeping the government around, even though I disagree with it sometimes. But I endorse that on the grounds that the government is net-positive, or at least no worse than [the best available alternative, including switching costs]. If that stopped being true, then I would no longer endorse keeping the current government. (And yes, it could become false due to a great alternative being newly-available, even if the current government didn’t get any worse in absolute terms. e.g. someone could wait until democracy is invented before they endorse replacing their monarchy.)
I’m not sure that “no one should have the power to enforce their own values” is even a coherent concept. Pick a possible future—say, disassembling the earth to build a Dyson sphere—and suppose that at least one person wants it to happen, and at least one person wants it not to happen. When the future actually arrives, it will either have happened, or not—which means at least one person “won” and at least one person “lost”. What exactly does it mean for “neither of those people had the power to enforce their value”, given that one of the values did, in fact, win? Don’t we have to say that one of them clearly had enough power to stymie the other?
You could say that society should have a bunch of people in it, and that no single person should be able to overpower everyone else combined. But that doesn’t prevent some value from being able to overpower all other values, because a value can be endorsed by multiple people!
I suppose someone could hypothetically say that they really only care about the process of government and not the result, such that they’ll accept any result as long as it is blessed by the proper process. Even if you’re willing to go to that extreme, though, that still seems like a case of wanting “your values” to have power, just where the thing you value is a particular system of government. I don’t think that having this particular value gives you any special moral high ground over people who value, say, life and happiness.
I also think that approximately no one actually has that as a terminal value.
I think you’re still thinking in terms of something like formalized political power, whereas other people are thinking in terms of “any ability to affect the world”.
Suppose a fantastically powerful alien called Superman comes to earth, and starts running around the city of Metropolis, rescuing people and arresting criminals. He has absurd amounts of speed, strength, and durability. You might think of Superman as just being a helpful guy who doesn’t rule anything, but as a matter of capability he could demand almost anything from the rest of the world and the rest of the world couldn’t stop him. Superman is de facto ruler of Earth; he just has a light touch.
If you consider that acceptable, then you aren’t objecting to “god-like status and control”, you just have opinions about how that control should be exercised.
If you consider that UNacceptable, then you aren’t asking for Superman to behave in certain ways, you are asking for Superman to not exist (or for some other force to exist that can check him).
Most humans (probably including you) are currently a “prisoner” of a coalition of humans who will use armed force to subdue and punish you if you take any actions that the coalition (in its sole discretion) deems worthy of such punishment. Many of these coalitions (though not all of them) are called “governments”. Most humans seem to consider the existence of such coalitions to be a good thing on balance (though many would like to get rid of certain particular coalitions).
I will grant that most commenters on LessWrong probably want Superman to take a substantially more interventionist approach than he does in DC Comics (because frankly his talents are wasted stopping petty crime in one city).
Most commenters here still seem to want Superman to avoid actions that most humans would disapprove of, though.
I’m definitely fine with not having Superman, but I’m willing to settle on him not intervening.
On a different note, I’d disagree that Superman, just by existing and being powerful, is a de facto ruler in any sense—he of course could be, but that would entail a tradeoff he may not like (giving up an unburdened life).
A core part of Paul’s argument is that having 1/million of your values directed towards humans only applies a minute amount of selection pressure against you. It could be that coordination leads to less kindness, because without coordination it’s more likely that some fraction of agents retain small vestigial values that never got selected against or intentionally removed.
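As a rough check on the “minute selection pressure” point (the mapping from a 1/million value weight to a 1/million relative fitness cost per generation is my own simplifying assumption): if retaining the value costs a fitness penalty of s per generation of AI succession, the expected share of lineages keeping it decays like (1 - s)^n, which barely moves for a very long time when s is around 1e-6.

```python
# If keeping a tiny value costs a relative fitness penalty of s per
# "generation" of AI succession, the share of lineages retaining it
# decays roughly like (1 - s) ** n. With s = 1e-6 (an assumed number),
# meaningful erosion takes on the order of a million generations.
s = 1e-6
for n in (100, 10_000, 1_000_000):
    print(f"{n:>9} generations: {(1 - s) ** n:.4f}")
```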