I’m gonna take a probably-contrarian position on my own question:
While I think technical questions such as natural abstractions are important, it seems to me that the most central question is, what do we even want to align it to? What are “human values”?
I think I have a plausible answer (famous last words) for a lot of it, but there is a paradox/contradiction that I keep getting stuck on: Malthusianism.
As in, we’ll probably want a future where a lot of people (in a broad sense potentially including Ems etc.) get to live independently. But if we do, then there are three things it seems we cannot have all at once:
Economic freedom: people may freely trade and gain ownership over things.
Reproductive freedom: people may freely create new people, at least up to a point (such as those similar to themselves).
Slack: people can live without optimizing relentlessly for economic productivity and efficiency.
The reason being that if you have 1+2, then some highly economically efficient agents are gonna copy themselves until they outcompete everyone else, preventing 3.
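To make that concrete, here's a minimal toy sketch (all parameters invented purely for illustration, not a model of anything real): two agent types are equally productive, but one reinvests almost all of its income into making copies of itself while the other keeps most of its income as slack. Under unrestricted ownership and copying, the heavy reinvestor's population share goes to ~1:

```python
# Toy model: two agent types with identical productivity, differing only in
# how much of their income they reinvest into making copies of themselves.
# All numbers are invented for illustration.

reinvest = {"copier": 0.9, "slacker": 0.1}   # fraction of income spent on copies
pop = {"copier": 1.0, "slacker": 1.0}        # initial populations (arbitrary units)
INCOME_PER_CAPITA = 1.0                      # same productivity for both types
COPY_COST = 5.0                              # income needed to create one copy

for generation in range(50):
    for kind in pop:
        new_copies = pop[kind] * INCOME_PER_CAPITA * reinvest[kind] / COPY_COST
        pop[kind] += new_copies

total = sum(pop.values())
for kind, n in pop.items():
    print(f"{kind}: {n / total:.4f} of the population after 50 generations")
```

Nothing here depends on the copier being smarter or more productive; differing reinvestment rates alone do the work.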
The “default trajectory” seems to be 1+2. It raises the multipolarity vs unipolarity debate, which in my view basically boils down to whether we lose slack and people get starved, or we lose “a lot of people get to live independently” and get paperclipped.
Some theorists point out that in multipolar scenarios, maybe AI respects property rights enough that we get a Slack’ outcome: people who were sufficiently wealthy before AGI and made the right investment decisions (e.g. putting money in chips) can live without optimizing relentlessly for economic productivity and efficiency. These theorists often seem worried that people will decide to shut down AI progress, preventing them from achieving 1+2+3′.
What do you hope to get? 1+2? 1+3? 2+3? 1+2+3′? Something else?
Probably I should have read more sci-fi since it seems like the sort of question sci-fi might explore.
Malthusianism is mainly a problem when new people can take resources that are not their parents’, which is a form of disrespect for property rights (mandatory redistribution to new people from those who didn’t consent to their creation). If parents alone are responsible for the wealth of their children, then it won’t affect others, except morally in the internal mindcrime scenarios where some would generate great suffering within their domain. (This is in the context of the initial condition where every person owns enough for a slack-enabling mode of survival in perpetuity; only growth ever calls for more.)
I don’t think that holds in reality. Most people today seem dependent on continually cooperating to obtain additional resources, which they mostly do in free market competition with others.
Universal FIRE wealth doesn’t hold today. The Milky Way comprises more than 100 billion stars, the Hubble volume far more. Even an astronomically tiny fraction of the cosmic endowment in initial equity would be sufficient to run a single upload for as long as the currently prevalent mode of physical laws still applies. So if humanity isn’t wiped out outright, this initial condition plausibly obtains at some point within a few physical years of the first AGI.
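As a rough back-of-envelope sketch of the “astronomically tiny fraction” point: the stellar and brain figures below are rough public estimates, and the emulation overhead factor is a made-up, deliberately generous assumption, so treat this as an order-of-magnitude illustration only.

```python
# Back-of-envelope: what fraction of the Milky Way's stellar power output
# would one upload need? Stellar and brain figures are rough estimates;
# the emulation overhead factor is an assumed, deliberately generous number.

SOLAR_LUMINOSITY_W = 3.8e26   # approximate power output of one Sun-like star
STARS_IN_MILKY_WAY = 1e11     # "more than 100 billion stars" from above
BRAIN_POWER_W = 20.0          # approximate metabolic power of a human brain
EMULATION_OVERHEAD = 1e9      # assumption: the upload is a billion times less efficient

upload_power_w = BRAIN_POWER_W * EMULATION_OVERHEAD
fraction_of_one_star = upload_power_w / SOLAR_LUMINOSITY_W
fraction_of_galaxy = fraction_of_one_star / STARS_IN_MILKY_WAY

print(f"Upload power draw:        {upload_power_w:.1e} W")
print(f"Fraction of one star:     {fraction_of_one_star:.1e}")
print(f"Fraction of galaxy stars: {fraction_of_galaxy:.1e}")
```

Even with that billion-fold overhead, one upload uses on the order of 1e-17 of a single star's output (treating every star as Sun-like purely for scale), which is the sense in which an astronomically tiny initial equity suffices.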
“Assign ownership over fractions of the cosmic endowment to people who live today” might be a reasonable compromise between 3 and 3′.
Just 3 with a dash of 1?
I don’t understand the specific appeal of complete reproductive freedom. It is desirable to have that freedom, in the same way it is desirable to be allowed to do whatever I feel like doing. However, that more general heading of arbitrary freedom has the answer of ‘you do have to draw lines somewhere’. In a good future, I’m not allowed to harm a person (nonconsensually), I can’t requisition all matter in the available universe for my personal projects without ~enough of the population endorsing it, and I can’t reproduce / construct arbitrary numbers of arbitrary new people. (Constructing arbitrary people obviously has moral issues too, so there are cutoff lines at both ‘moral issues’ and ‘resource limitations’, even at that scale.)
I think economic freedom looks significantly different in a post-aligned-AGI world than it does now. Like, there are still some concepts of trade going on, but I expect much of it runs in the background.
I’m not sure why you think the ‘default trajectory’ is 1+2. Aligned AGI seems most likely to go for some mix of 1+3, while pointing at the wider/more specific cause area of ‘what humans want’. A paperclipper just says null to all of those, because it isn’t giving humans the right to create new people or any economic freedom unless they manage to be in a position to actually trade and have something worth offering.
I don’t think that what we want to align it to is that pertinent a question at this stage? In the specifics, that is; obviously it’s human values in some manner.
I expect that we want to align it via some process that lets it figure our values out without needing to decide on much of it now, à la CEV.
Having a good theory of human values beforehand is useful for starting down a good track and verifying it, of course.
I think the generalized problem of ‘figure out how to make a process that is corrigible and learns our values in some form that is robust’ is easier than figuring out a decent specification of our values.
(Though simpler bounded-task agents seem likely before we manage that, so my answer to the overall question is ‘how do we make approximately corrigible powerful bounded-task agents to get to a position where humanity can safely focus on producing aligned AGI’)
We can just Do Something Else Which Is Not a Malthusian Trap? Like, have an agreement not to have more than two kids per hundred years per parent, and colonize stars accordingly. I think it will be simple, especially after uplifting a major part of humanity.
In relatively hardcore scenarios, we can just migrate into simulations with computation management from benevolent AIs.
This agreement falls under interfering with 2.
That doesn’t solve the problem unless one takes a stance on 2.
It’s not interfering with someone’s freedoms if it is voluntary cooperation?
Ok, but then you haven’t solved the problem for the subset of people who decide they don’t want to cooperate.
Well, it’s a decision theory problem. I recommend “Unifying bargain” by Diffractor.
Maybe I will look at that again at some point in a while.
The problem is not so much which one of 1, 2, 3 to pick but whether ‘we’ get a chance to pick it at all. If there is space, free energy, and diversity, there will be evolution going on among populations, and evolution will consistently push things towards more reproduction until it hits a Malthusian limit, at which point it will push towards greater competition and economic/reproductive efficiency. The only way to avoid this is to remove one of the preconditions for evolution (variation, selection, or heredity), but these seem quite natural in a world of large AI populations, so in practice this will require some level of centralized control.
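As a toy illustration of that last step (invented parameters again, same spirit as the earlier sketch but now with a binding resource cap): run two strategies, one spending all surplus on reproduction and one keeping most of it as slack, against a fixed resource budget. Selection keeps shifting the population towards the faster reproducer, and per-capita resources fall towards bare subsistence, i.e. the slack is competed away for everyone:

```python
# Toy Malthusian-limit model: two strategies share a fixed resource budget
# split equally per capita. Each agent needs 1 unit to subsist; surplus above
# that is either spent on copies or kept as slack. Parameters are invented.

RESOURCES = 1_000.0                             # fixed total budget per period
SUBSISTENCE = 1.0                               # units each agent needs to survive
COPY_COST = 5.0                                 # surplus needed to create one copy
reinvest = {"reproducer": 1.0, "slacker": 0.2}  # share of surplus spent on copies
pop = {"reproducer": 1.0, "slacker": 1.0}

for period in range(200):
    per_capita = RESOURCES / sum(pop.values())
    surplus = max(per_capita - SUBSISTENCE, 0.0)
    for kind in pop:
        pop[kind] *= 1.0 + surplus * reinvest[kind] / COPY_COST

total = sum(pop.values())
print(f"Per-capita resources: {RESOURCES / total:.3f} (subsistence is {SUBSISTENCE})")
print({kind: round(n / total, 4) for kind, n in pop.items()})
```

Within this model nobody ends up with slack regardless of their own strategy; under a binding resource cap, reproduction rate alone determines who the future is made of.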
Yes.
Variation corresponds to “a lot of people (in a broad sense potentially including Ems etc.) get to live independently”, selection corresponds to economic freedom, and heredity corresponds to reproductive freedom. (Not exactly ofc, but it’s hard to write something which exactly matches any given frame.)
Or rather, it’s both a question of how to pick it and what to pick. Like the MIRI plan is to grab control over the world and then use this to implement some sort of cosmopolitan value system. But if one does so, there’s still the question of which cosmopolitan value system to implement.
I think getting to “good enough” on this question should pretty much come for free when the hard problems are solved. For example, any common-sense statement like “Maximize flourishing as depicted in the UN convention on human rights” is IMO likely to get us to a good place, if the agent is honest, remains aligned to those values, and interprets them reasonably intelligently. (With each of those three prerequisites being way harder than picking a non-harmful value function.)
If our AGIs, after delivering utopia, tell us we need to start restricting childbearing rights, I don’t see that as problematic. Long before we require that step, we will have revolutionized society, and so most people will buy into the requirement.
Honestly I think there are plenty of great outcomes that don’t preserve 1 as well. A world of radical abundance with no ownership, property, or ability to form companies/enterprises could still be dramatically better than the no-AGI counterfactual trajectory, even if it happens not to be most people’s preferred outcome ex ante.
For sci-fi, I’d say Iain M. Banks’ Culture series presents one of the more plausible (as in plausibly stable, not most probable ex ante) AGI-led utopias. (It’s what Musk is referring to when he says AGIs will keep us around because we are interesting.)