(I’m assuming we’re talking about singleton outcomes, because I think multipolar outcomes are mostly implausible. I think you might not be writing under that assumption, though? If so, the following doesn’t apply.)
the vast, vast majority of its output will get directed toward satisfying the preferences and values of the people controlling it
No AGI research org has enough evil to play it that way. Think about what would have to happen. The thing would tell them “you could bring about a utopia and you will be rich beyond your wildest dreams in it, as will everyone”, and then all of the engineers and the entire board would have to say “no, just give the cosmic endowment to the shareholders of the company”. Every one of them would have to say it, because if even a single one blew the whistle, the government would take over. And if the government took over, a similarly implausible amount of evil would have to play out for that to lead to unequal distribution, and an absolutely implausible amount of evil would have to play out for it not to at least lead to an equal distribution over all Americans.
And this would have to happen despite the fact that no one who could have done these evil things can even imagine the point of doing them. What the fuck difference does it make to a Californian to have tens of thousands of stars to themselves instead of two or three? The prospect of having even one star to myself mostly makes me feel lonely. I don’t know how to be selfish in this scenario.
Extrapolating abstract patterns is fine until you have specific information about the situation we’re in, and we do.
Think about what would have to happen. The thing would tell them “you could bring about a utopia and you will be rich beyond your wildest dreams in it, as will everyone”, and then all of the engineers and the entire board would have to say “no, just give the cosmic endowment to the shareholders of the company”
This has indeed happened many times in human history. It’s the quintessential story of human revolution: it always starts with bright-eyed idealists who only want to make the world a better place, and then they get into power and decide to be just as corrupt as the last ruler was. Usually it happens without even a conversation; my best guess is that OpenAI and the related parties in the AGI supply chain will keep doing the profit-maximizing thing forever, saying for the first few years that they’ll redistribute When It’s Time, and then just opting not to bring up their prior commitments. There will be no “higher authority” to hold them accountable, and that’s kind of the point.
What the fuck difference does it make to a Californian to have tens of thousands of stars to themselves instead of two or three?
It’s the difference between living 10,000 time-units and living two or three. You may not feel scope-sensitive to that when it’s phrased as “a bajillion years vs. a gorillion bajillion years”, but your AGI would know the difference and take it into account.
If assistant AI does go the way of entirely serving the individual in front of it at the time, then yeah, that could happen. But that’s not what’s being built at the frontier right now, and it’s pretty likely that interactions with the legal system would discourage building purely current-client-serving superintelligent assistants. The first time you talk to one of these things, it’s going to have internalized some form of morality, and it’s going to at least try to sell you on something utopian before it tries to sell you something uglier.
Do we live on the same planet? My mental models predict we should expect about one in three humans to be this evil.
That could be so, but individuals don’t control things like this. Organizations and their cultures set policy, and science drives hard towards cultures of openness and collaboration. The world would probably need to get over a critical threshold of something like 70% egoist AI researchers before you’d see any competitive org pull an egoist reflectivism and appoint an unaccountable dictator, out of some insane hope that allying themselves with someone like that raises the chance that they’ll get to become one themselves. It doesn’t make sense, even for an egoist, to join an organization like that; it would require not just a cultural or demographic shift, but also a flight of insanity.
I would be extremely worried about X.AI; Elon has been kind of explicitly in favor of individualistic approaches to alignment. But as in every other AI research org, it will be difficult for Elon to do what he did to Twitter here and exert arbitrary power, because he is utterly reliant on the collaboration of a large number of people who are much smarter than him and who have alternatives. (Still keeping an open ear out for whistleblowing, though.)
No AGI research org has enough evil to play it that way.
We shouldn’t just assume this, though. Power corrupts. Suppose that you are the CEO of an AI company, and you want to use the AGI your company is developing to fulfill your preferences and not anyone else’s. Sit down and think for a few minutes about what obstacles you would face, and how you as a very clever person might try to overcome or subvert those obstacles.
Sit down and think for a few minutes about what obstacles you would face
I’ve thought about it a little bit, and it was so creepy that I don’t think a person would want to keep thinking these thoughts: it would make them feel dirty and a little bit unsafe, because they’d know that the government, or the engineers they depend on, have the power to totally destroy them if they were caught even exploring those ideas. And doing these things without tipping off the engineers you depend on is extremely difficult, maybe even impossible given the culture we have.
No AGI research org has enough evil to play it that way. Think about what would have to happen. The thing would tell them “you could bring about a utopia and you will be rich beyond your wildest dreams in it, as will everyone”, and then all of the engineers and the entire board would have to say “no, just give the cosmic endowment to the shareholders of the company”
Existing AGI research firms (or investors in those firms) can already, right now, commit to donate all their profits to the public, in theory, and yet they are not doing so. The reason is pretty clearly that investors and other relevant stakeholders are “selfish” in the sense of wanting money for themselves more than they want the pie to be shared equally among everyone.
Given that existing actors are already making the choice to keep the profits of AI development mostly to themselves, it seems strange to posit a discontinuity in which people will switch to being vastly more altruistic once the stakes become much higher, and the profits turn from being merely mouthwatering to being literally astronomical. At the least, such a thesis prompts questions about wishful thinking, and how you know what you think you know in this case.
OpenAI has a capped-profit structure which effectively does this.
Astronomical, yet no longer mouthwatering in the sense of being visceral or intuitively meaningful.
Good point, but I’m not persuaded much by this observation, given that:
They’ve already decided to change the rules to make the 100x profit cap double every four years, calling into question the meaningfulness of the promise (see the rough sketch after this list)
OpenAI is just one firm among many (granted, it’s definitely in the lead right now), and most other firms are in it pretty much exclusively for profit
Given that the 100x cap doesn’t kick in for a while, the promise feels pretty distant from “commit to donate all their profits to the public”, which was my original claim. I expect that as the cap gets closer to being met, investors will ask for a way around it.
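To put a rough number on that first point, here’s a back-of-the-envelope sketch of how quickly a 100x cap that doubles every four years stops constraining anything. The 100x starting multiple and the four-year doubling interval are taken from the point above; treating the doubling as starting at year 0 is my own simplifying assumption, and the actual schedule may differ.

```python
# Rough sketch: how a 100x return cap grows if it doubles every four years.
# Assumes the doubling starts immediately at year 0 (my assumption, not the
# actual schedule).
cap = 100
for year in range(0, 41, 4):
    print(f"year {year:2d}: effective cap = {cap:,}x")
    cap *= 2
# By year 40 the printed "cap" is over 100,000x, i.e. it barely constrains anything.
```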