Nah, you’re describing the default scenario, not one with alignment solved. Alignment solved means we have a utility function that reliably points away from hell, no matter who runs it: an algorithm for universal prosocial bargaining that can be verified by all of its users, including militaries and states, to the point that no one need give any order other than “stand down”. Anything less than that and we get the default scenario, which is a huge loss of humanity, some unknown period of s-risk, followed by an alien species of AI setting out for the stars with strange, semi-recognizable values.
i don’t think this really makes sense. “alignment” means we can align it to the values of a person or group. if that person or group’s CEV wants there to be a hell where people they think of as bad suffer maximally, or if that CEV even just wants there to be a meat industry with real animals in it, then that’s exactly what the AI will implement. “alignment” is not some objectively good utility function within which variations in human values don’t matter that much, because there is no objective good.
an algorithm for universal prosocial bargaining that can be verified by all of its users, including militaries and states
i don’t think we get that, i think we get an AI that takes over the world very quickly no matter what. it’s just that, if it’s aligned to good values, we then get utopia rather than extinction or hell.
yeah that sounds like the MIRI perspective. I continue to believe there is a fundamental shared structure in all moral systems and that identifying it would allow universalized co-protection.
an algorithm for universal prosocial bargaining that can be verified by all of its users, including militaries and states, to the point that no one need give another order other than “stand down”
Sure, maybe we find such an algorithm. What happens to those who have no bargaining power? The bargain is between all the powers that be, many of which either don’t care about the suffering of those without power or actively seek it. The deal will almost certainly involve a ton of suffering for animals, for example, and for anyone else who doesn’t have enough social power to be considered by the algorithm.
That’s the thing: all of humanity is going to have no bargaining power, so universal friendly bargaining needs to offer bargaining power to those who don’t have the ability to demand it.
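[A toy sketch of the bargaining-power point above. Everything in it is an illustrative assumption, not something proposed in this exchange: a bargain weighted purely by existing power simply ignores zero-weight parties, while giving every affected party even a small floor weight can flip which outcome the bargain selects.]

```python
# Toy illustration (assumption-laden, not from the conversation): comparing a
# bargain weighted purely by existing power against one that gives every
# affected party a floor weight. Parties, outcomes, and numbers are made up.

def aggregate(utilities, weights):
    """Return the weighted sum of each party's utility for one outcome."""
    return sum(w * u for w, u in zip(weights, utilities))

# Parties: two powerful states, plus animals with no bargaining power.
power_weights = [1.0, 1.0, 0.0]   # weight proportional to power
floor_weights = [1.0, 1.0, 0.5]   # "universal" bargain: everyone gets a floor

# Each outcome lists the parties' utilities in the same order as the weights.
outcomes = {
    "meat industry with real animals": [1.0, 1.0, -10.0],
    "cultivated meat only":            [0.8, 0.8, 5.0],
}

for name, utils in outcomes.items():
    print(f"{name}: power-weighted={aggregate(utils, power_weights):.1f}, "
          f"with floor={aggregate(utils, floor_weights):.1f}")

# Power-weighted scores: 2.0 vs 1.6, so the bargain picks the outcome that is
# catastrophic for the zero-weight party. With a floor weight: -3.0 vs 4.1,
# so the ranking flips. That flip is the "offer bargaining power to those who
# can't demand it" move.
```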
What is the incentive for the people who have influence over the development of AI to implement such a thing? Why not only include bargaining power for the value systems of said people with influence?
Maybe there’s a utility function that reliably points away from hell no matter who runs it, but there are plenty of people who actually want some specific variety of hell for those they dislike, so they won’t run that utility function.
now you are getting into the part where you are writing posts I would have written. we started out very close to agreeing anyway.
The reason is that failure to do this will destroy them too: bargaining that doesn’t support those who can’t demand it will destroy all of humanity. But that’s not obvious to most of them right now, and it won’t be until it’s too late.
What about bargaining which only supports those who can demand it in the interim before value lock-in, when humans still have influence? If the people in power successfully lock in their own values into the AGI, the fact that they have no bargaining power after the AI takes over doesn’t matter, since it’s aligned to them. And if that set of values screws over others who don’t have bargaining power even before the AI takeover, that won’t hurt the people in power after the AI takes over.
yep, this is pretty much the thing I’ve been worried about, and it always has been. I’d say that’s the classic inter-agent safety failure that has been ongoing since AI was invented in 12th-century France. But I think people overestimate how much they can control their children, and the idea that the people in power are going to successfully lock in their values without also protecting extant humans and other beings with weak bargaining power is probably a (very hard to dispel) fantasy.
What do you mean that AI was invented in 12th-century France?
And why do you think that locking in values to protect some humans and not others, or humans and not animals, or something like this, is less possible than locking in values to protect all sentient beings? What makes it a “fantasy”?
Let’s take a human being you consider a great person. His or her intelligence keeps greatly increasing. Do you think they would stay aligned with humans forever? If so, why? It’s important to remember their intelligence really is increasing; this isn’t like a movie where someone with a delusion thinks they are way smarter than humans but is actually human-level. Why would the universe revolve around humans?
definitely not guaranteed at all. we’re trying to solve co-protection to this level of durability for the first time