My problem with that is I think solving “human values” is extremely unlikely for us to do in the way you seem to be describing it, since most people don’t even want to. At best, they just want to be left alone and make sure that they and their families and friends aren’t the ones hit hardest. And if we don’t solve this problem, but manage alignment anyway, the results are unimaginably worse than what Clippy would produce.
I have to question this.
Let’s play this out.
Suppose some power that is Somewhat Evil succeeds in alignment and makes it work.
They take over the solar system.
They configure the AGI’s policy JSON files to deliver Eternal Paradise (well, billions of years’ worth) for their in-group, Eternal Mediocrity for the part of the out-group that has some connection to the in-group, and yes, Eternal Suffering for the rest of the out-group.
Arguably, so long as the in-group plus the mediocre group outnumber the suffering out-group, this is a better situation than Clippy: a positive balance of human experience versus zero.
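To make the arithmetic behind that claim explicit (a toy model, with population counts and per-person experience values I’m inventing purely for illustration):

$$W_{\text{regime}} = N_{\text{paradise}}\,u_{+} + N_{\text{mediocre}}\,u_{0} + N_{\text{suffering}}\,u_{-}, \qquad W_{\text{Clippy}} = 0,$$

with $u_{+} > 0$, $u_{0} \approx 0$, and $u_{-} < 0$. “Better than Clippy” just means $W_{\text{regime}} > 0$, which (treating the mediocre group’s experience as roughly neutral) requires $N_{\text{paradise}}\,u_{+} > N_{\text{suffering}}\,\lvert u_{-}\rvert$; comparing headcounts only settles it if you also assume each person-year of paradise weighs at least as much as each person-year of suffering.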
Moreover, there is a chance of future reform, where the in-group in power decides the out-group has suffered enough and either promotes them to the mediocre group or kills them.
Since the life-support pods keeping the suffering group alive consume finite resources that could otherwise improve conditions for the other two groups, there is an incentive to release or kill them.
In the Clippy case, none of this can happen.
You seem to be worried that a world power that is Mostly Evil, one that treats everyone as out-group except some arbitrarily small number of people (North Korea, Russia, etc.), will gain AGI first.
This is stupendously unlikely. AGI takes immense resources to develop (cutting-edge compute, and large amounts of it, as well as many educated humans), and societies with broader middle classes are orders of magnitude wealthier.
This essentially isn’t a plausible risk.
Arguably the only reason Russia, or Afghanistan, has any money at all is sales of natural resources to wealthier societies with broad in-groups.
First, this presupposes that for any amount of suffering there is some amount of pleasure/bliss/happiness/eudaimonia that could outweigh it. Not all LWers accept this, so it’s worth pointing that out.
But I don’t think the eternal paradise/mediocrity/hell scenario accurately represents what is likely to happen in that case. I’d be more worried about somebody using AGI to conquer the world and establish a stable totalitarian system built on some illiberal framework, like shariah (according to Caplan, it’s entirely plausible for global totalitarianism to persist indefinitely). If you get to post-scarcity, you may grant all your subjects UBI, all basic needs met, etc. (or you may not, if you decide that this policy contradicts the Quran or the hadith), but if your convictions are strong enough, women will still be forced to wear burqas, remain essentially the slaves of their male kin, etc. One could argue that abundance robustly promotes a more liberal worldview, a loosening of social norms, and so on, but AFAIK there is no robust evidence for that.
This is meant just to illustrate that you don’t need an outgroup to impose a lot of suffering. Having a screwed-up normative framework is enough.
This family of scenarios is probably still better than AGI doom though.
Thanks for the post.
I kept thinking about how a theocracy, assuming it adopted all the advanced technology we are almost certain is possible but currently lack the ability to implement, could deal with all these challenges to its beliefs.
Half the population is being mistreated because they were born the wrong gender or into the wrong ethnic subgroup? No problem, they’ll just transition to the favored group. Total body replacement would be possible, so there would be no way to tell who had done this.
Sure, the theocracy could ban the treatments, but it conquered the solar system; it had to adopt a lot of these ideas or it wouldn’t have succeeded.
There are also apparently many fractured subgroups who all nominally practice the same religion, but the only way to tell membership is through subtle cultural signals and physical appearance. With neural implants, people could mimic the preferred subgroup as well...
I think it would keep hitting challenges. It reminds me of the cultural effect the pandemic had: US culture, built on in-person work and on the idea that everyone works or is left to starve and become homeless, was suddenly at odds with reality.
Or just rules-lawyering. Supposedly, brothels in some Islamic countries keep a cleric on hand to temporarily marry all the couples. Alcohol may be outlawed, but other pharmaceuticals mimic a similar effect. You could anticipate the end result being a modern liberal civilization that rules-lawyers its way out of needing to adhere to the obsolete religious principles. (Homosexuality? Still banned, but VR doesn’t count. Burqas? Still required, but it’s legal to project a simulated image of yourself on top...)
If they have a superhuman AGI, they can use it to predict all possible ways people might try to find workarounds for their commandments (e.g., gender transition, neural implants, etc.) and make them impossible.
If I understand you correctly, you are pointing to the fact that values/shards change in response to novel situations. Sure, and perhaps even a solar-system-wide 1984-style regime would over time slowly morph into (something closer to) luxurious fully automated gay space communism. But that’s a big maybe, IMO. If we had good evidence that prosperity loosens and liberalizes societies across historical and cultural contexts, plus perhaps solid models of axiological evolution (I don’t know whether anything like that even exists), my credence in that would be higher. Also, why not use AGI to fix your regime’s fundamental values in place, or at least make them a stable attractor over thousands or millions of years?
(I feel like right now we are mostly doing adversarial worldbuilding)
An interesting solution here is radical voluntarism, where an AI philosopher king runs an immersive reality that all humans live in, and you can only be causally influenced if you want to be. This means that you don’t need to do value alignment, just very precise goal alignment. I was originally introduced to this idea by Carado.
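A minimal sketch of what that consent-gating might look like as a mechanism, purely my own toy formalization of the idea (not anything from Carado), with every name below made up for illustration:

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    # (source_name, interaction_kind) pairs this agent has opted in to.
    consents: set = field(default_factory=set)

    def consents_to(self, source: str, kind: str) -> bool:
        return (source, kind) in self.consents


def deliver(target: Agent, payload) -> None:
    # Stand-in for the simulation actually applying the effect to the target.
    print(f"{target.name} experiences: {payload}")


def route_interaction(source: Agent, target: Agent, kind: str, payload) -> bool:
    """The philosopher-king's core rule: an interaction reaches its target only
    if the target has opted in to that source and kind of influence. No judgment
    about whether the interaction is 'good' is needed, which is the sense in
    which value alignment gets traded for precise goal alignment."""
    if target.consents_to(source.name, kind):
        deliver(target, payload)
        return True
    return False  # dropped: the target stays causally untouched


alice = Agent("alice", consents={("bob", "message")})
bob = Agent("bob")
route_interaction(bob, alice, "message", "hello")    # delivered
route_interaction(alice, bob, "message", "hi back")  # dropped: bob never opted in
```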
Sure, if North Korea, Nazi Germany, or even the CCP/Putin were the first to build an AGI and successfully align it with their values, then we would be in huge trouble, but that’s a matter of AI governance, not the object-level problem of making the AI consistently want to do the thing the human would like it to do if they were smarter, or whatever.
If we solve alignment without “solving human values,” and most people just stick to their common-sense/intuitive ethics[1], and the people/orgs doing the aligning are “the ~good guys” without any retributivist/racist/outgroupish impulses… perhaps they would like to secure some huge monetary gain for themselves, but other than that they are completely fine with enacting the Windfall Clause, letting benefits trickle down to every-sentient-body, and implementing luxurious fully automated gay space communism or whatever their coherently extrapolated vision of protopia is...
Yeah, for everything I just listed (gay space communism, a protopian future, financial gain for the first aligners, benefits trickling down to everybody) you could probably find some people who wouldn’t like it and could even argue against it somewhat consistently, but unless they are antinatalists or some religious fundamentalist nutjobs, I don’t think they would say it’s worse than AI doom.
Although I think you exaggerate their parochialism and underestimate how much folk ethics has changed over the last few hundred years, and perhaps how much it can change in the future if historical dynamics are favorable.
Something I would really, really like anti-AI communities to consider is that regulations/activism/etc. aimed at hampering AI development and slowing AI timelines do not have equal effects on all parties. Specifically, I argue that the time until the CCP develops CCP-aligned AI is almost invariant, whilst the time until Blender reaches sentience potentially varies greatly.
I have much, much more hope for likeable AI coming from open-source software rooted in a desire to help people and make their lives better than from (worst-case scenario) malicious government actors or (second-worst) corporate advertisers.
I want to minimize first the risk of building Zon-Kuthon, then Asmodeus. Once you’re certain you’ve solved those first two, you can worry about not building Rovagug. I am extremely perturbed by the AI alignment community whenever I see talk of preventing the world from being destroyed that moves any significant probability mass from Rovagug to Asmodeus. A sensible AI alignment community would not bother discussing Rovagug yet, and would especially not imply that the end of the world is the worst-case scenario.
I don’t think AGI is on the CCP’s radar.
Antinatalists getting the AI is morally the same as paperclip doom: everyone dies.