Thanks for your reply. For clarity, I think we should use the term artificial conscience, not value alignment, for what you’re trying to do. I’m happy to see we seem to agree that reversibility is important and that replacing humans is an extremely bad outcome. (I’ve talked to people who are into value alignment of ASI who said they “would bite that bullet”, in other words would replace humanity with more efficient, happy AI consciousness, so this point does not seem to be obvious. I’m also not convinced that leading longtermists necessarily think replacing humans is a bad outcome, and I think we should call them out on it.)
If one can implement artificial conscience in a reversible way, it might be an interesting approach. I think the minimum an aligned ASI would need to do is block other unaligned ASIs or ASI projects. If humanity supports this, I’d file it under a positive offense-defense balance, which would be great. If humanity doesn’t support it, doing it anyway would lead to conflict with humanity. I think an artificial conscience AI would either not want to fight that conflict (making it unable to stop unaligned ASI projects) or, if it did fight, people would no longer see it as good. I think societal awareness of xrisk and, from there, support for regulation (whether by AI or not) are what should make our future good, rather than aligning an ASI in a certain way.
Yes, I think referring to it as “guard-railing with an artificial conscience” would be clearer than saying “value aligning,” thank you.
I believe that if there were no beings around who had real consciences (with consciousness and the ability to feel pain as two necessary prerequisites to conscience), then there’d be no value in the world. No one to understand and measure or assign value means no value. And any being that doesn’t feel pain can’t understand value (nor feel real love, by the way). So if we ended up with some advanced AIs replacing humans, then we made some sort of mistake. Most likely, we either got the artificial conscience wrong (a correct one would have implicitly valued human life and so wouldn’t have let a guard-railed AI wipe out humans), or we didn’t get an artificial conscience on board enough AIs in time. An AI that had a “real” conscience also wouldn’t wipe out humans against the will of humans.
The way I currently envision the “typical” artificial conscience is that it would put a pretty strong conscience weight against failing to do what its user wanted, but this could be overruled by the conscience weight against failing to do anything to prevent catastrophes. So the defensive, artificial-conscience-guard-railed AI I’m thinking of would do the “last resort” things that were necessary to keep s-risks, x-risks, and major catastrophes from coming to fruition, even if this wasn’t popular with most people, at least up to a point. If literally everyone in the world said, “Hey, we all want to die,” then the guard-railed AI, if it thought the people were in their “right mind,” would respect their wishes and let them die.
All that said, if we could somehow pause development of autonomous AIs everywhere around the world until humans got their act together, developing their own consciences and senses of ethics, and were working as one team to cautiously take the next steps forward with AI, that would be great.
> So if we ended up with some advanced AIs replacing humans, then we made some sort of mistake
Again, I’m glad that we agree on this. I notice you want to do what I consider the right thing, and I appreciate that.
> The way I currently envision the “typical” artificial conscience is that it would put a pretty strong conscience weight against failing to do what its user wanted, but this could be overruled by the conscience weight against failing to do anything to prevent catastrophes. So the defensive, artificial-conscience-guard-railed AI I’m thinking of would do the “last resort” things that were necessary to keep s-risks, x-risks, and major catastrophes from coming to fruition, even if this wasn’t popular with most people, at least up to a point.
I can see the following scenario occurring: the AI, with its AC, rightly decides that a pivotal act needs to be undertaken to avoid xrisk (or srisk). However, the public mostly doesn’t recognize the existence of such risks. The AI proceeds to sabotage people’s unsafe AI projects against the public’s will. What happens next is that the public gets absolutely livid at the AI, which is subverting human power by acting against human will. Almost all humans team up to try to shut down the AI. The AI recognizes (and had already recognized) that if it loses, humans risk going extinct, so it fights this war against humanity and wins. I think in this scenario, an AI, even one with artificial conscience, could become the most hated thing on the planet.
I think people underestimate the amount of pushback we’re going to get once we get into pivotal act territory. That’s why I think it’s hugely preferable to go the democratic route and not count on AI taking unilateral actions, even if it would be smarter or even wiser, whatever that might mean exactly.
> All that said, if we could somehow pause development of autonomous AIs everywhere around the world until humans got their act together, developing their own consciences and senses of ethics, and were working as one team to cautiously take the next steps forward with AI, that would be great.
So yes, I definitely agree with this. I don’t think lack of conscience or ethics is the issue, though; I think lack of existential risk awareness is.
In terms of doing a pivotal act (which is usually thought of as preemptive, I believe) or just whatever defensive acts were necessary to prevent catastrophe, I hope the AI would be advanced enough to make decent predictions of what the consequences of its actions could be in terms of losing “political capital,” etc., and then it would make its decisions strategically. Personally, if I had the opportunity to save the world from nuclear war, but everyone was going to hate me for it, I’d do it. But then, it wouldn’t matter much that I lost the ability to affect anything afterward, whereas it would matter for a guard-railed AI, which could do a huge amount of good afterward if it weren’t shunned by society. Improving humans’ consciences and ethics would hopefully help keep them from hating the AI for saving them.
Also, if there were enough people, especially in power, who had strong consciences and senses of ethics, then maybe we’d be able to shift the political landscape from its current state of countries seemingly having different values and not trusting each other, to a world in which enforceable international agreements could be much more readily achieved.
I’m happy for people to work on increasing public awareness and trying for legislative “solutions,” but I think we should be working on artificial conscience at the same time. When there’s so much uncertainty about the future, it’s best to bet on a whole range of approaches, distributing your bets according to how likely you think different paths are to succeed. I think people are underestimating the artificial conscience path right now, that’s all.
Thanks for all your comments!