If an AI does what Roko suggested, it’s not friendly. We don’t know what, if anything, CEV will output, but I don’t see any reason to think CEV would enact Roko’s scenario.
Until about a month ago, I would have agreed, but some posts I have since read on LW made me update the probability of CEV wanting that upwards.
Really, please explain (or PM me if it would require breaking the gag rule on Roko’s scenario). Why would CEV want that?
Because ‘CEV’ must be instantiated on a group of agents (usually humans). Some humans are assholes. So for some value of aGroup, CEV does assholish things. Hopefully the group of all humans doesn’t create a CEV that makes the resulting FAI an outright uFAI from our perspective, but we certainly shouldn’t count on it.
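A minimal sketch of that dependence on aGroup, purely as an illustration (none of these names or data structures come from the actual CEV write-up; ‘extrapolate’ is just a stand-in for the idealization step):

```python
# Toy illustration only: the point is that CEV is parameterized by the
# group it is run on, so its output can be no nicer than the
# (extrapolated) values that group brings to the table.

def extrapolate(agent):
    """Stand-in for 'what the agent would want if they knew more,
    thought faster, and were more the person they wished they were'."""
    return agent["idealized_values"]  # assumed to exist, for illustration

def cev(a_group):
    """Return whatever coheres across the group's extrapolated values."""
    value_sets = [extrapolate(agent) for agent in a_group]
    return set.intersection(*value_sets) if value_sets else set()

a_group = [
    {"idealized_values": {"flourishing", "fairness"}},
    {"idealized_values": {"flourishing", "dominance"}},  # a coherent extrapolated asshole
]
print(cev(a_group))  # whatever survives coherence depends entirely on a_group
```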
That’s not necessarily true. CEV isn’t precisely defined but it’s intended to represent the idealized version of our desires and meta-desires. So even if we take a group of assholes, they don’t necessarily want to be assholes, or want to want to be assholes, or maybe they wouldn’t want to if they knew more and were smarter.
I refer, of course, to people whose preferences really are different to our own. Coherent Extrapolated Assholes. I don’t refer to people who would really have preferences that I would consider acceptable if they just knew a bit more.
You asked for an explanation of how a correctly implemented ‘CEV’ could want something abhorrent. That’s how.
There is an unfortunate tendency to glorify the extrapolation process and pretend that it makes any given individual or group have acceptable values. It need not.
Upvoted for the phrase “Coherent Extrapolated Assholes”. Best. Insult. Ever.
Seriously, though, I don’t think there are many CEAs around, anyway. (This doesn’t mean there are none, either. I was going to link to this as an example of one, but I’m not sure Hitler would have done what he did had he known about late-20th-century results on heterosis, Ashkenazi Jewish intelligence, etc.) This means I think it’s very, very unlikely for CEV to be evil from our perspective, unless the membership criteria for aGroup are gerrymandered to make it so.
It seemed odd to me that so few people were bothered by the claim that CEV shouldn’t care much about its inputs. If you expect it to give similar results whether you put in a chimpanzee, a murderer, or Archimedes, then why put in anything at all instead of just printing out the one result it was going to give anyway?
If you believe in moral progress (and CEV seems to rely on that position), then there’s every reason to think that future-society would want to make changes to how we live, if future-society had the capacity to make that type of intervention.
In short, wouldn’t you change the past to prevent the occurrence of chattel slavery if you could? (If you don’t like that example, substitute preventing the October revolution or whatever example fits your preferences).
It’s more agnostic on the issue. It works just as well for the ultimate conservative.
I wouldn’t torture innocent people to prevent it, no.
Punishment from the future is spooky enough. Imagine what an anti-Guns of the South would be like for the temporal locals. Not pleasant, that’s for sure.
It’s more agnostic on the issue. It works just as well for the ultimate conservative.
Doesn’t CEV implicitly assert that there exists a set of moral assertions M that is more reliably moral than anything humans assert today, and that it’s possible for a sufficiently intelligent system to derive M?
That sure sounds like a belief in moral progress to me.
Granted, it doesn’t imply that humans left to their own devices will achieve moral progress. But the same is true of technological progress.
The implicit assertion is “Greater or Equal”, not “Greater”.
Run on a True Conservative, it will return the morals that the conservative currently has.
Mm. I’ll certainly agree that anyone for whom that’s true deserves the title “True Conservative.”
I don’t think I’ve ever met anyone who meets that description, though I’ve met people who would probably describe themselves that way.
Presumably, someone who believes this is true of themselves would consider the whole notion of extrapolating the target definition for a superhumanly powerful optimization process to be silly, though, and would consider the label CEV technically accurate (in the same sense that I’m currently ‘extrapolating’ the presence of my laptop) but misleading.
Roko thinks (or thought) it would. I do too. Can’t argue it in detail here, sorry.
No AI is friendly. That’s a naive idea. Is a FAI friendly towards a superintelligent, highly conscious uFAI? No, it’s not. It will kill it, just as it will kill all other entities that try to do what they want with the universe. Friendliness is subjective and cannot be guaranteed.
Are you sure? Random alternative possibilities:
Hack it and make it friendly
Assimilate it
Externally constrain its actions
Toss it into another universe where humanity doesn’t exist
Unless you’re one yourself, it’s rather difficult to predict what other options a superintelligence might come up with that you never even considered.
Yes to subjective, no to guaranteed.
How do you want to guarantee friendliness? If there are post-Singularity aliens out there, then their CEV might be opposed to that of humanity, which would ultimately mean either our extinction or theirs. Obviously any CEV acted on by some friendly AI is a dictatorship that regards any disruptive elements, such as unfriendly AIs and aliens, as an existential risk. You might call this friendly; I don’t. It’s simply one way to shape the universe that is favored by the SIAI, a bunch of human beings who want to imprint the universe with an anthropocentric version of CEV. Therefore, as I said above, friendliness is subjective and cannot be guaranteed.

I don’t even think it can be guaranteed subjectively, as any personal CEV would ultimately be a feedback process favoring certain constants between you and the friendly AI trying to suit your preferences. If you like sex, the AI will provide you with better sex, which in turn will make you like sex even more, and so on. Any CEV is prone to be a paperclip maximizer when seen from any position that the CEV does not take into account. That’s not friendliness, it’s just a convoluted way of shaping the universe according to your will.
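One way to write that feedback loop out explicitly, as a toy recurrence (a sketch of the comment above only; the notation is not from the CEV literature):

$$p_{t+1} = p_t + k\, s(p_t), \qquad k > 0,\ s \text{ increasing},$$

where $p_t$ is the strength of a preference at time $t$ and $s(p_t)$ is the satisfaction the AI supplies for it. Because stronger preferences get served better, $p_t$ only ratchets upward; that is the ‘better sex makes you like sex even more’ loop in symbols.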
Yes, it’s a “Friendly to me” AI that I want. (Where replacing ‘me’ with other individuals or groups with acceptable values would be better than nothing.) I don’t necessarily want it to be friendly in the general colloquial sense. I don’t particularly mind if you call it something less ‘nice’ sounding than Friendly.
Here we disagree on a matter of real substance. If I do not want my preferences to be altered in the kind of way you mention, then a Friendly (to me) AI doesn’t alter them. This is tautological. Creating a system and guaranteeing that it works as specified is then ‘just’ a matter of engineering and mathematics. (Where ‘just’ means ‘harder than anything humans have ever done’.)
I just don’t see how that is possible without the AI becoming a primary attractor and therefore fundamentally altering the trajectory of your preferences. I’d favor the way Kurzweil portrays a technological Singularity here, where humans themselves become the Gods. I do not want to live in a universe where I’m just a puppet of the seed I once sowed. That is, I want to implement my own volition without the oversight of a caretaker God. As long as there is a being vastly superior to me that takes interest in my own matters, even the mere observer effect will alter my preferences since I’d have to take this being into account in everything conceivable.
The whole idea of friendly AI, even if it was created to suit only my personal volition, reminds me of the promises of the old religions: this horribly boring universe where nothing bad can happen to you and everything is already figured out by this one being. Sure, it wouldn’t figure things out for me if it knew I wanted to do that myself. But that would be pretty dumb, since it could if I wanted it to. And that’s just the case with my personal friendly AI. One based on the extrapolated volition of humanity would very likely not be friendly towards me and would ultimately dictate what I can and cannot do.
Really, the only favorable possibility here is to merge with the AI. But that would mean instant annihilation for me, as I would add nothing to a being that vast. So I still hope that AI going foom is wrong and that we see a slow development over many centuries instead, without any singularity-type event.
And I’m aware that big government and other environmental influences are altering and steering my preferences as well. But they are much more fuzzy, whereas a friendly AI is very specific. The more specific the influence, the less free will I have. That is, the higher the ratio of the influence and control I exert over my environment to the influence it exerts over me, the freer I am to implement what I want to do rather than what others want me to do.
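Written out, the ratio being gestured at is something like (a formalization of the comment above only, not a formula from anywhere in the FAI literature):

$$\text{freedom} \;\propto\; \frac{C_{\text{me}\to\text{env}}}{C_{\text{env}\to\text{me}}},$$

where $C_{x\to y}$ is the degree of control that $x$ exerts over $y$. A very specific, very capable caretaker AI drives the denominator up, so on this accounting the ratio, and with it the felt free will, goes down.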
The problem with having a pantheon of Gods… they tend to bicker. With metaphorical lightning bolts. ;)
I don’t think that outcome would be incompatible with a FAI (which may be necessary to do the research to get you your godlike powers). Apart from the initial enabling that the FAI would provide, the new ‘Gods’ could choose, by mutual agreement, to create some form of power structure that prevented them from messing each other over and burning the cosmic commons in competition.
You talked about the downside of mere observation. That would be utterly trivial and benign compared to the effects of Malthusian competition. Humans are not in a stable equilibrium now. We rely on intuitions created in a different time, under different circumstances, to prevent us from rapidly rushing to a miserable equilibrium of subsistence living.
The longer we go before putting a check on the evolutionary pressure towards maximum securing of resources, the more we will lose of that which we value as ‘human’. Yes, everything we value except existence itself. Even consciousness in the form that we experience it.
I don’t think I emphasised this enough. Unless the ultimate cooperation problem is solved, we will devolve into something that is less human than Clippy. Clippy at least has a goal that he seeks to maximise and which motivates his quest for power. Competition would weed out even that much personality.