If you’re a seed AI and you’re somewhat confused about what your creator meant when they said “valueOfCreatingPaperclips=infinity”, you do what you think they were trying to get you to do, which was to create economic value by making paperclips, and the reason they wanted to do that was to make a profit for themselves, and the reason for that is that they’re part of this larger system called humanity which is following this strange vector in preference-space...
And the reason you value friendship is that “evolution” “made” it so, following the Big Bang. Informal descriptions of physical causes and effects don’t translate into moral arguments, and there’s no a priori reason to care about what other “agents” present in your causal past (light cone!) “cared” about, any more than about what they “hated”, or even to consider such a concept.
(I become more and more convinced that you do have a serious problem with the virtue of narrowness; better to stop the meta-contrarian nonsense and work on that.)
You’re responding to an interpretation of what I said that assumes I’m stupid, not the thing I was actually trying to say. Do you seriously think I’ve spent a year at SIAI without understanding such basic arguments? I’m not retarded. I just don’t have the energy to think through all the ways that people could interpret what I’m saying as something dumb because it pattern matches to things dumb people say. I’m going to start disclaiming this at the top of every comment, as suggested by Steve Rayhawk.
Specifically, in this case, in the comment you replied to and elsewhere in this thread, I said: “this doesn’t apply to AIs that are bad at that kind of philosophical reflection”. I’m making a claim that all well-designed AIs will converge to a universal ‘morality’ that we’d like upon reflection, even if they weren’t explicitly coded to approximate human values. I’m not saying your average AI programmer can make an AI that does this, though I am suggesting it is plausible.
This is stupid. I’m suggesting a hypothesis with low probability that is contrary to standard opinion. If you want to dismiss it via the absurdity heuristic, go ahead, but that doesn’t mean that there aren’t other people who might actually think about what I might mean while assuming that I’ve actually thought about the things I’m trying to say. This same annoying thing happened with Jef Allbright, who had interesting things to say, but no one had the ontology to understand him, so they just assumed he was speaking nonsense. Including Eliezer. LW inherited Eliezer’s weakness in this regard, though admittedly the strengths of narrowness and precision were probably bolstered in its absence.
If what I am saying sounds mysterious, that is a fact about your unwillingness to be charitable as much as it is about my unwillingness to be precise. (And if you disagree with that, see it as an example.) That we are both apparently unwilling doesn’t mean that either of us is stupid. It just means that we are not each other’s intended audience.
You’re responding to an interpretation of what I said that assumes I’m stupid [...] I’m not retarded.
No one said you were stupid.
Do you seriously think I’ve spent a year at SIAI without understanding such basic arguments?
People are responding to the text of your comments as written. If you write something that seems to ignore a standard argument, then it’s not surprising that people will point out the standard argument.
As a parable, imagine an engineer proposing a design for a perpetual motion device. An onlooker objects: “But what about conservation of energy?” The engineer says: “Do you seriously think I spent four years at University without understanding such basic arguments?” An uncharitable onlooker might say “Yes.” A better answer, I think, is: “Your personal credentials are not at issue, but the objection to your design remains.”
I suppose I mostly meant ‘irrational’, not stupid. I just expected people to expect me to understand basic SIAI arguments like “value is fragile” and “there’s no moral equivalent of a ghost in the machine” et cetera. If I didn’t understand these arguments after having spent so much time looking at them… I may not be stupid, but there’d definitely be some kind of gross cognitive impairment going on in software if not in hardware.
People are responding to the text of your comments as written. If you write something that seems to ignore a standard argument, then it’s not surprising that people will point out the standard argument.
There were a few cues where I acknowledged that I agreed with the standard argument (AGI won’t automatically converge to Eliezer’s “good”), but was interested in a different argument about philosophically-sound AIs that didn’t necessarily even look at humanity as a source of value but still managed to converge to Eliezer’s good, because extrapolated volitions for all evolved agents cohere. (I realize that, interestingly, your intuition here is perhaps somewhat the opposite of mine, in that you fear more than I do that there won’t be much coherence even among human values. I think that we might just be looking at different stages of extrapolation… if human near mode provincial hyperbolic discounting algorithms make deals with human far mode universal exponential discounting algorithms, the universal (pro-coherence) algorithms will win out in the end (by taking advantage of near mode’s hyperbolic discounting). If this idea is too vague or you’re interested I could expand on this elsewhere.)
Your parable makes sense; it’s just that I don’t think I was proposing a perpetual motion device, just something that could sound like one if I’m not clear enough in my exposition, which it looks like I wasn’t. I was just afraid of italicizing and bolding the disclaimers because I thought it’d appear obnoxious, but it’s probably less obnoxious than failing to emphasize really important parts of what I’m saying.
if human near mode provincial hyperbolic discounting algorithms make deals with human far mode universal exponential discounting algorithms, the universal (pro-coherence) algorithms will win out in the end (by taking advantage of near mode’s hyperbolic discounting).
What does time discounting have to do with coherence? Of course exponential discounting is “universal” in the sense that if you’re going to time-discount at all (and I don’t think we should), you need to use an exponential in order to avoid preference reversals. But this doesn’t tell us anything about what exponential discounters are optimizing for.
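To make the preference-reversal point concrete, here is a toy sketch (my own illustration, with made-up rewards and discount parameters, not anything from the thread): a hyperbolic discounter’s ranking of a smaller-sooner reward against a larger-later one flips as the sooner reward approaches, while an exponential discounter’s ranking never changes, because the ratio of two exponentially discounted values depends only on the fixed gap between their delivery times.

```python
# Toy illustration (not from the thread): hyperbolic discounting produces a
# preference reversal as a reward draws near; exponential discounting does not.
# All rewards, times, and parameters below are made up.

def exponential(delay, discount=0.9):
    """Constant per-step discount factor: value falls by the same ratio each step."""
    return discount ** delay

def hyperbolic(delay, k=1.0):
    """Standard 1/(1 + k*delay) hyperbolic form."""
    return 1.0 / (1.0 + k * delay)

def choice(discount_fn, now, small=(10, 4), large=(18, 7)):
    """Which option looks better at time `now`: 10 utils at t=4, or 18 utils at t=7?"""
    (r_small, t_small), (r_large, t_large) = small, large
    v_small = r_small * discount_fn(t_small - now)
    v_large = r_large * discount_fn(t_large - now)
    return "smaller-sooner" if v_small > v_large else "larger-later"

for now in range(0, 5):
    print(now, choice(hyperbolic, now), choice(exponential, now))

# Hyperbolic: prefers larger-later at now=0 and now=1, then flips to
# smaller-sooner from now=2 onward -- a preference reversal.
# Exponential: gives the same answer at every `now`, so no reversal;
# as the parent comment notes, this says nothing about what the
# exponential discounter is optimizing for.
```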
If this idea is too vague or you’re interested I could expand on this elsewhere. [...] I was just afraid of italicizing and bolding the disclaimers because I thought it’d appear obnoxious, but it’s probably less obnoxious than failing to emphasize really important parts of what I’m saying.
I think your comments would be better received if you just directly talked about your ideas and reasoning, rather than first mentioning your shocking conclusions (“theism might be correct,” “volitions of evolved agents cohere”) while disclaiming that it’s not how it looks. If you make a good argument that just so happens to result in a shocking conclusion, then great, but make sure the focus is on the reasons rather than the conclusion.
AGI won’t automatically converge to Eliezer’s “good”
vs.
extrapolated volitions for all evolved agents cohere.
It really really seems like these two statements contradict each other; I think this is the source of the confusion. Can you go into more detail about the second statement?
In particular, why would two agents which both evolved but under two different fitness functions be expected to have the same volition?
I just expected people to expect me to understand basic SIAI arguments like “value is fragile” and “there’s no moral equivalent of a ghost in the machine” et cetera.
“Basic SIAI arguments like ‘value is fragile’”…? You mean this...?
The post starts out with:
If I had to pick a single statement that relies on more Overcoming Bias content I’ve written than any other, that statement would be:
Any Future not shaped by a goal system with detailed reliable inheritance from human morals and metamorals, will contain almost nothing of worth.
...it says it isn’t basic—and it also seems pretty bizarre.
For instance, what about the martians? I think they would find worth in a martian future.
For instance, what about the martians? I think they would find worth in a martian future.
Yeah, and paperclippers would find worth in a future full of paperclips, and pebblesorters would find worth in a future full of prime-numbered heaps of pebbles. Fuck ’em.
If the martians are persons and they are doing anything interesting with their civilization, or even if they’re just not harming us, then we’ll keep them around. “Human values” doesn’t mean “valuing only humans”. Humans are capable of valuing all sorts of non-human things.
Suppose someone who reliably does not generate common obviously wrong ideas/arguments has an uncommon idea that is wrong in a way that is non-obvious, but that you could explain if the wrong idea itself were precisely explained to you. This person, however, does not precisely explain their idea, but instead vaguely points to it with a description that sounds very much like a common obviously wrong idea. So you try to apply charity and fill in the gaps to figure out what they are really saying, but even if you do find the idea that they had in mind, you wouldn’t identify it as such, because you see how that idea is wrong, and being charitable, you can’t interpret what they said in that way. How could you figure out what this person is talking about?
How could you figure out what this person is talking about?
You’d have to be speaking their language in the first place. That’s why I wrote about intended audiences. But it seems that at my current level of vagueness my intended audience doesn’t exist. I’ll either have to get more precise or stop posting stuff that appears to be nonsense.
You’d have to be speaking their language in the first place. That’s why I wrote about intended audiences. But it seems that at my current level of vagueness my intended audience doesn’t exist.
Sometimes it is impossible to reach an intended audience when the not-intended audience is using you as a punching bag to impress their intended audience. Most debate, in conventional practice, is after all about trying to spin what the other person says to make them look bad. If your ‘intended audience’ then chose to engage with you at the level at which you were hoping to converse, they would risk being collaterally damaged in the social bombardment.
For my part I reached the conclusion that you are probably using a different conception of ‘morality’, analogous to the slightly different conception of ‘theism’ from your recent thread. This is dangerous because in-group signalling incentives are such that people will be predictably inclined to ignore the novelty in your thoughts and target the nearest known stupid thing to what you say. And you must admit: you made it easy for them this time!
It may be worth reconsidering the point you are trying to discuss a little more carefully, and perhaps avoiding the use of the term ‘morality’. You could then make a post on the subject such that some people can understand your intended meaning and have a useful conversation without risking losing face. It will not work with everyone; there are certain people you will just have to ignore. But you should get some useful discussion out of it. I note, for example, that while your ‘theism’ discussion got early downvotes from the most (to put it politely) passionate voters, it ended up creeping up to positive.
As for guessing what sane things you may be trying to talk about, I basically reached the conclusion “Either what you are getting at boils down to the outcome of acausal trade or it is stupid”. And acausal trade is something that I cannot claim to be certain about.
I just don’t have the energy to think through all the ways that people could interpret what I’m saying as something dumb because it pattern matches to things dumb people say. I’m going to start disclaiming this at the top of every comment, as suggested by Steve Rayhawk.
Specifically, in this case, in the comment you replied to and elsewhere in this thread, I said: “this doesn’t apply to AIs that are bad at that kind of philosophical reflection”. I’m making a claim that all well-designed AIs will converge to a universal ‘morality’ that we’d like upon reflection, even if they weren’t explicitly coded to approximate human values. I’m not saying your average AI programmer can make an AI that does this, though I am suggesting it is plausible.
As a “peace offering”, I’ll describe a somewhat similar argument, although it stands as an open confusion, not so much a hypothesis.
Consider a human “prototype agent”: additional data that you plug into a proto-FAI (already a human-specific thingie, of course) to refer to the precise human decision problem. Where does this human end, where are its boundaries? Why would its body be the cutoff point; why not include all of its causal past, all the way back to the Big Bang? At which point talking about the human in particular seems to become useless, since it’s a tiny portion of all that data. But clearly we need to somehow point to the human decision problem, to distinguish it from the frog decision problem and the like, even though such boundless prototype agents share all of their data. Do you point to a human finger and specify this actuator as the locus through which the universe is to be interpreted, as opposed to pointing to a frog’s leg? Possibly, but it’ll take a better understanding of how to interpret decision problems from arbitrary agents’ definitions to make progress on questions like this.
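A minimal sketch of that structural point, under my own made-up representation (the Event and PrototypeAgent types and the relevant_events helper are invented for illustration, not anything from the comment): two such agent specifications can carry the identical world history and differ only in which actuator is designated as the locus of interpretation; actually recovering a decision problem from that pointer is left as the open question the comment describes.

```python
# A toy illustration (invented representation, not Nesov's formalism) of how two
# "boundless prototype agents" can share all of their data and differ only in a
# designated actuator.

from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    time: int
    actuator: str   # which physical locus this event involves
    payload: str

# One shared causal history, going back as far as you like.
WORLD_HISTORY = (
    Event(0, "big_bang", "initial conditions"),
    Event(1, "human_finger", "presses a button"),
    Event(2, "frog_leg", "kicks off a lily pad"),
)

@dataclass(frozen=True)
class PrototypeAgent:
    world: tuple    # identical for every agent specified this way
    actuator: str   # the only distinguishing datum

def relevant_events(agent: PrototypeAgent):
    """Stand-in for 'interpreting the decision problem': here it merely selects
    events at the designated actuator. How to recover an actual decision problem
    from such a pointer is the open question in the comment above."""
    return [e for e in agent.world if e.actuator == agent.actuator]

human = PrototypeAgent(WORLD_HISTORY, "human_finger")
frog = PrototypeAgent(WORLD_HISTORY, "frog_leg")

assert human.world == frog.world                          # all data is shared
assert relevant_events(human) != relevant_events(frog)    # only the pointer differs
```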
You’re responding to an interpretation of what I said that assumes I’m stupid, not the thing I was actually trying to say. Do you seriously think I’ve spent a year at SIAI without understanding such basic arguments?
With each comment like this that you make, and with the lack of comments that show clear understanding, I think that more and more confidently: yes. Disclaimers don’t help in such cases. You don’t have to be stupid, and you clearly aren’t, but you seem to be using your intelligence to confuse yourself by lumping everything together instead of carefully examining distinct issues. Even if you actually understand something, adding a lot of noise over this understanding makes the overall model much less accurate.
Specifically, in this case, in the comment you replied to and elsewhere in this thread, I said: “this doesn’t apply to AIs that are bad at that kind of philosophical reflection”. I’m making a claim that all well-designed AIs will converge to a universal ‘morality’ that we’d like upon reflection, even if they weren’t explicitly coded to approximate human values.
One thing they rather obviously might converge on is the “goal system zero” / “Universal Instrumental Values” thing. The other main candidates seem to be “fitness” and “pleasure”. These might well preserve humans for a while—in historical exhibits.
there’s no a priori reason to care about what other “agents” present in your causal past (light cone!) “cared” about
Nor is there an a priori reason for an AI to exist, for it to understand what ‘paperclips’ are, let alone for it to self-improve through learning like a human child does, absorb human languages, and upgrade itself to the extent necessary to take over the world.
I suspect that any team of scientists or engineers with the knowledge and capability required to build an AGI with at least human-infant level cognitive capacity and the ability to learn human language will understand that making the AI’s goal system dynamic is not only advantageous, but is necessitated in practice by the cognitive capabilities required for understanding human language.
The idea of a paperclip maximizer taking over the world is a mostly harmless absurdity, but it also detracts from serious discussion.
Either what you are getting at boils down to the outcome of acausal trade or it is stupid.

That sounds bad—perhaps reconsider.