You’re responding to an interpretation of what I said that assumes I’m stupid [...] I’m not retarded.
No one said you were stupid.
Do you seriously think I’ve spent a year at SIAI without understanding such basic arguments?
People are responding to the text of your comments as written. If you write something that seems to ignore a standard argument, then it’s not surprising that people will point out the standard argument.
As a parable, imagine an engineer proposing a design for a perpetual motion device. An onlooker objects: “But what about conservation of energy?” The engineer says: “Do you seriously think I spent four years at University without understanding such basic arguments?” An uncharitable onlooker might say “Yes.” A better answer, I think, is: “Your personal credentials are not at issue, but the objection to your design remains.”
I suppose I mostly meant ‘irrational’, not stupid. I just expected people to expect me to understand basic SIAI arguments like “value is fragile” and “there’s no moral equivalent of a ghost in the machine” et cetera. If I didn’t understand these arguments after having spent so much time looking at them… I may not be stupid, but there’d definitely be some kind of gross cognitive impairment going on in software if not in hardware.
People are responding to the text of your comments as written. If you write something that seems to ignore a standard argument, then it’s not surprising that people will point out the standard argument.
There were a few cues where I acknowledged that I agreed with the standard argument (AGI won’t automatically converge to Eliezer’s “good”), but was interested in a different argument about philosophically-sound AIs that didn’t necessarily even look at humanity as a source of value but still managed to converge to Eliezer’s good, because extrapolated volitions for all evolved agents cohere. (I realize that your intuition here is, interestingly, perhaps the opposite of mine, in that you fear more than I do that there won’t be much coherence even among human values. I think that we might just be looking at different stages of extrapolation… if human near mode provincial hyperbolic discounting algorithms make deals with human far mode universal exponential discounting algorithms, the universal (pro-coherence) algorithms will win out in the end (by taking advantage of near mode’s hyperbolic discounting). If this idea is too vague or you’re interested I could expand on this elsewhere.)
Your parable makes sense; it’s just that I don’t think I was proposing a perpetual motion device, only something that could sound like one if I’m not clear enough in my exposition, which it looks like I wasn’t. I was just afraid of italicizing and bolding the disclaimers because I thought it’d appear obnoxious, but it’s probably less obnoxious than failing to emphasize really important parts of what I’m saying.
if human near mode provincial hyperbolic discounting algorithms make deals with human far mode universal exponential discounting algorithms, the universal (pro-coherence) algorithms will win out in the end (by taking advantage of near mode’s hyperbolic discounting).
What does time discounting have to do with coherence? Of course exponential discounting is “universal” in the sense that if you’re going to time-discount at all (and I don’t think we should), you need to use an exponential in order to avoid preference reversals. But this doesn’t tell us anything about what exponential discounters are optimizing for.
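To make the preference-reversal point concrete, here is a minimal numerical sketch comparing an exponential discounter with a hyperbolic one on a smaller-sooner vs. larger-later choice. The rewards, delays, and parameter values are made up purely for illustration:

```python
# Minimal sketch of the dynamic-consistency point above.
# All rewards, delays, and parameters are illustrative, not canonical.

def exponential(t, delta=0.9):
    # Present value of 1 unit received t periods from now, exponential discounting.
    return delta ** t

def hyperbolic(t, k=1.0):
    # Present value of 1 unit received t periods from now, hyperbolic discounting.
    return 1.0 / (1.0 + k * t)

def prefers_larger_later(discount, small=10, large=15, t_small=0, t_large=1):
    # True if the agent ranks the larger-later reward above the smaller-sooner one.
    return large * discount(t_large) > small * discount(t_small)

for name, discount in [("exponential", exponential), ("hyperbolic", hyperbolic)]:
    far = prefers_larger_later(discount, t_small=10, t_large=11)  # choice viewed 10 periods out
    near = prefers_larger_later(discount, t_small=0, t_large=1)   # same choice, now imminent
    print(f"{name}: far={far}, near={near}")

# Output:
#   exponential: far=True, near=True
#   hyperbolic: far=True, near=False
# Shifting both dates by the same amount rescales both options by the same factor
# under exponential discounting (delta**(t+c) = delta**c * delta**t), so the ranking
# never flips; under hyperbolic discounting the ranking reverses as the smaller
# reward becomes imminent.
```

But note that the reversal is purely a matter of consistency over time; it says nothing about what the agent is optimizing for, which is the point above.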
If this idea is too vague or you’re interested I could expand on this elsewhere. [...] I was just afraid of italicizing and bolding the disclaimers because I thought it’d appear obnoxious, but it’s probably less obnoxious than failing to emphasize really important parts of what I’m saying.
I think your comments would be better received if you just directly talked about your ideas and reasoning, rather than first mentioning your shocking conclusions (“theism might be correct,” “volitions of evolved agents cohere”) while disclaiming that it’s not how it looks. If you make a good argument that just so happens to result in a shocking conclusion, then great, but make sure the focus is on the reasons rather than the conclusion.
AGI won’t automatically converge to Eliezer’s “good”
vs.
extrapolated volitions for all evolved agents cohere.
It really really seems like these two statements contradict each other; I think this is the source of the confusion. Can you go into more detail about the second statement?
In particular, why would two agents that both evolved, but under two different fitness functions, be expected to have the same volition?
I just expected people to expect me to understand basic SIAI arguments like “value is fragile” and “there’s no moral equivalent of a ghost in the machine” et cetera.
“Basic SIAI arguments like ‘value is fragile’”...? You mean this...?
The post starts out with:
If I had to pick a single statement that relies on more Overcoming Bias content I’ve written than any other, that statement would be:
Any Future not shaped by a goal system with detailed reliable inheritance from human morals and metamorals, will contain almost nothing of worth.
...it says it isn’t basic—and it also seems pretty bizarre.
For instance, what about the martians? I think they would find worth in a martian future.
For instance, what about the martians? I think they would find worth in a martian future.
Yeah, and paperclippers would find worth in a future full of paperclips, and pebblesorters would find worth in a future full of prime-numbered heaps of pebbles. Fuck ’em.
If the martians are persons and they are doing anything interesting with their civilization, or even if they’re just not harming us, then we’ll keep them around. “Human values” doesn’t mean “valuing only humans”. Humans are capable of valuing all sorts of non-human things.