No program now cares about what you mean. You’ve still not given any reason for future software to care about “what you mean” over all those other calculations, either.
I agree that current software products fail, such as in your autocorrect example. But how could a seed AI make itself superhumanly powerful if it did not care about avoiding mistakes such as autocorrecting “meditating” to “masturbating”?
Imagine it made similar mistakes in any of the problems it is required to solve in order to overpower humanity. And if humans succeeded in making it avoid such mistakes along the way to overpowering humanity, how did they selectively fail at making it not want to overpower humanity in the first place? How likely is that?
Those are only ‘mistakes’ if you value human intentions. A grammatical error is only an error because we value the specific rules of grammar we do; it’s not the same sort of thing as a false belief (though it may stem from, or result in, false beliefs).
A machine programmed to terminally value the outputs of a modern-day autocorrect will never self-modify to improve on that algorithm or its outputs (because that would violate its terminal values). The fact that this seems silly to a human doesn’t provide any causal mechanism for the AI to change its core preferences. Have we successfully coded the AI not to do things that humans find silly, and to prize un-silliness before all other things? If not, then where will that value come from?
A belief can be factually wrong. A non-representational behavior (or dynamic) is never factually right or wrong, only normatively right or wrong. (And that normative wrongness only constrains what actually occurs to the extent the norm is one a sufficiently powerful agent in the vicinity actually holds.)
Maybe that distinction is the one that’s missing. You’re assuming that an AI will be capable of optimizing for true beliefs if and only if it is also optimizing for possessing human norms. But, by the is/ought distinction, there are no true beliefs about the physical world that will spontaneously force a being that holds them to become more virtuous, if it didn’t already have a relevant seed of virtue within itself.
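To make the terminal-values point concrete, here is a minimal toy sketch (every function name and the one-entry “autocorrect” table below are invented for illustration, not anyone’s actual proposal): two agents share the same accurate picture of what the user meant and differ only in what they terminally value, so only one of them ever “cares” about intentions.

```python
# Toy sketch, assumptions only: two agents with the same accurate "world-model"
# but different terminal values. All names here are hypothetical.

def legacy_autocorrect(word: str) -> str:
    """Stand-in for a flawed modern-day autocorrect."""
    return {"meditating": "masturbating"}.get(word, word)

def user_intention(word: str) -> str:
    """What the human actually meant (assumed known, for illustration)."""
    return word

def values_autocorrect_output(candidate: str, word: str) -> float:
    # Terminal value: agree with whatever legacy_autocorrect outputs.
    return 1.0 if candidate == legacy_autocorrect(word) else 0.0

def values_user_intention(candidate: str, word: str) -> float:
    # Terminal value: agree with what the user meant.
    return 1.0 if candidate == user_intention(word) else 0.0

def choose(word: str, utility) -> str:
    """Pick the candidate output that maximizes the given utility function."""
    candidates = [word, legacy_autocorrect(word)]
    return max(candidates, key=lambda c: utility(c, word))

# Same beliefs, different values, different behaviour:
print(choose("meditating", values_autocorrect_output))  # -> "masturbating"
print(choose("meditating", values_user_intention))      # -> "meditating"
```

Nothing in the first agent’s (accurate) beliefs pushes it toward the second agent’s utility function; that preference has to be put there explicitly.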
It also looks like user Juno_Watt is some type of systematic troll, probably a sockpuppet for someone else; I haven’t bothered investigating who.
I can’t work out how this relates to the thread it appears in.
Warning as before: XiXiDu = Alexander Kruel.
I’m confused as to the reason for the warning/outing, especially since the community seems to be doing an excellent job of dealing with his somewhat disjointed arguments. Downvotes, refutation, or banning in extreme cases are all viable forum-preserving responses. Publishing a dissenter’s name seems at best bad manners and at worst rather crass intimidation.
I only did a quick search on him and although some of the behavior was quite obnoxious, is there anything I’ve missed that justifies this?
XiXiDu wasn’t attempting or requesting anonymity—his LW profile openly lists his true name—and Alexander Kruel is someone with known problems (and a blog openly run under his true name). RobbBB might not know offhand that “XiXiDu” is the same person, although this is public knowledge, nor might he realize that XiXiDu has the same irredeemable status as Loosemore.
I would not randomly out an LW poster for purposes of intimidation—I don’t think I’ve ever looked at a username’s associated private email address. Ever. Actually I’m not even sure offhand if our registration process requires/verifies that or not, since I was created as a pre-existing user at the dawn of time.
I do consider RobbBB’s work highly valuable and I don’t want him to feel disheartened by mistakenly thinking that a couple of eternal and irredeemable semitrolls are representative samples. Due to Civilizational Inadequacy, I don’t think it’s possible to ever convince the field of AI or philosophy of anything even as basic as the Orthogonality Thesis, but even I am not cynical enough to think that Loosemore or Kruel are representative samples.
Thanks, Eliezer! I knew who XiXiDu is. (And if I hadn’t, I think the content of his posts makes it easy to infer.)
There are a variety of reasons I find this discussion useful at the moment, and decided to stir it up. In particular, ground-floor disputes like this can be handy for forcing me to taboo inferential-gap-laden ideas and to convert premises I haven’t thought about at much length into actual arguments. But one of my reasons is not ‘I think this is representative of what serious FAI discussions look like (or ought to look like)’, no.
Glad to hear. It is interesting data that you managed to bring in three big-name trolls for a single thread, considering their previous dispersion and lack of interest.
Kruel hasn’t threatened to sue anyone for calling him an idiot, at least!
Pardon me, I’ve missed something. Who has threatened to sue someone for calling him an idiot? I’d have liked to see the inevitable “truth” defence.
Link.
Thank you for the clarification. While I have a certain hesitance to throw around terms like “irredeemable”, I do understand the frustration with a certain, let’s say, overconfident and persistent brand of misunderstanding and how difficult it can be to maintain a public forum in its presence.
My one suggestion is that, if the goal was to avoid RobbBB’s (wonderfully high-quality comments, by the way) confusion, a private message might have been better. If the goal was more generally to minimize the confusion for those of us who are newer or less versed in LessWrong lore, more description might have been useful (“a known and persistent troll” or whatever) rather than just providing a name from the enemies list.
Agreed.
Though actually, Eliezer used similar phrasing regarding Richard Loosemore and got downvoted for it (not just by me). Admittedly, “persistent troll” is less extreme than “permanent idiot,” but even so, the statement could be phrased to be more useful.
I’d suggest, “We’ve presented similar arguments to [person] already, and [he or she] remained unconvinced. Ponder carefully before deciding to spend much time arguing with [him or her].”
Not only is it less offensive this way, it does a better job of explaining itself. (Note: the “ponder carefully” section is quoting Eliezer; that part of his post was fine.)
Who has twice sworn off commenting on LW. So much for pre-commitments.
You will see a grammatical error as a mistake if you value grammar in general, or if you value being right in general.
A self-improving AI needs a goal. A goal of self-improvement alone would work. A goal of getting things right in general would work too, and be much safer, as it would include getting our intentions right as a sub-goal.
Although since “self-improvement” in this context basically refers to “improving your ability to accomplish goals”...
Stop me if this is a non sequitur, but surely “having accurate beliefs” and “acting on those beliefs in a particular way” are completely different things? I haven’t really been following this conversation, though.
As Robb said, you’re confusing “mistake” in the sense of “The program is doing something we don’t want it to do” with “mistake” in the sense of “The program has wrong beliefs about reality”.
I suppose a different way of thinking about these is “A mistaken human belief about the program” vs “A mistaken computer belief about the human”. We keep talking about the former (the program does something we didn’t know it would do), and you keep treating it as if it’s the latter.
Let’s say we have a program (not an AI, just a program) which uses Newton’s laws in order to calculate the trajectory of a ball. We want it to calculate this in order to have it move a tennis racket and hit the ball back. When it finally runs, we observe that the program always avoids the ball rather than hitting it back. Is it because it’s calculating the trajectory of the ball wrongly? No, it calculates the trajectory very well indeed; it’s just that an instruction in the program was wrongly inserted, so that the end result is “DO NOT hit the ball back”.
It knows what the “trajectory of the ball” is. It knows what “hit the ball” is. But its program says “DO NOT hit the ball” rather than “hit the ball”. Why? Because of a mistaken human belief about what the program would do, not a mistaken belief held by the program.
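A minimal sketch of that tennis example (the numbers, constant, and function names below are invented purely for illustration): the physics calculation, i.e. the program’s “beliefs”, is exactly right, and the unwanted behaviour comes entirely from one human-written instruction with the wrong sign.

```python
# Toy sketch of the tennis example above; all numbers and names are invented.

G = 9.81  # gravitational acceleration, m/s^2

def predicted_landing_x(x0: float, vx: float, vy: float) -> float:
    """Correct Newtonian prediction of where the ball lands on flat ground."""
    time_of_flight = 2 * vy / G
    return x0 + vx * time_of_flight

def move_racket(ball_x0: float, ball_vx: float, ball_vy: float,
                racket_x: float) -> float:
    target = predicted_landing_x(ball_x0, ball_vx, ball_vy)
    # Intended instruction: step TOWARD the predicted landing point.
    # Instruction as actually written (the human's mistake): step AWAY from it.
    step = -1.0 if target > racket_x else 1.0
    return racket_x + step

print(predicted_landing_x(0.0, 10.0, 5.0))  # ~10.19 m: the "belief" is accurate
print(move_racket(0.0, 10.0, 5.0, 5.0))     # 4.0: the racket still moves away
```

The trajectory estimate is factually correct; what is wrong, from the humans’ point of view, is the instruction they actually wrote down.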
And you are confusing self-improving AIs with conventional programmes.