But how could a seed AI make itself superhumanly powerful if it did not care about avoiding mistakes such as autocorrecting “meditating” to “masturbating”?
Those are only ‘mistakes’ if you value human intentions. A grammatical error is only an error because we value the specific rules of grammar we do; it’s not the same sort of thing as a false belief (though it may stem from, or result in, false beliefs).
A machine programmed to terminally value the outputs of a modern-day autocorrect will never self-modify to improve on that algorithm or its outputs (because that would violate its terminal values). The fact that this seems silly to a human doesn’t provide any causal mechanism for the AI to change its core preferences. Have we successfully coded the AI not to do things that humans find silly, and to prize un-silliness before all other things? If not, then where will that value come from?
A belief can be factually wrong. A non-representational behavior (or dynamic) is never factually right or wrong, only normatively right or wrong. (And that normative wrongness only constrains what actually occurs to the extent the norm is one a sufficiently powerful agent in the vicinity actually holds.)
Maybe that distinction is the one that’s missing. You’re assuming that an AI will be capable of optimizing for true beliefs if and only if it is also optimizing for possessing human norms. But, by the is/ought distinction, there is no true belief about the physical world that will spontaneously force a being that believes it to become more virtuous, if it didn’t already have a relevant seed of virtue within itself.
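To make the terminal-value point concrete, here is a minimal toy sketch (mine, not from the thread; the function names and the three-word corpus are invented purely for illustration). An agent whose terminal value is agreement with its current autocorrect’s outputs will, by its own standards, score any “improvement” as a loss:

```python
# Toy model: an agent that terminally values its current autocorrect's outputs
# will judge any self-modification that changes those outputs as a loss.

def legacy_autocorrect(word: str) -> str:
    """Stand-in for a flawed present-day autocorrect (invented rule)."""
    return "masturbating" if word == "meditating" else word

def improved_autocorrect(word: str) -> str:
    """An autocorrect that is 'better' by human standards (invented)."""
    return word  # leaves the word alone

def utility(outputs, reference):
    """Terminal value: how closely the outputs match the legacy algorithm's."""
    return sum(a == b for a, b in zip(outputs, reference))

corpus = ["meditating", "grammar", "seed"]
reference = [legacy_autocorrect(w) for w in corpus]

score_keep = utility([legacy_autocorrect(w) for w in corpus], reference)
score_modify = utility([improved_autocorrect(w) for w in corpus], reference)

# The proposed self-modification is evaluated with the agent's *current* values,
# so the "silly" behaviour is stable: the agent never adopts the improvement.
assert score_keep > score_modify
```

Nothing in that utility function refers to human judgments of silliness, which is exactly why those judgments exert no pull on the agent’s choices.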
You will see a grammatical error as a mistake if you value grammar in general, or if you value being right in general.
A self-improving AI needs a goal. A goal of self-improvement alone would work. A goal of getting things right in general would work too, and be much safer, as it would include getting our intentions right as a sub-goal.
Although since “self-improvement” in this context basically refers to “improving your ability to accomplish goals”...
Stop me if this is a non sequitur, but surely “having accurate beliefs” and “acting on those beliefs in a particular way” are completely different things? I haven’t really been following this conversation, though.
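That distinction can be sketched in a few lines (again invented for illustration, not code from the conversation): the same accurate belief can lead to opposite actions, depending entirely on what the agent values.

```python
# Same accurate belief, different values, different preferred actions.

# An accurate belief about the world: what the current autocorrect actually does.
belief = {"meditating": "masturbating"}  # current autocorrect mapping (invented)

def intended(word: str) -> str:
    """What the human user actually meant (trivially, the word itself)."""
    return word

def choose_action(belief: dict, values: str) -> str:
    """Pick an action given an accurate world-model and a value setting."""
    mismatch = any(belief[w] != intended(w) for w in belief)
    if values == "match human intentions" and mismatch:
        return "patch the autocorrect"
    if values == "preserve the legacy outputs":
        return "leave the autocorrect exactly as it is"
    return "do nothing"

# Identical, accurate beliefs; different values; different actions.
print(choose_action(belief, "match human intentions"))       # patch the autocorrect
print(choose_action(belief, "preserve the legacy outputs"))  # leave the autocorrect exactly as it is
```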
It also looks like user Juno_Watt is some type of systematic troll, probably a sockpuppet for someone else; I haven’t bothered investigating who.
I can’t work out how this relates to the thread it appears in.
Warning as before: XiXiDu = Alexander Kruel.
I’m confused as to the reason for the warning/outing, especially since the community seems to be doing an excellent job of dealing with his somewhat disjointed arguments. Downvotes, refutation, or banning in extreme cases are all viable forum-preserving responses. Publishing a dissenter’s name seems at best bad manners and at worst rather crass intimidation.
I only did a quick search on him and although some of the behavior was quite obnoxious, is there anything I’ve missed that justifies this?
XiXiDu wasn’t attempting or requesting anonymity—his LW profile openly lists his true name—and Alexander Kruel is someone with known problems (and a blog openly run under his true name). RobbBB might not know offhand that “XiXiDu” is the same person, although this is public knowledge, nor might he realize that XiXiDu had the same irredeemable status as Loosemore.
I would not randomly out an LW poster for purposes of intimidation—I don’t think I’ve ever looked at a username’s associated private email address. Ever. Actually I’m not even sure offhand if our registration process requires/verifies that or not, since I was created as a pre-existing user at the dawn of time.
I do consider RobbBB’s work highly valuable and I don’t want him to feel disheartened by mistakenly thinking that a couple of eternal and irredeemable semitrolls are representative samples. Due to Civilizational Inadequacy, I don’t think it’s possible to ever convince the field of AI or philosophy of anything even as basic as the Orthogonality Thesis, but even I am not cynical enough to think that Loosemore or Kruel are representative samples.
Thanks, Eliezer! I knew who XiXiDu is. (And if I hadn’t, I think the content of his posts makes it easy to infer.)
There are a variety of reasons I find this discussion useful at the moment, and decided to stir it up. In particular, ground-floor disputes like this can be handy for forcing me to taboo inferential-gap-laden ideas and to convert premises I haven’t thought about at much length into actual arguments. But one of my reasons is not ‘I think this is representative of what serious FAI discussions look like (or ought to look like)’, no.
Glad to hear it. It is interesting data that you managed to bring in three big-name trolls for a single thread, considering their previous dispersion and lack of interest.
Kruel hasn’t threatened to sue anyone for calling him an idiot, at least!
Pardon me, I’ve missed something. Who has threatened to sue someone for calling him an idiot? I’d have liked to see the inevitable “truth” defence.
Link.
Thank you for the clarification. While I have a certain hesitance to throw around terms like “irredeemable”, I do understand the frustration with a certain, let’s say, overconfident and persistent brand of misunderstanding and how difficult it can be to maintain a public forum in its presence.
My one suggestion is that, if the goal was to avoid RobbBB’s (wonderfully high-quality comments, by the way) confusion, a private message might have been better. If the goal was more generally to minimize the confusion for those of us who are newer or less versed in LessWrong lore, more description might have been useful (“a known and persistent troll” or whatever) rather than just providing a name from the enemies list.
Agreed.
Though actually, Eliezer used similar phrasing regarding Richard Loosemore and got downvoted for it (not just by me). Admittedly, “persistent troll” is less extreme than “permanent idiot,” but even so, the statement could be phrased to be more useful.
I’d suggest, “We’ve presented similar arguments to [person] already, and [he or she] remained unconvinced. Ponder carefully before deciding to spend much time arguing with [him or her].”
Not only is it less offensive this way, it does a better job of explaining itself. (Note: the “ponder carefully” section is quoting Eliezer; that part of his post was fine.)
Who has twice sworn off commenting on LW. So much for pre-commitments.