There are several famous science fiction stories about humans who program AIs to make humans happy, which then follow the letter of the law and do horrible things. The earliest is probably “With Folded Hands” by Jack Williamson (1947), in which AIs are programmed to protect humans, and they do this by preventing humans from doing anything or going anywhere. The most recent may be the movie “I, Robot.”
I agree with Eliezer’s general point—that AI work often presupposes that the AI magically has the same concepts as its inventor, even outside the training data—but the argument he uses is insidious and has disastrous implications:
Which is the correct classification? This is not a property of the training data; it is a property of your preferences (or, if you prefer, a property of the idealized abstract dynamic you name “right”).
This is the most precise assertion of the relativist fallacy that I’ve ever seen. It’s so precise that its wrongness should leap out at you. (It’s a shame that most relativists don’t have the computational background for me to use it to explain why they’re wrong.)
By “relativism”, I mean (at the moment) the view that almost everything is just a point of view: There is no right or wrong, no beauty or ugliness. (Pure relativism would also claim that 2+2=5 is as valid as 2+2=4. There are people out there who think that. I’m not including that claim in my temporary definition.)
The argument for relativism is that you can never define anything precisely. You can’t even come up with a definition for the word “game”. So, the argument goes, whatever definition you use is okay. Stated more precisely, it would be Eliezer’s claim that, given a set of instances, any classifier that agrees with the input set is equally valid.
The counterargument is, in part, that some classifiers are better than others, even when all of them satisfy the training data completely. The most obvious criterion to use is the complexity of the classifier.
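Here is a minimal sketch of that counterargument (an illustration of my own, not anything Eliezer proposed; the feature names and training set are made up): enumerate a family of candidate classifiers, keep only those that reproduce every training label, and then rank the survivors by complexity. Several classifiers can fit the data perfectly, yet they are not equally good—the simplest one wins the tie.

```python
# Illustrative sketch: among classifiers that all fit the training data
# perfectly, prefer the simplest (here, the conjunction with the fewest literals).
from itertools import combinations

# Hypothetical training set: boolean feature vectors with a label for
# whether the thing counts as a "game". Features are invented for illustration.
TRAIN = [
    ({"has_ball": True,  "has_rules": True,  "is_red": True},  True),
    ({"has_ball": True,  "has_rules": True,  "is_red": False}, True),
    ({"has_ball": False, "has_rules": False, "is_red": True},  False),
    ({"has_ball": True,  "has_rules": False, "is_red": False}, False),
]

FEATURES = ["has_ball", "has_rules", "is_red"]

def conjunction(literals):
    """Classifier that predicts True iff every listed feature is True."""
    return lambda x: all(x[f] for f in literals)

def consistent(clf):
    """Does the classifier reproduce every training label exactly?"""
    return all(clf(x) == y for x, y in TRAIN)

# Keep every conjunction that fits the training data perfectly,
# then sort the survivors by complexity (number of literals).
candidates = [
    literals
    for k in range(1, len(FEATURES) + 1)
    for literals in combinations(FEATURES, k)
    if consistent(conjunction(literals))
]
candidates.sort(key=len)
print("consistent classifiers, simplest first:", candidates)
```

On this toy data, both ("has_rules",) and ("has_ball", "has_rules") agree with every training instance, so by Eliezer’s stated criterion they would be equally valid; the complexity criterion picks the one-literal rule, and that choice is a property of the classifier, not of anyone’s preferences.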
Eliezer’s argument, if he followed it through, would conclude that neural networks, and induction in general, can never work. The fact is that they often do.