The linked article is interesting, and also suggests that it’s not as simple as
“The good solution is to add more Black people to the training dataset.”
because the issue isn’t simply “our system sometimes misclassifies people as animals”, it’s “our system sometimes misclassifies people as animals, and one not-so-rare case of this happens to line up with an incredibly offensive old racist slur”. That last bit is a subtle fact about human affairs that the system could not possibly have learned from labelled samples of images. The dataset had a good mix of races in it; humans do look rather like other great apes; in the absence of the long, horrible history of racism, this misclassification might have been benign. To do better, the system would need, in some sense, to know about racism.
Maybe the best one can really do is something like artificially forbidding classifications like “ape”, “gorilla”, and “monkey” unless the activations for classifications like “human” are very, very low, at least until we have an image classifier that is genuinely intelligent and understands human history.
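For concreteness, here is a minimal sketch of what that sort of post-hoc filter could look like, assuming the classifier exposes per-label confidence scores. The label sets and the threshold are illustrative assumptions, not anyone’s real system:

```python
# Labels we refuse to emit unless we're confident no person is in the image.
# The label sets and the threshold below are illustrative assumptions.
BLOCKED_LABELS = {"ape", "gorilla", "monkey", "chimpanzee"}
HUMAN_LABELS = {"human", "person", "face"}
HUMAN_CEILING = 0.01  # blocked labels allowed only if every human score is below this

def filter_labels(scores: dict[str, float]) -> dict[str, float]:
    """Suppress primate labels unless the human-related activations are very, very low."""
    human_score = max((scores.get(label, 0.0) for label in HUMAN_LABELS), default=0.0)
    if human_score >= HUMAN_CEILING:
        # Any non-negligible chance of a person: zero out the blocked labels entirely.
        return {label: 0.0 if label in BLOCKED_LABELS else score
                for label, score in scores.items()}
    return scores

print(filter_labels({"human": 0.6, "gorilla": 0.3, "tree": 0.1}))
# -> {'human': 0.6, 'gorilla': 0.0, 'tree': 0.1}
```

The reason for a hard ceiling rather than just comparing scores is that the costs are asymmetric: a false “gorilla” on a photo of a person is vastly worse than a missed “gorilla” on a photo of an actual gorilla, so the filter should err heavily in one direction.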
(There are probably a lot of other misclassifications that are anomalously offensive, though few will have the weight of centuries of slavery and racist abuse behind them. Fixing them would also require the system to “know” details of human history, and again the best one can do might be to push down certain activations whenever it’s at all plausible that the image contains people.)
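The softer “push down certain activations” variant might look like the following; again, the sensitive-label set and the penalty scheme are assumptions made up for illustration:

```python
def soften_sensitive_labels(scores: dict[str, float],
                            sensitive: set[str],
                            person_score: float,
                            penalty: float = 0.95) -> dict[str, float]:
    """Scale sensitive labels down in proportion to how plausible it is
    that the image contains people (person_score in [0, 1])."""
    factor = 1.0 - penalty * min(max(person_score, 0.0), 1.0)
    return {label: score * factor if label in sensitive else score
            for label, score in scores.items()}

# Even a modest person_score heavily dampens the sensitive labels:
print(soften_sensitive_labels({"gorilla": 0.4, "tree": 0.5},
                              sensitive={"gorilla"}, person_score=0.3))
# -> roughly {'gorilla': 0.286, 'tree': 0.5}
```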