I read most of the interchange between EY and BH. It appears to me that BH still doesn’t get a couple of points. The first is that smiley faces are an example of misclassification, and it’s merely fortuitous to EY’s ends that BH actually spoke about designing an SI to use human happiness (and observed smiles) as its metric. He continues to speak in terms of “a system that is adequate for intelligence in its ability to rule the world, but absurdly inadequate for intelligence in its inability to distinguish a smiley face from a human.” EY’s point is that it isn’t sufficient to distinguish them; you also have to categorize them and all their variations correctly, even though the training data can’t possibly include all variations.
The second is that EY’s attack isn’t intended as an attack on BH’s current ideas. It’s an attack on ideas that were good enough to pass peer review. It doesn’t matter to EY whether BH still agrees with those ideas. In either case, the paper’s publication shows that the viewpoint is plausible enough to be worth dismissing carefully and publicly.
Finally, BH points to the fact that, in some sense, human development uses RL to produce something we are willing to call intelligence. He wants to argue that this shows that RL can produce systems that categorize in a way that matches our consensus. But evolution has put many mechanisms into our ontogeny and relies on many interactions with our environment to produce those categorizations, and its success rate at producing entities that agree with the consensus isn’t perfect. In order to build an SI using those approaches, we’d have to understand how all that interaction works, and we’d have to do better than evolution does with us to be reliably safe.