To construct a friendly AI, you need to be able to make vague concepts crystal clear, cutting reality at the joints when those joints are obscure and fractal—and then implement a system that implements that cut.
Strongly disagree. The whole point of Bayesian reasoning is that it lets us deal with uncertainty, and one huge source of uncertainty is that we don’t have precise understandings of the concepts we use. When we first learn a new concept, we have a ton of uncertainty about its location in thingspace. As we collect more data (either through direct observation or indirectly through communication with other humans), we can decrease that uncertainty, but it never goes away completely. An AI that uses human concepts will have to handle concept-uncertainty and the complications that arise from it.
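To make that concrete, here is a minimal sketch (in Python; the one-dimensional “thingspace”, the threshold model of a concept, and all the numbers are my own hypothetical choices, not anything from the original discussion). It keeps a posterior over where a concept’s boundary sits and updates it on noisily labeled examples; the posterior narrows with data but never collapses to a point.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 501)           # candidate boundary locations
posterior = np.ones_like(grid) / len(grid)  # uniform prior over the boundary

def update(posterior, x, label, noise=0.1):
    """Bayesian update on one labeled example (x, label).

    Labels are noisy: even a labeler with a fixed boundary flips edge
    cases with probability `noise`, so finite data never pins the
    boundary down exactly.
    """
    p_label_1 = np.where(grid < x, 1 - noise, noise)  # P(label=1 | boundary)
    likelihood = p_label_1 if label == 1 else 1 - p_label_1
    posterior = posterior * likelihood
    return posterior / posterior.sum()

true_boundary = 0.6
for _ in range(200):
    x = rng.uniform()
    label = int(x > true_boundary)
    if rng.uniform() < 0.1:                 # the labeler makes mistakes
        label = 1 - label
    posterior = update(posterior, x, label)

mean = np.sum(posterior * grid)
std = np.sqrt(np.sum(posterior * grid**2) - mean**2)
print(f"posterior mean boundary: {mean:.3f}")
print(f"residual uncertainty (posterior std): {std:.3f}")
```

The point is just the shape of the curve: more data shrinks the posterior standard deviation, but with noisy labels it stays strictly positive for any finite sample.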
The fact that humans can’t always agree with each other on what constitutes porn vs. erotica demonstrates that we don’t all carve reality up in the same places (and therefore there’s no “objective” definition of porn). The fact that individual humans often have trouble classifying edge cases demonstrates that even when you look at a single person’s concept, it will still contain some uncertainty. The more we discuss and negotiate the meanings of concepts, the less fuzzy the boundaries will become, but we can’t remove the fuzziness completely. We can write out a legal definition of porn, but it won’t necessarily correspond to the black-box classifiers that real people are using. And concepts change—what we think of as porn might be classified differently in 100 years. An AI can’t just find a single carving of reality and stick with it; the AI needs to adapt its knowledge as the concepts mutate.
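The disagreement point can be illustrated the same way (again a hypothetical sketch; the two thresholds are invented): two labelers who share a rough concept but draw the line in different places agree on clear cases and disagree almost entirely on the edge cases between their lines.

```python
import numpy as np

rng = np.random.default_rng(1)
items = rng.uniform(size=10_000)         # items spread across "thingspace"

label_a = items > 0.55                   # person A's carving
label_b = items > 0.65                   # person B's carving

disagree = label_a != label_b
print(f"overall disagreement rate: {disagree.mean():.1%}")

edge = (items > 0.5) & (items < 0.7)     # the contested region
print(f"disagreement on edge cases:  {disagree[edge].mean():.1%}")
print(f"disagreement on clear cases: {disagree[~edge].mean():.1%}")
```

Averaged over everything the two labelers look nearly identical, which is exactly why the disagreement only surfaces when you probe the edge cases.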
So I’m pretty sure that what you’re asking is impossible. The concept-boundaries in thingspace remain fuzzy until humans negotiate them by discussing specific edge cases (and even then they are still fuzzy, just slightly less so). There’s no way to find the concept boundaries without asking people about them; it’s the interaction between human decision-makers that defines the concepts in the first place.
Related paper. Also sections 1 and 2 of this paper.