In terms of AI, this is equivalent to “value loading”: refining the AI’s values through interactions with human decision makers, who answer questions about edge cases and examples and serve as “learned judges” for the AI’s concepts. But suppose that approach was not available to you.
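To make that query loop concrete, here is a minimal active-learning sketch in which a model repeatedly asks a human judge about the case it is least sure of. Everything in it (the one-dimensional threshold “model”, the uncertainty score, the human_judge stand-in) is invented for illustration, not taken from the discussion.

```python
# Minimal sketch of "value loading" as active learning: the model queries a
# human "judge" on the edge cases it is least certain about. All details here
# are hypothetical, for illustration only.
import random

def train(labeled):
    """Fit a trivial threshold 'model' on (score, label) pairs."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    if not pos or not neg:
        return 0.5
    return (min(pos) + max(neg)) / 2  # midpoint between the classes

def uncertainty(x, boundary):
    return -abs(x - boundary)  # closer to the boundary = less certain

def human_judge(x):
    # Stand-in for the human decision maker answering an edge case.
    return 1 if x > 0.62 else 0  # the "true" concept, unknown to the model

pool = [random.random() for _ in range(200)]  # unlabeled candidate cases
labeled = [(0.1, 0), (0.9, 1)]                # seed examples

for _ in range(20):
    boundary = train(labeled)
    query = max(pool, key=lambda x: uncertainty(x, boundary))
    pool.remove(query)
    labeled.append((query, human_judge(query)))  # ask the judge

print(f"learned boundary: {train(labeled):.3f} (true: 0.620)")
```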
But it is, and the contrary approach of teaching humans to recognize things doesn’t have an obvious relation to FAI, unless we think the details of teaching human brains by instruction and example are relevant to how you’d set up a similar training program for an unspecified AI algorithm. If that is the purported connection to FAI, it should be spelled out explicitly, and so should the ways the connection could fail.

I’m also not sure the example is a good one for the domain. Why not ask how to distinguish happiness from pleasure, what people really want from what they say they want, or the difference between panic and justified fear? Or, if we want to start with something more object-level: what should be tested, and when should you draw confident conclusions, about what someone’s taste buds will like under various circumstances? That is, how much do you need to know to decide that someone will like the taste of a cinnamon candy if they’ve never tried anything cinnamon before? Porn vs. erotica seems meant to take us into a realm of conflicting values, disagreements, legalisms, and a large prior literature potentially getting in the way of original thinking; if each of these aspects is meant to be relevant, can the relevance of each be spelled out?
I like the “What does it take to predict taste buds?” question, of those I brainstormed above, because it’s something we could conceivably test in practice. An even more practical analogue might be Netflix-style movie score prediction, except that you can ask the subject whatever you like, have them rate particular other movies, and so on, all to predict their rating of that one movie.
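For concreteness, here is a toy version of that elicitation game: ask the subject to rate a few probe movies, then predict their rating of the target movie from the most similar users in a small ratings table. The data, the probe set, and the nearest-neighbor rule are all assumptions made up for this sketch.

```python
# Toy elicitation game: rate a few probe movies, then predict the subject's
# rating of a target movie from the k most similar users. Data is invented.
import math

ratings = {  # user -> {movie: rating on a 1-5 scale}
    "ann": {"A": 5, "B": 1, "C": 4, "target": 5},
    "bob": {"A": 1, "B": 5, "C": 2, "target": 1},
    "cho": {"A": 4, "B": 2, "C": 5, "target": 4},
}
probes = ["A", "B", "C"]  # the movies we ask the subject to rate

def similarity(subject, user):
    """Negative Euclidean distance over the probe ratings."""
    return -math.dist([subject[m] for m in probes],
                      [ratings[user][m] for m in probes])

def predict(subject, target="target", k=2):
    nearest = sorted(ratings, key=lambda u: similarity(subject, u),
                     reverse=True)[:k]
    return sum(ratings[u][target] for u in nearest) / k

subject = {"A": 5, "B": 2, "C": 4}  # the subject's answers to our questions
print(f"predicted rating: {predict(subject):.1f}")
```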
> Porn vs. erotica seems meant to take us into a realm of conflicting values, disagreements, legalisms, and a large prior literature potentially getting in the way of original thinking; if each of these aspects is meant to be relevant, can the relevance of each be spelled out?
Well, conflicting values is obviously relevant, and disagreements seem relevant as well, to a lesser extent (consider the problem of choosing priors for an AI), for starters.
I’m just fishing in random seas for new ideas.
The movie prediction question is complicated because it includes feedback cycles over styles and tastes and is probably cross-linked to other movies airing at the same time. See e.g. http://www.stat.berkeley.edu/~aldous/157/Old_Projects/kennedy.pdf
The “predict taste buds?” question is better. But even that one contains feedback cycles over tastes, at least in some domains like wine, and probably cigarettes and other expensive, socially consumed goods.
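To see why such feedback cycles bite, here is a minimal simulation, with invented parameters, in which each person’s taste for a socially consumed good drifts toward the group mean, so a rating snapshot taken at the start steadily loses predictive accuracy.

```python
# Minimal simulation of the feedback-cycle worry: if taste for a socially
# consumed good (wine, say) drifts toward peers' tastes, a snapshot-based
# predictor decays over time. All parameters are invented for illustration.
def step(tastes, pull=0.2):
    """Each person's taste moves a fraction `pull` toward the group mean."""
    mean = sum(tastes) / len(tastes)
    return [t + pull * (mean - t) for t in tastes]

tastes = [1.0, 3.0, 8.0, 9.0]   # initial tastes, arbitrary scale
snapshot = list(tastes)          # what a one-time test would have measured

for _ in range(10):
    tastes = step(tastes)

errors = [abs(a - b) for a, b in zip(snapshot, tastes)]
print(f"mean prediction error after 10 steps: {sum(errors)/len(errors):.2f}")
```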