While the outline is nice, I think this is the wrong place to start. Instead, we should start by answering some more basic questions (not necessarily just these, all of these, or in this order).
- What do we plan to do with the concept of “values”?
- Keeping the previous question in mind, what are values, and do we intend to use the same concept for AI and human “values”?
- Do humans, in fact, have values? If they do, are these values internally consistent, or do they show order-dependence (or some other “problem”) somewhere? Are they consistent across time and inputs in some respect?
While these questions are also hard to answer, there’s actually a good chance they have answers multiple people can agree on, and answering them should hopefully give us the ability to answer your original questions as well.