There is regular structure in human values that can be learned without requiring detailed knowledge of physics, anatomy, or AI programming.
While there is some regular structure to human values, I don’t think you can say that the totality of human values has a completely regular structure. There are too many cases of nameless longings and generalized anxieties. Much of art is dedicated exactly to teasing out these feelings and experiences, often in counterintuitive contexts.
Can they be learned without detailed knowledge of X, Y and Z? I suppose it depends on what “detailed” means; I’ll assume it means “less detailed than the required knowledge of the structure of human values.” That said, the excluded set of knowledge you chose (“physics, anatomy, or AI programming”) seems really odd to me. I suppose you can poll people about their values (or use more sophisticated methods like prediction markets), but I don’t see how this can yield more than “the set of human values that humans can articulate.” That is something, but it seems to be a small subset of the set of human values. To characterize all dimensions of human values, I do imagine that you’ll need to model human neural biophysics in detail. If successful, that modeling effort would itself be a contribution to AI theory and practice.
Human values are so fragile that it would require a superintelligence to capture them with anything close to adequate fidelity.
To me, in this context, the term “fragile” means exactly that it is important to characterize and consider all dimensions of human values, as well as the potentially highly nonlinear relationships between those dimensions. An at-the-time invisible “blow” to an at-the-time unarticulated dimension can result in unfathomable suffering 1000 years hence. Can a human intelligence capture the totality of human values? Some of our artists seem to have glimpses of the whole, but it seems unlikely to me that a baseline human can appreciate the whole clearly.