Pretty accurate description of the mostly-the-same attempt. Also agreed that reverse-engineering your labelers is hard.
I think that for the other examples you would then need to do the more dangerous alternative and try to figure out what about your original concept you valued. It seems like you can do this with a mix of built-in hardware (if you’re a human) and trying to come up with explanations for what would cause you to value something.
Like, for physical contiguity: I value the fact that if I interact with a “person” then the result of that interaction will be causally relevant to them at some later time. That’s very important if I want to, like, interact with people in some way that I’ll care about having done later. It would suck to make lunch plans with someone and then have that be completely irrelevant to their later behavior/memory.
I also think it’s worth mentioning that I don’t think humans always value things because they have a conceptual framework that implies they should. I’m not saying that ontologies never change, but it doesn’t feel like most updates restructure the concepts that something is expressed in. For instance, I used to be pretty pro-abortion, but recently found out that, emotionally at least, I find it very upsetting for reasons mostly unrelated to my previous justifications.