So basically: Redomain your utility function by composing it with an adaptor, where the adaptor is a map from new-ontology → old-ontology. Construct the adaptor by reverse-engineering your algorithms. Have I got that right?
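Concretely, something like this — a minimal Python sketch of the composition idea; every name here (OldState, NewState, u_old, adaptor) is mine, made up purely for illustration:

```python
from typing import Callable

# Stand-in types; in reality these would be whatever representations the
# old and new ontologies actually use.
OldState = str
NewState = str

def redomain(u_old: Callable[[OldState], float],
             adaptor: Callable[[NewState], OldState]) -> Callable[[NewState], float]:
    """Utility over the new ontology = old utility composed with a
    new-ontology -> old-ontology map."""
    return lambda new_state: u_old(adaptor(new_state))
```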
Edit: No, this sucks. Sometimes the old ontology doesn’t make sense. I must think more. /Edit
That’s a good statement of the problem, but I can see that “reverse engineer your algorithms” is the hard part, and we’ve just bottled it up as a black box. There’s no obvious way to deal with cases that couldn’t exist in your old ontology (brain damage can’t exist in a simple dualist ontology, for example), or cases where there’s a disagreement (teleportation and destructive-scan + print are different when things are ontologically basic, but more advanced physics says they are the same).
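To make those two failure modes concrete, here’s a hedged sketch of where the adaptor breaks down (again, all names are mine and purely illustrative):

```python
from typing import Callable, Optional

def redomain_partial(u_old: Callable[[str], float],
                     adaptor: Callable[[str], Optional[str]]) -> Callable[[str], float]:
    """Composition when the adaptor is only a partial map."""
    def u_new(new_state: str) -> float:
        old_state = adaptor(new_state)
        if old_state is None:
            # Failure mode 1: the new-ontology state has no old-ontology
            # counterpart (e.g. "brain damage" under a simple dualist
            # ontology), so the composed utility is simply undefined here.
            raise ValueError(f"no old-ontology counterpart for {new_state!r}")
        return u_old(old_state)
    return u_new

# Failure mode 2 is the disagreement case: the new ontology has a single
# state ("copy the pattern") where the old ontology had two distinct ones
# ("teleportation" vs. "destructive scan + print"), so any adaptor has to
# pick one of the old states arbitrarily.
```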
Some help may come from the fact that we seem to have some built-in support for ontology-shifting. It does happen successfully, though perhaps not always without loss. On the other hand, people with the same ontology don’t seem to diverge much despite getting there through different update-chains.
Pretty accurate description of the mostly-the-same attempt. Also agreed that reverse-engineering your labelers is hard.
I think that for the other examples you would then need to do the more dangerous alternative and try to figure out what about your original concept you valued. It seems like you can do this with a mix of built-in hardware (if you’re a human) and trying to come up with explanations of what would cause you to value something.
Like, for physical contiguity I value the fact that if I interact with a “person” then the result of that interaction will be causally relevant to them at some later time. That’s very important if I want to like, interact with people in some way that I’ll care about having done later. It would suck to make lunch plans with someone, and then have that be completely irrelevant to their later behavior/memory.
It’s also worth mentioning that I don’t think humans always value things because they have a conceptual framework that implies they should. I’m not saying that ontologies never change, but it doesn’t feel like most updates restructure the concepts that a value is expressed in. For instance, I used to be pretty pro-abortion, but recently found out that, emotionally at least, I find it very upsetting for reasons mostly unrelated to my previous justifications.
Not necessarily. Compare the positions of Multiheaded and yourself/Konkvistador w.r.t. Moldbuggery.
I wonder what the different chains are.