Having preferences be totally consistent across time would require a mechanism; by default, physical locality properties would imply inconsistent time preference.
So I’m wondering what mechanisms there are or could be for coherentizing short- and long-term preferences, given that they’re prima facie in conflict. And how could such a mechanism not involve preference falsification? I mean, it seems much less bad to do something for long-term (individual intrinsic) value that harms short-term (individual intrinsic) preferences, or even vice versa, than to do something for no reason that harms all preferences, or to do something for externally imposed reasons, which would play into dynamics that result in doing things for no reason.
More to your point, it seems plausible to negotiate between preferences, internally to an individual, in a way that doesn’t require falsification, and instead works by weighing tradeoffs. But I don’t have a clear picture of exactly why internal negotiation requires less falsification than external pressure; is it mainly a quantitative difference between amount of privacy between parts (hence potential for fraud), or ease of communication; or is it something to do with individuals having some integrated criteria of judgement; or what? (Also, can the intraindividual structure be ported to the interindividual realm, etc.) I’m also wondering, if the way individuals coherentize does involve some preference falsification even apart from external pressure (for example, sour grapes?), does that have the same problems as you discuss? If someone has strictly partial preference falsification from external pressure, will they not be able to make FAI (given that they could chain from some coherent goal, even if it’s not the whole of their values, and therefore build up coherent understanding)?
It seems like some forms of reinforcement learning do some forms of coherentizing short-term and long-term preferences; there can be a short-term reward associated with a prediction of future reward, e.g. happiness upon having successfully negotiated to buy a house, which is a prediction of future reward. It seems pretty common for “instrumental” goods like money to be associated with short-term hedonic reward.
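To make the reinforcement learning point concrete, here is a minimal sketch using TD(0) value learning, which is one standard way a prediction of future reward gets attached to an earlier state. The three-state chain (negotiating, deal closed, living in the house), the reward numbers, and the function name `td0_chain` are all illustrative assumptions, and this is not meant as a model of how human hedonic reward actually works.

```python
def td0_chain(rewards, episodes=500, alpha=0.1, gamma=0.9):
    """Learn state values on a fixed chain of states with TD(0).

    rewards[s] is the reward received on leaving state s for state s + 1.
    The state after the last one is terminal and has value 0.
    (Hypothetical toy setup, chosen only for illustration.)
    """
    n = len(rewards)
    values = [0.0] * (n + 1)  # values[n] is the terminal state, kept at 0
    for _ in range(episodes):
        for s in range(n):
            # TD error: immediate reward plus discounted prediction of the
            # next state's value, minus the current prediction.
            td_error = rewards[s] + gamma * values[s + 1] - values[s]
            values[s] += alpha * td_error
    return values[:n]


# States: 0 = negotiating, 1 = deal closed, 2 = living in the house.
# Only the last state pays out any immediate reward.
rewards = [0.0, 0.0, 1.0]
values = td0_chain(rewards)
for name, v in zip(["negotiating", "deal closed", "living in house"], values):
    print(f"{name}: {v:.3f}")
```

In this sketch the "deal closed" state ends up with a positive learned value even though its immediate reward is zero, so reaching it carries a short-term signal that stands in for the predicted long-term payoff. The discount factor `gamma` sets how much of the long-term reward shows up in that short-term signal.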
It would not involve preference falsification if it is clear whether something is being done for short-term or long-term benefit, and short-term benefits aren’t getting totally overwritten by long-term benefits. This is similar to Eliezer’s point about the drowning child, except extended across time instead of space.
But I don’t have a clear picture of exactly why internal negotiation requires less falsification than external pressure
There are two layers where there could be falsification: internal and external. For the external layer we can see the mechanisms better: it’s possible for two different people to perceive the same facts about the society they live in, in a way that’s harder for mental facts. So that seems like a more natural place to start correcting the errors, although correcting internal errors is also necessary to some degree, and will use some tools in common with correcting external errors.
Incoherence of a person across time is often related to that person being externally influenced, e.g. trying to comply with whoever they’re talking with at the time and therefore expressing different values at different times.