Wei Dai comments on Open & Welcome Thread—February 2020

Wei Dai 8 Feb 2020 10:05 UTC
4 points
Sorry for the delayed reply, but I was confused by your comment and have been trying to figure out how to respond. I’m still not sure I understand, but I’m going to take a shot.

> someone who follows TV shows so that they have something to talk about with their coworkers is going to just follow whatever shows their coworkers are interested in, because they’re just using it as an investment vehicle instead of something to be pursued in its own right.

Watching a TV show in order to talk about it with coworkers is an instance of instrumental preferences (which I didn’t talk about specifically in my model but was implicitly assuming as a background concept). When I wrote “preference alteration”, I was referring to terminal preferences/values. So if you switch what show you watch in order to match your coworkers’ interests (and would stop as soon as that instrumental value went away), that’s covered by neither “preference alteration” nor “preference falsification”; it’s just an ordinary instrumental preference. However, if you’re also claiming to like the show when you don’t, in order to fit in, then that would be covered under “preference falsification”.

Does this indicate a correct understanding of your comment, and does it address your point? If so, it doesn’t seem like the model is missing anything (“incomplete”), except I could perhaps add an explicit explanation of instrumental preferences and clarify that “preference alteration” is talking about terminal preferences. Do you agree?

> It also seems like the factions changing directions is quite important here; you might not change the total budget spent on global altruism at all while taking totally different actions (i.e. donating to different charities).

Sure, this is totally compatible with my model and I didn’t intend to suggest otherwise.
- Vaniver 9 Feb 2020 21:53 UTC
  6 points
  > Does this indicate a correct understanding of your comment, and does it address your point?
  I think the core thing going on with my comment is that I think for humans most mentally accessible preferences are instrumental, and the right analogy for them is something like ‘value functions’ instead of ‘reward’ (as in RL).
  Under this view, preference alteration is part of normal operation, and so should probably be cast as a special case of the general process of learning value functions, rather than as something that exists only in this context. When someone who initially dislikes the smell of coffee grows to like it, I don’t think that’s directly because it’s cognitively costly to keep two sets of books; it’s because they have some anticipation-generating machinery that goes from anticipating bad things about coffee to anticipating good things about it.
  [It is indirectly about cognitive costs, in that if storage were free you might keep every judgment you’ve ever made, but from a functional perspective, downweighting obsolete beliefs isn’t that different from forgetting them.]
  And so it seems like there are three cases worth considering. Given a norm that people should root for the sports team where they grew up, I can either 1) privately prefer Other team while publicly rooting for Local team, 2) publicly prefer Local team so that I don’t have to lie to myself, or 3) publicly prefer Local team for some other reason. (Maybe I trust that the thing that generated the norm is wiser than I am, or whatever.)
  Maybe another way to think about this is how the agent relates to the social reward gradient: if it’s just a fact of the environment, then it makes sense to learn about it the way you would learn about coffee, whereas if it’s another agent influencing you as you influence it, then it makes sense to keep separate books, and to stop doing so only when the expected costs outweigh the expected rewards.
  - TurnTrout 10 Feb 2020 15:27 UTC
    2 points
    
    > I think for humans most mentally accessible preferences are instrumental, and the right analogy for them is something like ‘value functions’ instead of ‘reward’ (as in RL).
    
    I agree. As far as I can tell, people seem to be predicting their on-policy Q function when considering different choices. See also attainable utility theory and the gears of impact.
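To make the “value functions instead of reward” framing a little more concrete, here is a minimal Python sketch of the coffee example. Everything in it is a hypothetical illustration, not anyone’s model from the thread: the reward numbers, the discount factor, the alternative (“tea”) with a fixed known value, and the assumption that the agent drinks coffee every episode anyway (say, because it is what the office serves). The reward stays fixed throughout; only the learned estimate Q(coffee) changes, and the “preference” is read off that estimate.

```python
"""Toy sketch: the reward stays fixed while the learned value estimate moves.

All numbers are hypothetical. Each episode the agent drinks coffee (assume
it is simply what the office serves), observes the return, and nudges its
estimate Q(coffee) toward it. Its 'preference' is read off the estimate.
"""

SMELL_REWARD = -1.0      # hypothetical: the smell is mildly unpleasant
ALERTNESS_REWARD = 3.0   # hypothetical: the alertness afterwards is pleasant
GAMMA = 0.9              # hypothetical discount factor
TEA_VALUE = 1.0          # hypothetical known value of the alternative


def coffee_return() -> float:
    """Discounted return of one coffee episode: smell now, alertness one step later."""
    return SMELL_REWARD + GAMMA * ALERTNESS_REWARD


def learn_coffee_value(initial_estimate: float, episodes: int, alpha: float = 0.5) -> list[float]:
    """Constant-step-size Monte Carlo update: move Q toward each observed return."""
    q = initial_estimate
    estimates = [q]
    for _ in range(episodes):
        q += alpha * (coffee_return() - q)
        estimates.append(q)
    return estimates


if __name__ == "__main__":
    # Start out anticipating bad things about coffee (a pessimistic estimate).
    history = learn_coffee_value(initial_estimate=-1.0, episodes=5)
    for episode, q in enumerate(history):
        choice = "coffee" if q > TEA_VALUE else "tea"
        print(f"after {episode} episodes: Q(coffee) = {q:.3f}, would choose {choice}")
```

With these made-up numbers the estimate starts at -1.0, rises toward the true return of 1.7, and crosses the tea value after the second episode, at which point the agent would choose coffee, even though no reward ever changed. The estimate here is a single-action, one-step prediction of return, far simpler than anything a person would actually be predicting; it is only meant to show the reward/value distinction.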