Perplexed comments on Best career models for doing research?

Perplexed Jan 22, 2011, 4:32 PM
3 points

To see how resource limitation leads to temporal discounting, consider computer chess.

Why do you keep trying to argue against discounting using an example where discounting is inappropriate by definition? The objective in chess is to win. It doesn’t matter whether you win in 5 moves or 50 moves. There is no discounting. Looking at this example tells us nothing about whether we should discount future increments of utility in creating a utility function.

Instead, you need to look at questions like this: An agent plays go in a coffee shop. He has the choice of playing slowly, in which case the games each take an hour and he wins 70% of them. Or, he can play quickly, in which case the games each take 20 minutes, but he only wins 60% of them. As soon as one game finishes, another begins. The agent plans to keep playing go forever. He gains 1 util each time he wins and loses 1 util each time he loses.

The main decision he faces is whether he maximizes utility by playing slowly or quickly. Of course, he has infinite expected utility however he plays. You can redefine the objective to be maximizing utility flow per hour and still get a ‘rational’ solution. But this trick isn’t enough for the following extended problem:
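The "utility flow per hour" comparison can be checked with a few lines of arithmetic. This is a minimal sketch: the function name is made up for illustration, and it assumes every game ends in a win (+1) or a loss (-1) with no draws and no gaps between games.

```python
def utility_per_hour(win_rate, minutes_per_game):
    """Expected utils per hour: +1 per win, -1 per loss, games played back to back."""
    games_per_hour = 60 / minutes_per_game
    expected_utils_per_game = win_rate - (1 - win_rate)
    return games_per_hour * expected_utils_per_game

slow = utility_per_hour(0.70, 60)   # one game per hour, 70% wins
fast = utility_per_hour(0.60, 20)   # three games per hour, 60% wins
print(f"slow: {slow:.2f} utils/hour, fast: {fast:.2f} utils/hour")
```

On this measure quick play comes out ahead, at roughly 0.6 utils per hour against 0.4 for slow play.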

The local professional offers go lessons. Lessons require a week of time away from the coffee shop and a 50 util payment. But each week of lessons turns 1% of your losses into victories. Now the question is: Is it worth it to take lessons? How many weeks of lessons are optimal? The difficulty here is that we need to compare the value of a one-shot cost (50 utils plus a week not playing go) with the value of an eternal continuous flow (the extra fraction of games per hour which are victories rather than losses). But that is an infinite utility payoff from the lessons, and only a finite cost, right? Obviously, the right decision is to take a week of lessons. And then another week after that. And so on. Forever.
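A rough numerical sketch of how the answer depends on how heavily the future is weighted. Several things here are assumptions added for illustration, not part of the problem as stated: the agent plays quickly and around the clock (168 hours a week), fees are paid at the start of each lesson week, all lessons are taken before play resumes, and the future is weighted by a continuous rate `rate` per week, which must be positive.

```python
import math

HOURS_PER_WEEK = 168      # assumption: the agent plays around the clock
LESSON_COST = 50.0        # utils per week of lessons (from the problem)
GAMES_PER_HOUR = 3        # assumption: the agent plays quickly
BASE_LOSS_RATE = 0.40     # quick play loses 40% of games

def weekly_flow(weeks_of_lessons):
    """Expected utils per week of play after the given number of lesson weeks."""
    # Each week of lessons turns 1% of the remaining losses into wins.
    loss_rate = BASE_LOSS_RATE * 0.99 ** weeks_of_lessons
    return GAMES_PER_HOUR * HOURS_PER_WEEK * (1 - 2 * loss_rate)

def present_value(weeks_of_lessons, rate):
    """Weighted value of taking lessons first, then playing forever.

    `rate` is a positive continuous rate per week (an assumed parameter).
    """
    fees = sum(LESSON_COST * math.exp(-rate * k) for k in range(weeks_of_lessons))
    # A constant flow f starting at time T is worth f * exp(-rate * T) / rate.
    play = weekly_flow(weeks_of_lessons) * math.exp(-rate * weeks_of_lessons) / rate
    return play - fees

def best_weeks(rate, max_weeks=600):
    """Number of lesson weeks in 0..max_weeks with the highest present value."""
    return max(range(max_weeks + 1), key=lambda n: present_value(n, rate))

for rate in (0.05, 0.01, 0.001):
    print(f"rate {rate}: best number of lesson weeks = {best_weeks(rate)}")
```

Under these assumptions the best number of weeks is finite for any positive rate and grows as the rate shrinks, which is the "forever" answer in the limit where the future is not down-weighted at all.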

Discounting of future utility flows is the standard and obvious way of avoiding this kind of problem and paradox. But now let us see whether we can alter this example to capture your ‘instrumental discounting due to an uncertain future’:

First, the obvious one. Our hero expects to die someday, but doesn’t know when. He estimates a 5% chance of death every year. If he is lucky, he could live for another century. Or he could keel over tomorrow. And when he dies, the flow of utility from playing go ceases. It is very well known that this kind of uncertainty about the future is mathematically equivalent to discounted utility in a certain future. But you seemed to be suggesting something more like the following:

Our hero is no longer certain what his winning percentage will be in the future. He knows that he experiences microstrokes roughly every 6 months, and that each incident takes 5% of his wins and changes them to losses. On the other hand, he also knows that roughly every year he experiences a conceptual breakthrough. And that each such breakthrough takes 10% of his losses and turns them into victories.

Does this kind of uncertainty about the future justify discounting on ‘instrumental grounds’? My intuition says “No, not in this case, but there are similar cases in which discounting would work.” I haven’t actually done the math, though, so I remain open to instruction.
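For anyone who wants to try the math on the microstroke variant, here is one possible sketch. It rests on assumptions that are not in the description above: microstrokes and breakthroughs arrive as independent Poisson processes (2 per year and 1 per year), and the function names are hypothetical. With those assumptions the expected win rate w obeys dE[w]/dt = 0.1 - 0.2 E[w], which can be solved in closed form and checked by simulation.

```python
import math
import random

STROKE_RATE = 2.0         # per year ("roughly every 6 months"), Poisson assumed
BREAKTHROUGH_RATE = 1.0   # per year, Poisson assumed

def expected_margin(initial_win_rate, years):
    """Expected (win rate - loss rate) after `years` under the Poisson assumption."""
    # dE[w]/dt = -0.05 * 2 * E[w] + 0.10 * 1 * (1 - E[w]) = 0.1 - 0.2 * E[w]
    # so E[w] relaxes towards 0.5 and the margin 2w - 1 decays at 0.2 per year.
    return (2 * initial_win_rate - 1) * math.exp(-0.2 * years)

def simulate_margin(initial_win_rate, years, rng):
    """One random history of strokes and breakthroughs; returns the final margin."""
    total_rate = STROKE_RATE + BREAKTHROUGH_RATE
    w = initial_win_rate
    t = rng.expovariate(total_rate)
    while t < years:
        if rng.random() < STROKE_RATE / total_rate:
            w *= 0.95                # 5% of wins become losses
        else:
            w += 0.10 * (1 - w)      # 10% of losses become wins
        t += rng.expovariate(total_rate)
    return 2 * w - 1

rng = random.Random(0)
samples = [simulate_margin(0.6, 5.0, rng) for _ in range(20000)]
estimate = sum(samples) / len(samples)
print(f"simulated: {estimate:.4f}, closed form: {expected_margin(0.6, 5.0):.4f}")
```

In this model the expected per-game margin shrinks by a constant factor each year, so it has the same shape as an exponentially weighted constant flow. Whether that counts as a justification for discounting is exactly the question left open above, and the sketch does not settle it.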
- timtyler Jan 22, 2011, 8:07 PM
  1 point
  Parent
  
  Why do you keep trying to argue against discounting using an example where discounting is inappropriate by definition? The objective in chess is to win. It doesn’t matter whether you win in 5 moves or 50 moves. There is no discounting. Looking at this example tells us nothing about whether we should discount future increments of utility in creating a utility function.
  
  Temporal discounting is about valuing something happening today more than the same thing happening tomorrow.
  
  Chess computers do, in fact, discount. That is why they prefer to mate you in twenty moves rather than a hundred.
  
  The values of a chess computer do not just tell it to win. In fact, they are complex—e.g. Deep Blue had an evaluation function that was split into 8,000 parts.
  
  Operation consists of maximising the utility function, after foresight and tree pruning. Events that take place in branches after tree pruning has truncated them typically don’t get valued at all, since they are not foreseen. Resource-limited chess computers can find themselves preferring to promote a pawn sooner rather than later. They do so since they fail to see the benefit of sequences leading to promotion later.
- timtyler Jan 23, 2011, 12:56 PM
  0 points
  Parent
  So: we apparently agree that resource limitation leads to indifference towards the future (due to not bothering to predict it) - but I classify this as a kind of temporal discounting (since rewards in the future get ignored), whereas you apparently don’t.
  
  Hmm. It seems as though this has turned out to be a rather esoteric technical question about exactly which set of phenomena the term “temporal discounting” can be used to refer to.
  
  Earlier we were talking about whether agents focussed their attention on tomorrow—rather than next year. Putting aside the issue of whether that is classified as being “temporal discounting”—or not—I think the extent to which agents focus on the near-future is partly a consequence of resource limitation. Give the agents greater abilities and more resources and they become more future-oriented.
  - Perplexed Jan 23, 2011, 3:26 PM
    −2 points
    Parent
    
    we apparently agree that resource limitation leads to indifference towards the future (due to not bothering to predict it)
    
    No, I have not agreed to that. I disagree with almost every part of it.
    
    In particular, I think that the question of whether (and how much) one cares about the future is completely prior to questions about deciding how to act so as to maximize the things one cares about. In fact, I thought you were emphatically making exactly this point on another branch.
    
    But that is fundamental ‘indifference’ (which I thought we had agreed cannot flow from instrumental considerations). I suppose you must be talking about some kind of instrumental or ‘derived’ indifference. But I still disagree. One does not derive indifference from not bothering to predict—one instead derives not bothering to predict from being indifferent.
    
    Furthermore, I don’t respond to expected computronium shortages by truncating my computations. Instead, I switch to an algorithm which produces less accurate computations at lower computronium costs.
    
    but I classify this as a kind of temporal discounting (since rewards in the future get ignored), whereas you apparently don’t.
    
    And finally, regarding classification, you seem to suggest that you view truncation of the future as just one form of discounting, whereas I choose not to. And that this makes our disagreement a quibble over semantics. To which I can only reply: Please go away Tim.
    - timtyler Jan 23, 2011, 7:44 PM
      0 points
      Parent
      
      Furthermore, I don’t respond to expected computronium shortages by truncating my computations. Instead, I switch to an algorithm which produces less accurate computations at lower computronium costs.
      
      I think you would reduce how far you look forward if you were interested in using your resources intelligently and efficiently.
      
      If you only have a million cycles per second, you can’t realistically go 150 ply deep into your go game—no matter how much you care about the results after 150 moves. You compromise—limiting both depth and breadth. The reduction in depth inevitably means that you don’t look so far into the future.
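
      A back-of-the-envelope version of this point. The figures are illustrative assumptions: one position examined per cycle (generous), and a branching factor of only $b = 2$ with search depth $d = 150$.

      ```latex
      \[
        N = b^{d} = 2^{150} \approx 1.4 \times 10^{45}\ \text{positions},
        \qquad
        \frac{N}{10^{6}\ \text{positions/s}} \approx 1.4 \times 10^{39}\ \text{s}
        \approx 4.5 \times 10^{31}\ \text{years}.
      \]
      ```

      Go's real branching factor is far larger than 2, so an exhaustive search to that depth is even further out of reach than this suggests.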
      - Perplexed Jan 23, 2011, 8:13 PM
        2 points
        Parent
        A lot of our communication difficulty arises from using different models to guide our intuitions. You keep imagining game-tree evaluation in a game with perfect information (like chess or go). Yes, I understand your point that in this kind of problem, resource shortages are the only cause of uncertainty—that given infinite resources, there is no uncertainty.
        
        I keep imagining problems in which probability is built in, like the coffee-shop-go-player which I sketched recently. In the basic problem, there is no difficulty in computing expected utilities deeper into the future—you solve analytically and then plug in whatever value for t that you want. Even in the more difficult case (with the microstrokes) you can probably come up with an analytic solution. My models just don’t have the property that uncertainty about the future arises from difficulty of computation.
        timtyler Jan 23, 2011, 8:40 PM
        0 points
        Parent
        Right. The real world surely contains problems of both sorts. If you have a problem which is dominated by chaos based on quantum events then more resources won’t help. Whereas with many other types of problems more resources do help.
        
        I recognise the existence of problems where more resources don’t help—I figure you probably recognise that there are problems where more resources do help—e.g. the ones we want intelligent machines to help us with.
        Perplexed Jan 23, 2011, 9:23 PM
        1 point
        Parent
        
        The real world surely contains problems of both sorts.
        
        Perhaps the real world does. But decision theory doesn’t. The conventional assumption is that a rational agent is logically omniscient. And generalizing decision theory by relaxing that assumption looks like it will be a very difficult problem.
        
        The most charitable interpretation I can make of your argument here is that human agents, being resource limited, imagine that they discount the future. That discounting is a heuristic introduced by evolution to compensate for those resource limitations. I also charitably assume that you are under the misapprehension that if I only understood the argument, I would agree with it. Because if you really realized that I have already heard you, you would stop repeating yourself.
        
        That you will begin listening to my claim that not all discounting is instrumental is more than I can hope for, since you seem to think that my claim is refuted each time you provide an example of what you imagine to be a kind of discounting that can be interpreted as instrumental.
        
        I repeat, Tim. Please go elsewhere.
        timtyler Jan 23, 2011, 11:58 PM
        2 points
        Parent
        
        That you will begin listening to my claim that not all discounting is instrumental is more than I can hope for, since you seem to think that my claim is refuted each time you provide an example of what you imagine to be a kind of discounting that can be interpreted as instrumental.
        
        I am pretty sure that I just told you that I do not think that all discounting is instrumental. Here’s what I said:
        
        I really, really am not advocating that we put instrumental considerations into our utility functions. The reason you think I am advocating this is that you have this fixed idea that the only justification for discounting is instrumental.
        
        To clarify: I do not think the only justification for discounting is instrumental. My position is more like: agents can have whatever utility functions they like (including ones with temporal discounting) without having to justify them to anyone.
        
        Agents can have many kinds of utility function! That is partly a consequence of there being so many different ways for agents to go wrong.
        Perplexed Jan 24, 2011, 1:43 AM
        0 points
        Parent
        Thx for the correction. It appears I need to strengthen my claim.
        
        Not all discounting by rational, moral agents is instrumental.
        
        Are we back in disagreement now? :)
        timtyler Jan 24, 2011, 10:40 AM
        2 points
        Parent
        No, we aren’t. In my book:
        
        Being rational isn’t about your values; you can rationally pursue practically any goal. Epistemic rationality is a bit different, but I mostly ignore that as being unbiological.
        
        Being moral isn’t really much of a constraint at all. Morality, along with right and wrong, is normally defined with respect to a moral system, and unless a moral system is clearly specified, you can often argue all day about what is moral and what isn’t. Maybe some types of morality are more common than others (due to being favoured by the universe, or something like that), but any such context would need to be made plain in the discussion.
        
        So, it seems (relatively) easy to make a temporal discounting agent that really values the present over the future—just stick a term for that in its ultimate values.
        
        Are there any animals with ultimate temporal discounting? That is tricky, but it isn’t difficult to imagine natural selection hacking together animals that way. So: probably, yes.
        
        Do I use ultimate temporal discounting? Not noticeably, as far as I can tell. I care about the present more than the future, but my temporal discounting all looks instrumental to me. I don’t go in much for thinking about saving distant galaxies, though! I hope that further clarifies.
        
        I should probably review around about now. Instead of that: IIRC, you want to wire temporal discounting into machines, so their preferences better match your own—whereas I tend to think that would be giving them your own nasty hangover.
        timtyler Jan 24, 2011, 12:18 AM
        0 points
        Parent
        
        Please go away Tim. [...] I repeat, Tim. Please go elsewhere.
        
        If you are not valuing my responses, I recommend you stop replying to them—thereby ending the discussion.
        timtyler Jan 24, 2011, 12:05 AM
        0 points
        Parent
        
        The real world surely contains problems of both sorts.
        
        Perhaps the real world does. But decision theory doesn’t. The conventional assumption is that a rational agent is logically omniscient. And generalizing decision theory by relaxing that assumption looks like it will be a very difficult problem.
        
        Programs make good models. If you can program it, you have a model of it. We can actually program agents that make resource-limited decisions. Having an actual program that makes decisions is a pretty good way of modeling making resource-limited decisions.
- timtyler Jan 22, 2011, 8:35 PM
  0 points
  Parent
  Perhaps we have some kind of underlying disagreement about what it means for temporal discounting to be “instrumental”.
  
  In your example of an agent suffering from a risk of death, my thinking is: this player might opt for a safer life, with reduced risk. Or they might choose to lead a more interesting but more risky life. Their degree of discounting may well adjust itself accordingly, and if so, I would take that as evidence that their discounting was not really part of their pure preferences, but rather was an instrumental and dynamic response to the observed risk of dying.
  
  If, on the other hand, they adjusted the risk level of their lifestyle and their level of temporal discounting remained unchanged, that would be confirming evidence in favour of the hypothesis that their temporal discounting was an innate part of their ultimate preferences, and not instrumental.
  - Will_Sawin Feb 2, 2011, 7:01 AM
    0 points
    Parent
    This bothers me since, with reasonable assumptions, all rational agents engage in the same amount of catastrophe discounting.
    
    That is, observed discount rate = instrumental discount rate + chance of death + other factors
    
    We should expect everyone’s discount rate to change, by the same amount, unless they’re irrational.
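
    One way to see why the rates add, sketched in continuous time. The symbols are assumptions introduced here: $\rho$ is whatever discount rate the agent applies before mortality is considered, $h$ is a constant hazard rate of death, and $u(t)$ is the utility flow while alive.

    ```latex
    \[
      \mathbb{E}[U]
      = \int_{0}^{\infty} e^{-\rho t}\,\Pr(\text{alive at } t)\,u(t)\,dt
      = \int_{0}^{\infty} e^{-\rho t}\,e^{-h t}\,u(t)\,dt
      = \int_{0}^{\infty} e^{-(\rho + h)\,t}\,u(t)\,dt .
    \]
    ```

    The observed rate is then $\rho + h$, so a change in $h$ shifts it by the same amount whatever $\rho$ is. In discrete time the factors multiply instead: a 5% annual chance of death contributes a factor of 0.95 per year.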
    - timtyler Feb 2, 2011, 9:29 AM
      0 points
      Parent
      Agents do not all face the same risks, though.
      
      Sure, they may discount the same amount if they do face the same risks, but often they don’t—e.g. compare the motorcycle racer with the nun.
      
      So: the discounting rate is not fixed at so-much per year, but rather is a function of the agent’s observed state and capabilities.
      - Will_Sawin Feb 2, 2011, 4:13 PM
        0 points
        Parent
        Of course. My point is that observing if the discount rate changes with the risk tells you if the agent is rational or irrational, not if the discount rate is all instrumental or partially terminal.
        timtyler Feb 2, 2011, 5:38 PM
        0 points
        Parent
        Stepping back for a moment, terminal values represent what the agent really wants, and instrumental values are things sought en-route.
        
        The idea I was trying to express was: if what an agent really wants is not temporally discounted, then instrumental temporal discounting will produce a predictable temporal discounting curve—caused by aging, mortality risk, uncertainty, etc.
        
        Deviations from that curve would indicate the presence of terminal temporal discounting.
        Will_Sawin Feb 2, 2011, 8:07 PM
        0 points
        Parent
        Agreed.
  - Perplexed Jan 22, 2011, 9:58 PM
    0 points
    Parent
    I have no disagreement at all with your analysis here. This is not fundamental discounting. And if you have decision alternatives which affect the chances of dying, then it doesn’t even work to model it as if it were fundamental.