Psy-Kosh comments on How Not to be Stupid: Adorable Maybes

Psy-Kosh 29 Apr 2009 22:32 UTC
0 points
“buying” in the sense of “assuming you consider the argument valid” but actually, I’ve rethought about it several times and I think you’re right about that. I think I’m going to edit that bit somewhat in light of that.

Do you accept that IF for some agent it can be said that for any two states, they prefer one to the other or are indifferent (ie, have just as much preference for one as for the other) THEN, that combined by the “don’t be stupid” rule, prohibits cycles in the preference rankings?
- cousin_it 29 Apr 2009 23:03 UTC
  1 point
  Parent
  Yes for idealized agents. Not yet convinced about humans.
  
  See, if your theory eventually runs counter to common sense on Pascal’s Mugging (Eliezer says he has no good solution, common sense says decline the offer) or Dust Specks (Eliezer chooses torture, common sense chooses dust specks), we will have to reexamine the assumptions again. It could easily be that the utility function assumption is faulty, or well-orderedness is faulty, or something else.
  - Psy-Kosh 29 Apr 2009 23:25 UTC
    1 point
    Parent
    Actually, IIRC, Eliezer said that he thinks Robin Hanson’s (I think it was his) solution to the mugging seems to be in the right direction. But that gets into computational power issues. Actually, my original intent was to name this sequence “How not to Be Stupid (given unbounded computational power)”
    
    Obviously we can’t do the full decision theory computations in full exact correctness. And I did give the warning against hastily giving an oversimplified human preference generator. What I’m going for here is more “why assume that Bayesian decision theory is the thing we should be building approximations to, rather than some other entirely different blob of math?”
    
    (Oh, incidentally. I originally chose SPECKS, then later one of the comments in that sequence of posts (the comment that stepped through it, incrementally reducing, etc) ended up convincing me to switch to TORTURE.)
    
    Also, finished editing the offending argument.
    - cousin_it 29 Apr 2009 23:57 UTC
      1 point
      Parent
      
      What I’m going for here is more “why assume that Bayesian decision theory is the thing we should be building approximations to, rather than some other entirely different blob of math?”
      
      Over the last couple years I went from believing that statement to deeply doubting it. If you want a chess player that will win games by holding the opponents’ kids hostage, sure, build a Bayesian optimizer. My personal feeling is that even an ordinary human modified to be deeply and genuinely driven by an explicit utility function would pose a substantial danger to this world. No need for AIs.
      - Psy-Kosh 30 Apr 2009 0:03 UTC
        3 points
        Parent
        That’s where the whole “don’t assume an overly simplistic preference ranking for yourself” warnings come in.
        
        ie, nothing wrong with the utility function being composed of terms for all the things we value, and simply happening to include for that player a component that translates to “win at chess by actually playing chess”, and other components giving stuff that lowers utility for “kids have been kidnapped” situations, etc etc etc.
        
        The hard part is, of course, actually translating the algorithms we’re running (including the bits that respond to arguments that lead us to become convinced to change our minds about a moral question, etc etc) into a more explicit algorithm. Any simple one is going to get it WRONG.
        
        But that’s not a hit against decision theory. That’s a hit against bad utility functions.
        
        Or did I utterly misunderstand your point?
        cousin_it 30 Apr 2009 0:15 UTC
        3 points
        Parent
        
        But that’s not a hit against decision theory. That’s a hit against bad utility functions.
        
        We know from Eliezer’s writings that almost any strong goal-directed chessplayer AI will destroy the world. Well guess what, if a non-world-destroying utility function appears almost impossibly hard to formulate, in my book it counts as a hit against the concept of utility functions. Especially seeing as machines based on e.g. control theory (RichardKennaway) behave much more sensibly—they almost never display any urge to screw up the whole world, instead being content to sit there and tweak their needle.
        Psy-Kosh 30 Apr 2009 0:26 UTC
        5 points
        Parent
        Well, a recursively self modifying chess playing AI is a very different beast than a human who, AMONG OTHER THINGS, cares about doing well at chess. The sum total of those other things and chess together is a very different goal system than “chess and nothing else”.
        
        As far as control theory, well… that’s because control theory based systems are currently too stupid to pose such a threat to us, no?
        
        Your judgment against decision theory seems to be “an agent based on it will act in accordance with its utility function… which may not resemble meaningfully my preferences. It may not be moral, etc etc etc. It will be good at what it’s trying to do… but it isn’t exactly trying to do the stuff I care about.”
        
        Do you consider this a fair summary of your position?
        
        If so, then the response is, well… So, it’s good at doing the stuff it’s trying to do. It’s not trying to do what we’d prefer it to be doing. This is a serious problem. But that problem isn’t a flaw with decision theory itself. I mean, if decision theory is leading it to be good optimizing reality in accordance with its preference rankings, then decision theory is acting as promised. The problem is “it’s trying to do stuff we don’t want it to do!”
        
        The things we care abut are complicated. To actually specifically accurately fully and explicitly specify that is REALLY HARD. That doesn’t mean decision theory is inherently flawed. It means, well, fully specifying what we actually want is a highly nontrivial problem.
        cousin_it 30 Apr 2009 7:51 UTC
        2 points
        Parent
        I agree with you that the math is right. Given assumptions, it acts as promised. But the assumptions just aren’t a good model of reality. Like naive game theory: you can go with the mathematically justified option of Always Defect, or you can go with common sense. Reality doesn’t contain preference rankings over all possible situations; shoehorning reality into preference rankings might hurt you. Hasn’t this point clicked yet? I’ll try again.
        
        The sum total of those other things and chess together is a very different goal system than “chess and nothing else”.
        
        Human beings aren’t goal systems. We DON’T SUM, anymore than a car “sums” the value of its speedometer with the value of the fuel gauge. If we actually summed, you’d get the outcome Eliezer once advocated: every one of us “picking one charity and donating as much to it as he can”. Your superintelligent chess player with the “correct” utility function won’t ever play chess while there are other util-rich tasks anywhere in the world, like hunger in Africa.
        
        That doesn’t mean decision theory is inherently flawed. It means, well, fully specifying what we actually want is a highly nontrivial problem.
        
        We shouldn’t need to fully specify what we actually want, if we’re building a specialized machine to e.g. cure world hunger or design better integrated circuits. It would be better to build such machines based on a theory that typically results in localized screw-ups… rather than a theory that destroys the world by default, unless you tell it everything about you.
        loqi 1 May 2009 2:21 UTC
        1 point
        Parent
        
        We shouldn’t need to fully specify what we actually want, if we’re building a specialized machine to e.g. cure world hunger or design better integrated circuits.
        
        What if we’re building a specialized machine to prevent a superintelligence from annihilating us?
        thomblake 30 Apr 2009 13:11 UTC
        1 point
        Parent
        
        It would be better to build such machines based on a theory that typically results in localized screw-ups… rather than a theory that destroys the world by default, unless you tell it everything about you.
        
        Where’s the “I super-agree” button?
        
        I agree with you that maximizing utility is dangerous and wrong even just in ordinary humans. That’s not what we’re for and that’s not what the good life is about.
        
        We don’t need a clean-cut, provable decision theory that will drive the universe into a hole of ‘utility’. We need more of a wibbly-wobbly, humany-ethicy ball of… stuff.
        mattnewport 30 Apr 2009 18:00 UTC
        0 points
        Parent
        
        Human beings aren’t goal systems. We DON’T SUM, anymore than a car “sums” the value of its speedometer with the value of the fuel gauge. If we actually summed, you’d get the outcome Eliezer once advocated: every one of us “picking one charity and donating as much to it as he can”.
        
        That seems an obviously fallacious argument to me. Many posts on OB have talked about other motivations behind charitable giving—whether it’s ‘buying fuzzies’ or signalling. You seem to be arguing that because one possible (but naive and inaccurate) model of a person’s utility function would predict different behaviour than what we actually observere, that the observed behaviour is evidence against any utility function being maximized. There are pretty clearly at least two possibilities here: either humans don’t maximize a utility function or they maximize a different utility function from the one you have in mind.
        
        Personally I think humans are imperfect maximizers of utility functions that are sufficiently complex that the ‘function’ terminology is as misleading as it is enlightening but your argument really doesn’t support your conclusion.
        cousin_it 30 Apr 2009 19:47 UTC
        3 points
        Parent
        Consider a simple human behavior: notice the smell of yummy food from the kitchen where Mom’s cooking, head there to check and grab a bite. Which of the following sounds like a more fitting model:
        
        1) We have a circuit hardwired to react to yummy smells when we’re hungry.
        
        2) We subconsciously sort different world-states according to a utility function that, among numerous other terms, assigns high weight to finding food when we’re hungry. (What?)
        
        If most of our behavior is better explained by arguments of type 1, why shoehorn it into a utility function and what guarantee do you have that a suitable function exists? (Sorry, “shoehorning” is really the best term for e.g. Eliezer’s arguments in favor of SPECKS or against certain kinds of circular preferences. Silly humans, my theory says you must have a coherent utility function on all imaginable worlds—or else you’re defective.) The potential harm from enforcing a total ordering on world states has, I believe, already been convincingly demonstrated; your turn.
        mattnewport 30 Apr 2009 20:54 UTC
        1 point
        Parent
        I think a few different issues are getting entangled here. I’m going to try and disentangle them a little.
        
        First, my post was primarily addressing the flawed argument that the fact we don’t all ‘pick one charity and donate as much to it as we can’ is evidence against us being utility maximizers for some incompletely known utility function. Any argument that postulates a utility function and then demonstrates how observed human behaviour does not maximize that function and presents this as evidence that we are not utility maximizers is flawed since the observed behaviour could also be explained by maximizing a different utility function. Now you could argue that this makes the theory that we are utility maximizers unfalsifiable, and I think that complaint has some merit, but the original argument is still unsound.
        
        Another issue is what exactly we mean by a utility function. If we’re talking about a function that takes world states as inputs and returns a real number representing utility as an output then it’s pretty clear that our brains do not encode such a function. I think it is potentially useful however to model our decision making process as a process by which our brains evaluate possible future states of the world and prefer some states to others (a ‘utility function’ in a looser sense) and favour actions which are expected to lead to preferred outcomes. If you’d prefer not to call this a utility function then perhaps you can suggest alternative terminology? If you dispute the value of this as a model for human decision making then that’s also a valid position but let’s focus on that discussion.
        
        Despite the flaws of the ‘utility maximizing’ model I think it has a lot of explanatory and predictive power. I would argue that it does a better job of explaining actual human behaviour than your type 1 theory which as stated would seem to have trouble accounting for me deciding to shut the door or go for a walk to get away from the tempting smell of food because I have a preference for future world states where I am not fat.
        
        My biggest problem with more extreme forms of ‘utility maximizing’ arguments is that I think they do not pay enough attention to computation limits that prevent a perfect utility maximizer from being realizable. This doesn’t mean the models aren’t useful—a model of a chess playing computer that attempts to explain/predict its behaviour by postulating that it is trying to optimize for optimal chess outcomes is still useful even if the computer is low powered or poorly coded and so plays sub-optimally.
        Expand this thread
        cousin_it 30 Apr 2009 21:30 UTC
        1 point
        Parent
        “Picking one charity and sticking to it” would follow from most “functions defined on worlds” that I’m able to imagine, while the firework of meaningless actions that we repeat every day seems to defy any explanation by utility… unless you’re willing to imagine “utility” as a kind of carrot that points in totally different directions depending on the time of day and what you ate for lunch. But in general, yes, I concede I didn’t prove logically that we have no utility function.
        
        Of course, if you’re serious about falsifying the utility theory, just work from any published example of preference reversal in real humans. There’s many to choose from.
        
        I would argue that it does a better job of explaining actual human behaviour than your type 1 theory which as stated would seem to have trouble accounting for me deciding to shut the door or go for a walk to get away from the tempting smell of food because I have a preference for future world states where I am not fat.
        
        Going by the relative frequency of your scenario vs mine, I’d say my theory wins this example hands down. :-) Even if we consider only people with a consciously stated goal of being non-fat.
        
        I think it is potentially useful however to model our decision making process as a process by which our brains evaluate possible future states of the world and prefer some states to others (a ‘utility function’ in a looser sense) and favour actions which are expected to lead to preferred outcomes.
        
        At most you can say that our brains evaluate descriptions of future states and weigh their emotional impact. Eliezer wrote eloquently about one particularly obvious preference reversal of this sort, and of course immediately launched into defending expected utility as a prescriptive rather than descriptive theory. Shut up and multiply, silly humans.
        mattnewport 30 Apr 2009 21:58 UTC
        0 points
        Parent
        
        “Picking one charity and sticking to it” would follow from most utility functions I’m able to imagine
        
        I think your imagination is rather limited then. Charitable donations as a signaling activity are one example. If you donate to charity partly to signal to others that you are an altruistic person and use your choice of charity to signal the kinds of things that you care about then donating to multiple charities can make perfect sense. Donating $500 to Oxfam and $500 to the WWF may deliver greater signaling benefits than donating $1000 to one of the two as it will be an effective signal both for third parties who prioritize famine and for third parties who prioritize animal welfare. If you are partly buying ‘fuzzies’ by donating to charity then donating to the two charities may allow you to feel good whenever you encounter news stories about either famine or endangered pandas, for a net benefit greater than feeling slightly more virtuous on encountering a subset of stories.
        
        Between evolutionary psychology, game theory, micro and behavioural economics and public choice theory to name a few research areas I have found a lot of insightful explanations of human behaviour that demonstrate people rationally responding to incentives. The explanations often reveal behaviour that appears irrational according to one version of utility makes perfect sense when you realize what people’s actual goals and preferences are. That’s not to say there aren’t examples of biases and flaws in reasoning but I’ve found considerable practical value in explaining human action through models that assume rational utility maximization.
        
        Incidentally, I don’t believe that demonstrations of preference reversal falsify the kind of model I’m talking about. They only falsify the naive ‘fully conscious rational agent with a static utility function’ model which is not much worth defending anyway.
        
        From the relative frequency of your scenario vs mine, I’d say my theory wins this example hands down. :-)
        
        Your theory fails to account for the exceptions at all though. And I have had great success losing weight by consciously arranging my environment to reduce exposure to temptation. How does your theory account for that kind of behaviour?
        cousin_it 30 Apr 2009 22:28 UTC
        0 points
        Parent
        Aaah! No, no. I originally used “picking one charity” as a metaphor for following any real-world goal concertedly and monomanically. Foolishly thought it would be transparent to everyone. Sorry.
        
        Yes, incentives do work, and utility-based models do have predictive and explanatory power. Many local areas of human activity are well modeled by utility, but it’s different utilities in different situations, not a One True Utility innate to the person. And I’m very wary of shoehorning stuff into utility theory when it’s an obviously poor fit, like moral judgements or instinctive actions.
        
        My theory doesn’t consider rational behavior impossible—it’s just exceptional. A typical day will contain one rationally optimized decision (if you’re really good; otherwise zero) and thousands of decisions made for you by your tendencies.
        
        At least that’s been my experience; maybe there are super-people who can do better. People who really do shut up and multiply with world-states. I’d be really scared of such people because (warning, Mind-Killer ahead) my country was once drowned in blood by revolutionaries wishing to build a rational, atheistic, goal-directed society. Precisely the kind of calculating altruists who’d never play chess while there was a kid starving anywhere. Of course they ultimately failed. If they’d succeeded, you’d now be living in the happiest utopia that was imaginable in the 19th century: world communism. Let that stand as a kind of “genetic” explanation for my beliefs.
        mattnewport 30 Apr 2009 22:43 UTC
        0 points
        Parent
        
        My theory doesn’t consider rational behavior impossible—it’s just exceptional. A typical day will contain one rationally optimized decision (if you’re really good; otherwise zero) and thousands of decisions made for you by your tendencies.
        
        This relates to my earlier comment about ignoring the computational limits on rationality. It wouldn’t be rational to put a lot of effort into rationally optimizing every decision you make during the day. In my opinion any attempts to improve human rationality have to recognize that resource limitations and computational limits are an important constraint. Having an imperfect but reasonable heuristic for most decisions is a rational solution to the problem of making decisions given limited resources. It would be great to figure out how to do better given the constraints but theories that start from an assumption of unlimited resources are going to be of limited practical use.
        
        my country was once drowned in blood by revolutionaries wishing to build a rational, atheistic, goal-directed society.
        
        I can see how conflating communism with rationality would lead you to be distrustful of rationality. I personally think the greatest intellectual failure of communism was failing to recognize the importance of individual incentives and utility maximization or to acknowledge the gap between people’s stated intentions and their actual motivations which means in my view it was never rational. Hayek’s economic calculation problem criticism of socialism is an example of recognizing the importance of computational constraints when trying to improve decisions. I’d agree that there is a danger of people with a naive view of rationality and utility thinking that communism is a good idea though.
        Cyan 30 Apr 2009 2:47 UTC
        2 points
        Parent
        
        Especially seeing as machines based on e.g. control theory (RichardKennaway) behave much more sensibly—they almost never display any urge to screw up the whole world, instead being content to sit there and tweak their needle.
        
        This is a rather bad example—machines based on control theory can easily display an “urge” to screw up as much of the world as they can touch. Short version: slapping a PID controller onto a system gives it second order dynamics, and those can have a resonant frequency. If the random disturbance has power at the resonant frequency, the system goes into a positive feedback loop and blows up.
        Vladimir_Nesov 30 Apr 2009 0:38 UTC
        0 points
        Parent
        I agree. But what do you do with this situation? To give up, you have to be certain that there is no way out, and we are much too confused to say anything like that yet. Someone is bound to build a doom machine someday if you don’t do something about it.
        Nick_Tarleton 30 Apr 2009 0:32 UTC
        0 points
        Parent
        Normative decision theory – the structure our final, stable preferences if we knew more, thought faster, were more the people we wished we were, had grown up further together – needn’t be good engineering design; agreed that utility functions often aren’t the latter, but that doesn’t count against them as the former.
        
        Maybe Psy-Kosh should say “becoming” instead of “building”?
      - Vladimir_Nesov 30 Apr 2009 0:30 UTC
        1 point
        Parent
        That is a right sentiment about strength: there are no simple rules, only goals, which makes a creative mind extremely dangerous. And we shouldn’t build things like this without understanding what the outcome will be. This is one of the reasons it’s important to understand human values in this light, to guard them from this destructive potential.
        
        Whatever you want accomplished, whatever you want averted, instrumental rationality defines an optimal way of doing that (without necessarily giving the real-world means, that’s a next step). If you really want life to continue as before, the correctly implemented explicit utility function for doing that won’t lead a Bayesian optimizer to do something horrible. (Although inaction may be considered horrible in itself, where so much more could’ve been done.)
    - thomblake 30 Apr 2009 13:12 UTC
      −1 points
      Parent
      
      given unbounded computational power
      
      You don’t get to assume that till tomorrow.
  - Vladimir_Nesov 29 Apr 2009 23:14 UTC
    0 points
    Parent
    Your statements about application of decision-making to humans still fail to make any sense to me. I fail to form a coherent model of how you understand this issue. Could you try to write up a short step-by-step introduction to your position, maybe basic terms only, just to establish a better vocabulary to build on? Open thread seems like a right place for such post.
    - cousin_it 30 Apr 2009 20:40 UTC
      8 points
      Parent
      Short version: beyond a certain (very coarse) precision you can’t usefully model humans as logical, goal-directed, decision-making agents contaminated by pesky “biases”. Goals, decisions and agency are very leaky abstractions, illusions that arise from the mechanical interplay of our many ad-hoc features. Rather than heading off for the sunset, the 99% typical behavior of humans is going around in circles day after day; if this is goal-directed, the goal must be weird indeed. If you want to make predictions about actual human beings, don’t talk about their goals, talk about their tendencies.
      
      Far from distressing me, this situation makes me happy. It’s great we have so few optimizers around. Real-world strong optimizers, from natural selection to public corporations to paperclippers, look psychopathic and monstrous when viewed through the lens of our tendency-based morality.
      
      For more details see thread above. Or should I compile this stuff into a toplevel post?
      - Vladimir_Nesov 1 May 2009 9:43 UTC
        1 point
        Parent
        Okay, I’ve probably captured the gist of your position now. Correct me if I’m speaking something out of its character below.
        
        Humans are descriptively not utility maximizers, they can only be modeled this way under coarse approximation and with a fair number of exceptions. There seems to be no reason to normatively model them with some ideal utility maximizer, to apply the concepts like should in more rigorous sense of decision theory.
        
        Humans do what they do, not what they “should” according to some rigorous external model. This is an argument and intuition similar to not listening to philosopher-constructed rules of morality, non-intuitive conclusions reached from considering a thought experiment, or God-declared moral rules, since you first have to accept each moral rule yourself, according to your own criteria, which might even be circular.
      - Z_M_Davis 1 May 2009 2:41 UTC
        1 point
        Parent
        
        It’s great we have so few optimizers around. Real-world strong optimizers, from natural selection to public corporations to paperclippers, look psychopathic and monstrous when viewed through the lens of our tendency-based morality.
        
        I thought this was the point of the Overcoming Bias project and the endeavor not to be named until tomorrow (cf. “Thou Art Godshatter” and “Value is Fragile”): that we want to put the fearsome power of optimization in the service of humane values, instead just of leaving things to nature, which is monstrous.
        
        Or should I compile this stuff into a toplevel post?
        
        I would love to see a top-level post on this issue.
    - gjm 29 Apr 2009 23:40 UTC
      0 points
      Parent
      Is that addressed to cousin_it or Psy-Kosh?
      - Vladimir_Nesov 29 Apr 2009 23:42 UTC
        0 points
        Parent
        To cousin_it, obviously...
        gjm 30 Apr 2009 0:14 UTC
        0 points
        Parent
        Thanks. (It wasn’t obvious to me, because I’ve seen similar comments from you to Psy-Kosh recently, and don’t remember seeing any such to cousin_it. And it’s not entirely outside the bounds of possibility for someone to make a comment a sibling rather than a child of what it’s responding to.)