GraceFu comments on Proper value learning through indifference

GraceFu 21 Jun 2014 17:14 UTC
1 point

“Well,” you say, “if you take over and donate £10 to AMF in my place, I’d be perfectly willing to send my donation to Oxfam instead.”

“Hum,” I say, because I’m a hummer. “A donation to Oxfam isn’t completely worthless to you, is it? How would you value it, compared with AMF?”

“At about a tenth.”

“So, if I instead donated £9 to AMF, you should be willing to switch your £10 donations to Oxfam (giving you the equivalent value of £1 to AMF), and that would be equally good as the status quo?”

Question: I don’t understand your Oxfam/AMF example. According to me, if you decided to donate £10 to AMF, I see that Oxfam, which I care about 0.1 times as much as AMF, has lost £1 worth of AMF donation, while AMF has gained £10. If I then decide to follow through with my perfect willingness, and I donate £10 to Oxfam, only then do I have equilibrium, because:

Before: £10 * 0.1 utiliton + £10 * 1 utiliton = 11 utilitons.

After: £10 * 0.1 utiliton + £10 * 1 utiliton = 11 utilitons.

But in the second hypothetical,

After: £11 * 0.1 utiliton + £9 * 1 utiliton = 10.1 utilitons.

Which seems clearly inferior. In fact, even if you offered to switch donations with me, I wouldn’t accept, because I may not trust you to fulfil your end of the deal, resulting in a lower expected utility.

I’m clearly missing some really important point here, but I fail to see how the example is related to utility function updating...
- Stuart_Armstrong 23 Jun 2014 13:50 UTC
  1 point
  In the first situation, you were donating £10 to AMF (10 utilons).
  
  Then I asked you to switch to Oxfam. You said yes, if I covered your donation to AMF. This would indeed give you £10+0.1*£10=£11, as you said.
  
  I said “hang on.” I pointed out that this was pure profit for you, and that if instead I gave £9 to AMF, then this would be equivalent to your first situation (£9 (from me to AMF) + 0.1*£10 (from you to Oxfam) = £10). This is the point at which you are indifferent to changing.
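  The three situations can be tallied in a few lines of Python. This is only a minimal sketch: the names `value`, `status_quo`, `full_cover` and `indifferent` are made up for illustration, and it assumes a purely linear valuation in which £1 to Oxfam counts as £0.10 to AMF.

```python
from fractions import Fraction

# Assumption from the dialogue: £1 to Oxfam is worth a tenth of £1 to AMF.
OXFAM_WEIGHT = Fraction(1, 10)


def value(oxfam_pounds, amf_pounds):
    """Total value of the donations, measured in pounds-to-AMF equivalents."""
    return OXFAM_WEIGHT * oxfam_pounds + amf_pounds


status_quo = value(0, 10)    # you give £10 to AMF, I give nothing
full_cover = value(10, 10)   # you switch to Oxfam, I give £10 to AMF
indifferent = value(10, 9)   # you switch to Oxfam, I give only £9 to AMF

print(status_quo, full_cover, indifferent)
```

  Under these assumptions the first offer comes out at 11 against a status quo of 10, and the £9 offer brings it back to exactly 10.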
  
  “because I may not trust you to fulfil your end of the deal”
  
  We removed those potential issues to get a clearer example.
  - GraceFu 29 Jun 2014 19:46 UTC
    3 points
    Ah! I finally get it! Unfortunately I haven’t gotten the math. Let me try to apply it, and you can tell me where (if?) I went wrong.
    
    U = v + (Past Constants) →
    
    U = w + E(v|v→v) - E(w|v→w) + (Past Constants).
    
    Before, U = v + 0, setting (Past Constants) to 0 because we’re in the initial state. v = 0.1*Oxfam + 1*AMF.
    
    Therefore, U = 10 utilitons.
    
    After I met you, you want me to switch to a w that weights Oxfam higher, but only if a constant is given (the E terms): U’ = w + E(v|v->v) - E(w|v->w), with w = 1*Oxfam + 0.1*AMF.
    
    What we want is for U = U’.
    
    E(v|v->v) = ? I’m guessing this term means, “Let’s say I’m a v maximiser. How much is v?” In that case, E(v|v->v) = 10 utilitons.
    
    E(w|v->w) = ? I’m guessing this term means, “Let’s say I become a w maximiser. How much is w?” In that case, E(w|v->w) = 10 utilitons.
    
    U’ = w + 10 − 10 = w.
    
    Let’s try a different U*, with utility function w* = 1*Oxfam + 10*AMF (it acts the same as a v-maximiser). E(v|v->v) = 10 utilitons. E(w*|v->w*) = 100 utilitons. U* = w* + 10 − 100 = w* − 90.
    
    Trying this out, we obviously will be donating 10 to AMF in both utility functions. U = v = 0.1*Oxfam + 1*AMF = 0.1*0 + 1*10 = 10 utilitons. U* = w* − 90 = 1*Oxfam + 10*AMF − 90 = 0 + 100 − 90 = 10 utilitons.
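    The three linear cases (v, w and w*) can be checked mechanically. A rough Python sketch, assuming a fixed £10 to give away and whole-pound splits (enough for linear utilities, whose optimum sits at a corner); `best_split` and `constant` are hypothetical helper names, not anything from the post:

```python
from fractions import Fraction

BUDGET = 10  # assumption: £10 to give in total


def v(oxfam, amf):
    return Fraction(1, 10) * oxfam + amf


def w(oxfam, amf):
    return oxfam + Fraction(1, 10) * amf


def w_star(oxfam, amf):
    return oxfam + 10 * amf


def best_split(utility):
    """Return the whole-pound (oxfam, amf) split of BUDGET that maximises utility."""
    splits = [(oxfam, BUDGET - oxfam) for oxfam in range(BUDGET + 1)]
    return max(splits, key=lambda split: utility(*split))


def constant(new_utility, old_utility=v):
    """E(old|old->old) - E(new|old->new), each taken at its own optimum."""
    e_old = old_utility(*best_split(old_utility))
    e_new = new_utility(*best_split(new_utility))
    return e_old - e_new


for name, utility in [("w", w), ("w*", w_star)]:
    split = best_split(utility)
    print(name, split, constant(utility), utility(*split) + constant(utility))
```

    In this sketch the constant is 0 for w and −90 for w*, and in both cases the compensated utility at the new optimum equals the 10 utilitons that the v-maximiser started with.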
    
    Obviously all these experiments are useless. v = 0.1*Oxfam + 1*AMF is a completely useless utility function. It may as well be 0.314159265*Oxfam + 1*AMF. Let’s try something that actually makes some sense, (economically.)
    
    Let’s have a simple marginal utility curve, (note partial derivatives) dv/dOxfam = 1-0.1*Oxfam, dv/dAMF = 10-AMF. In both cases, donating more than 10 to either charity is plain stupid.
    
    U = v

    v = (Oxfam-0.05*Oxfam^2) + (10*AMF-0.5*AMF^2)

    Maximising U leads to AMF = ¹⁰⁰⁄₁₁ ≈ 9.09, Oxfam ≈ 0.91.

    v happens to be: v = ⁵⁵⁵⁄₁₁ ≈ 50.45
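    These fractions can be double-checked with exact arithmetic. The sketch below assumes a total of £10 to split between the two charities (inferred from 9.09 + 0.91, not stated outright) and finds the split by setting the two marginal utilities equal; the function names are illustrative only.

```python
from fractions import Fraction

BUDGET = Fraction(10)  # assumption: £10 in total, inferred from 9.09 + 0.91


def v(oxfam, amf):
    return (oxfam - Fraction(1, 20) * oxfam**2) + (10 * amf - Fraction(1, 2) * amf**2)


def marginal_oxfam(oxfam):
    return 1 - Fraction(1, 10) * oxfam   # dv/dOxfam


def marginal_amf(amf):
    return 10 - amf                      # dv/dAMF


# Setting the marginals equal with oxfam = BUDGET - amf:
#   1 - 0.1*(BUDGET - amf) = 10 - amf   =>   1.1*amf = 9 + 0.1*BUDGET
amf = (9 + Fraction(1, 10) * BUDGET) / Fraction(11, 10)
oxfam = BUDGET - amf

print(amf, oxfam, v(oxfam, amf))
```

    With a £10 budget this gives AMF = 100/11, Oxfam = 10/11 and v = 555/11, and both marginals equal 10/11 at that point.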
    
    (Note: Math is mostly intuitive to me, but when it comes to grokking quadratic curves by applying them to utility curves which I’ve never dabbled with before, let’s just say I have a sizeable headache about now.)
    
    Now you, because you’re so human and you think we simulated AI can so easily change our utility functions, come over to me and tell me to change v to w = (100*Oxfam-5*Oxfam^2) + (10*AMF-0.5*AMF^2). What you’re saying is to increase dw/dOxfam = 100 * dv/dOxfam, while leaving dw/dAMF = dv/dAMF. Again, partial derivatives.
    
    U’ = w + E(v|v->v) - E(w|v->w).

    Maximising w leads to Oxfam = ¹⁰⁰⁄₁₁ ≈ 9.09, AMF ≈ 0.91, the opposite of before.

    w = ⁵⁵⁵⁰⁄₁₁ ≈ 504.5

    U’ = w + ⁵⁵⁵⁄₁₁ − ⁵⁵⁵⁰⁄₁₁ = w − ⁴⁹⁹⁵⁄₁₁

    Which still checks out.
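    The same check for w, together with the compensating constant, as a hedged sketch. It assumes the £10 total budget again and an interior optimum (true for both v and w here); `best_oxfam`, `quadratic` and `maximum` are hypothetical helpers written for this example.

```python
from fractions import Fraction

BUDGET = Fraction(10)  # assumption: £10 to give in total


def best_oxfam(a_ox, b_ox, a_amf, b_amf, budget=BUDGET):
    """Oxfam amount maximising (a_ox*x - b_ox*x**2) + (a_amf*y - b_amf*y**2)
    subject to x + y = budget, obtained by setting the two marginals equal.
    Only valid when the result lies between 0 and budget."""
    return (a_ox - a_amf + 2 * b_amf * budget) / (2 * (b_ox + b_amf))


def quadratic(a_ox, b_ox, a_amf, b_amf):
    """Build a utility function with the given coefficients."""
    def utility(oxfam, amf):
        return (a_ox * oxfam - b_ox * oxfam**2) + (a_amf * amf - b_amf * amf**2)
    return utility


V_COEFFS = (1, Fraction(1, 20), 10, Fraction(1, 2))   # v from the comment
W_COEFFS = (100, 5, 10, Fraction(1, 2))               # w from the comment


def maximum(coeffs):
    """Value of the utility at its best split of BUDGET."""
    oxfam = best_oxfam(*coeffs)
    return quadratic(*coeffs)(oxfam, BUDGET - oxfam)


max_v = maximum(V_COEFFS)
max_w = maximum(W_COEFFS)
constant = max_v - max_w   # E(v|v->v) - E(w|v->w)

print(max_v, max_w, constant)
```

    Under these assumptions the constant comes out as −4995/11, so w plus the constant at the new optimum is again 555/11.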
    
    Also, I think I finally get the math too, after working this out numerically. It’s basically U = (Something), and trying to make the utility function change must preserve that (Something). U’ = (Something) is a requirement. So you have your U = v + (Constants), and you set U’ = U, just that you have to maximise v or w before determining your new set of (Constants):

    max(v) + (Constants) = max(w) + (New Constants)
    
    (New Constants) = max(v) - max(w) + (Constants), which are your E(v|v->v) - E(w|v->w) + (Constants) terms, except under different names.
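    Written as a tiny function, the update rule looks something like the sketch below. The name `new_constants` is made up, and the maxima are simply the quadratic-example values plugged in by hand rather than recomputed.

```python
from fractions import Fraction


def new_constants(max_old, max_new, constants=0):
    """(New Constants) = max(old) - max(new) + (Constants)."""
    return max_old - max_new + constants


max_v = Fraction(555, 11)    # best achievable v in the quadratic example
max_w = Fraction(5550, 11)   # best achievable w in the quadratic example

after_switch = new_constants(max_v, max_w)                # v -> w
after_return = new_constants(max_w, max_v, after_switch)  # w -> v again

print(after_switch, after_return)
```

    One consequence of the rule, at least in this toy setting: switching from v to w and back again leaves the constants where they started, since the two corrections cancel.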
    
    Huh. If only I had thought max(v) and max(w) from the start… but instead I got confused with the notation.
    - Stuart_Armstrong 30 Jun 2014 10:03 UTC
      3 points
      Thanks for sticking it out to the end :-)