JGWeissman comments on The Power of Reinforcement

JGWeissman 21 Jun 2012 1:58 UTC
15 points

On Skype with Eliezer, I said: “Eliezer, you’ve been unusually pleasant these past three weeks. I’m really happy to see that, and moreover, it increases my probability that an Eliezer-led FAI research team will work. What caused this change, do you think?”

Eliezer replied: “Well, three weeks ago I was working with Anna and Alicorn, and every time I said something nice they fed me an M&M.”

If I recall my high school psychology class correctly, you can get a stronger and more persistent effect by secretly rolling a die and noting the number. When Eliezer has said that many nice things, give him an M&M, then roll again for a new target number of nice things.
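The die-rolling procedure above is a variable-ratio schedule: the reward comes after an unpredictable number of responses. Here is a minimal sketch. It assumes a fair six-sided die and treats each "nice thing" as one countable event. The class and method names are hypothetical.

```python
import random


class DieRollSchedule:
    """Variable-ratio schedule: reward after a secretly rolled number of
    target behaviours, then re-roll. Assumes a fair six-sided die."""

    def __init__(self, sides=6, rng=None):
        self.sides = sides
        self.rng = rng if rng is not None else random.Random()
        self.count = 0              # nice remarks since the last reward
        self.target = self._roll()  # secret number of remarks to wait for

    def _roll(self):
        # randint is inclusive on both ends: 1..sides
        return self.rng.randint(1, self.sides)

    def observe(self):
        """Record one nice remark; return True if it earns an M&M."""
        self.count += 1
        if self.count >= self.target:
            self.count = 0
            self.target = self._roll()  # new hidden target for next time
            return True
        return False


if __name__ == "__main__":
    schedule = DieRollSchedule(rng=random.Random(0))
    rewards = [schedule.observe() for _ in range(30)]
    print(f"{sum(rewards)} rewards for 30 nice remarks")
```

With a six-sided die, a reward arrives on average once every 3.5 nice remarks, and never more than six remarks apart. That unpredictability is what the comment credits with the more persistent effect.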
- TheOtherDave 21 Jun 2012 2:12 UTC
  29 points
  That’s both true and false. Intermittent reinforcement produces a more robust effect than continuous reinforcement, yes, but randomly intermittent reinforcement isn’t as effective as raising the reward threshold as the behavior becomes more common, e.g., rewarding only the nicest 10% of things.
  - matt 21 Jun 2012 19:09 UTC
    10 points
    I want to design a reinforcement schedule in one of our apps. Can anyone link me to some specific guidelines on how to optimise this?
    
    (Reinforce exactly what % of successes (30%? 26%? 8%?)? Reinforce performances in the top 10% of past performances (or the top 12%, or the top 8%?)? How does time factor in (if the user hasn’t used the app for a week, should I push a reinforcer forward?)?)
    - TheOtherDave 21 Jun 2012 19:17 UTC
      0 points
      I can’t, but if you find anything concise and useful, I’d love to hear about it myself.
      
      My rule of thumb is to set the threshold so as to reinforce the top 20% or so of performances, and arrange performance frequencies so I’m reinforcing 2-3 times/minute during active training periods. But that’s not based on anything.
      
      I’ll also note that reinforcing higher-tier performances more strongly works really well (though it is hard to do consistently by hand), as do very intermittent “jackpots” (disproportionate and unpredictable mega-rewards).
- dbaupp 21 Jun 2012 5:41 UTC
  7 points
  Some previous discussion about this form of conditioning.
- Paul Crowley 21 Jun 2012 6:18 UTC
  5 points
  When the threshold is “something nice”, there’s going to be randomness in the reinforcement anyway.
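TheOtherDave's threshold approach combines three ideas: reward only the top slice of recent performances, reward higher tiers more strongly, and add rare jackpots. In an app like the one matt describes, it might look roughly like the sketch below. It is minimal and unvalidated. It assumes each performance can be given a numeric score (higher is better) and that the bar is recomputed over a sliding window of recent scores. The `top` fraction follows the 10% and "20% or so" figures from the thread. The elite cutoff, jackpot probability, jackpot multiplier, and window size are hypothetical illustrative values, not recommendations.

```python
import random
from collections import deque


class ThresholdSchedule:
    """Rising-threshold reinforcement schedule (illustrative sketch).

    Assumptions: performances have numeric scores (higher = better);
    `top` is the fraction of recent performances rewarded; `elite`,
    `jackpot_p`, `jackpot_mult`, and `window` are hypothetical values.
    """

    def __init__(self, top=0.20, elite=0.05, jackpot_p=0.02,
                 jackpot_mult=10, window=100, rng=None):
        self.top = top
        self.elite = elite
        self.jackpot_p = jackpot_p
        self.jackpot_mult = jackpot_mult
        self.history = deque(maxlen=window)  # only recent scores count
        self.rng = rng if rng is not None else random.Random()

    def observe(self, score):
        """Record a performance and return reward units earned (0 = none)."""
        units = 0
        if self.history:
            # Fraction of recent performances this one strictly beats.
            rank = sum(s < score for s in self.history) / len(self.history)
            if rank >= 1 - self.elite:
                units = 2  # higher-tier performance gets a stronger reward
            elif rank >= 1 - self.top:
                units = 1  # ordinary reward for a top-fraction performance
            if units and self.rng.random() < self.jackpot_p:
                units *= self.jackpot_mult  # rare, unpredictable jackpot
        # As performance improves, the window fills with better scores,
        # so the bar for the next reward rises automatically.
        self.history.append(score)
        return units


if __name__ == "__main__":
    sched = ThresholdSchedule(rng=random.Random(42))
    for step in range(30):
        score = step + sched.rng.gauss(0, 3)  # stand-in: slowly improving skill
        print(f"step {step:2d} score {score:6.2f} reward {sched.observe(score)}")
```

A sliding window is one way to make the threshold climb as the behavior becomes more common. A fixed percentile over all-time history would rise more slowly and never forget early, weaker performances.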