On Skype with Eliezer, I said: “Eliezer, you’ve been unusually pleasant these past three weeks. I’m really happy to see that, and moreover, it increases my probability that an Eliezer-led FAI research team will work. What caused this change, do you think?”
Eliezer replied: “Well, three weeks ago I was working with Anna and Alicorn, and every time I said something nice they fed me an M&M.”
If I recall my high school psychology class correctly, you can get a stronger and more persistent effect by secretly rolling a die and noting the number. When Eliezer has said that many nice things, give him an M&M, then roll the die again for a new target number of nice things.
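The die-roll procedure described above is usually called a variable-ratio schedule. A minimal sketch of it, assuming a six-sided die and a hypothetical `VariableRatioSchedule` class (neither the name nor the interface comes from the thread):

```python
import random


class VariableRatioSchedule:
    """Reward after a secretly drawn number of target behaviors (hypothetical helper)."""

    def __init__(self, sides=6, rng=None):
        self.sides = sides
        self.rng = rng if rng is not None else random.Random()
        self._roll()

    def _roll(self):
        # "Secretly roll the die": the result is how many behaviors earn the next reward.
        self.target = self.rng.randint(1, self.sides)
        self.count = 0

    def record_behavior(self):
        """Register one target behavior; return True if it earns a reward."""
        self.count += 1
        if self.count >= self.target:
            self._roll()  # draw a fresh target for the next reward
            return True
        return False
```

Calling `record_behavior()` once per nice remark returns `True` on the remarks that should earn an M&M. The gap between rewards is always between 1 and `sides` behaviors.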
That’s true and false. Intermittent reinforcement gets a more robust effect than continuous reinforcement, yes, but randomly intermittent reinforcement isn’t as effective as setting the reward threshold higher as the behavior becomes more common… e.g., rewarding only the 10% nicest things.
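One way to read “rewarding only the 10% nicest things” is as a cutoff that tracks recent performance. The sketch below assumes each behavior can be given a numeric score, which the thread does not specify. The class name, the window size, and the choice to reward everything until enough history exists are all illustrative assumptions.

```python
from collections import deque


class PercentileThreshold:
    """Reward only scores in the top fraction of recent history (hypothetical helper)."""

    def __init__(self, top_fraction=0.10, window=50, min_history=10):
        self.top_fraction = top_fraction
        self.min_history = min_history
        self.history = deque(maxlen=window)  # only the most recent scores are kept

    def should_reward(self, score):
        recent = sorted(self.history)  # snapshot of past scores, before adding this one
        self.history.append(score)
        if len(recent) < self.min_history:
            # Assumption: with too little history, reward every behavior.
            return True
        # Index of the lowest score that still falls in the top fraction.
        cutoff_index = min(int(len(recent) * (1 - self.top_fraction)), len(recent) - 1)
        return score >= recent[cutoff_index]
```

Because the cutoff is computed from past scores, it rises on its own as the behavior improves, so the same fraction of performances keeps getting rewarded.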
I want to design a reinforcement schedule in one of our apps. Can anyone link me to some specific guidelines on how to optimise this?
(Exactly what percentage of successes should be reinforced: 30%? 26%? 8%? Should I reinforce performances in the top 10% of past performances, or the top 12%, or the top 8%? How does time factor in: if the user hasn’t used the app for a week, should I bring a reinforcer forward?)
I can’t, but if you find anything concise and useful, I’d love to hear about it myself.
My rule of thumb is to set the threshold so as to reinforce the top 20% or so of performances, and arrange performance frequencies so I’m reinforcing 2-3 times/minute during active training periods. But that’s not based on anything.
I’ll also note that reinforcing higher-tier performances more strongly works really well (though it is hard to do consistently by hand), as do very intermittent “jackpots” (disproportionate and unpredictable mega-rewards).
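Scaling rewards by tier and adding jackpots is easier to do in software than by hand. The following is a rough sketch only: the function name, the 1x to 3x scaling, the 2% jackpot chance, and the 10x jackpot size are made-up illustrative values, not recommendations from the thread.

```python
import random


def reward_size(score, cutoff, top_score, rng, jackpot_chance=0.02, jackpot_size=10.0):
    """Return a reward magnitude for one performance (hypothetical helper)."""
    if score < cutoff:
        return 0.0  # below the threshold: no reinforcement
    if rng.random() < jackpot_chance:
        return jackpot_size  # rare, unpredictable mega-reward
    if top_score <= cutoff:
        return 1.0  # no spread between cutoff and best score: flat reward
    # Scale linearly from 1.0 at the cutoff to 3.0 at the best score seen so far.
    position = min((score - cutoff) / (top_score - cutoff), 1.0)
    return 1.0 + 2.0 * position
```

Here `cutoff` would be whatever threshold marks the top 20% or so of performances, and `top_score` the best performance recorded so far. Both are assumed to be tracked elsewhere.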
Some previous discussion about this form of conditioning.
When the threshold is “something nice”, there’s going to be randomness in the reinforcement anyway.