It seems to me that if humans were emotionless utility maximizers, we would prefer hearing criticism over praise, the same way programmers purchase more utility by fixing bugs in their programs than by polishing features that already work. I suspect criticism is generally more valuable from a pure decision-theoretic perspective.
I wonder if there is an effective way to buy encouragement and criticism separately. Also, it’s hard to know exactly how best to encourage folks. In theory it’s possible that making a new website is not the best use of SI’s resources, which suggests reinforcement would not be optimal. But we still may want to reinforce you towards the more general behavior of taking steps to achieve your organizational goals. So what’s the best response?
Maybe someone can develop some general guidelines for reinforcing/criticizing people, similar to what the nonviolent communication people came up with. (When {observable event} happened, I felt {feeling} because I need/value {underlying need that felt unmet or value that felt jeopardized}. Would you be willing to {specific request that person could do} in the future?) E.g. check to see if the person was acting with good intentions and reinforce them for those if they existed, check for super goals you endorse and reinforce them for working to accomplish those, check to see if the person could just as easily have sat around doing nothing and reinforce them for expending effort if this was the case, etc.
I think optimally criticism would have lots more reinforcers associated with it: people should be reinforced for requesting, giving, and receiving criticism because these are all activities that are naturally aversive but actually have high expected value.
So, I wholeheartedly endorse the following actions of yours: attempting to maximize humanity’s collective utility function, working on the super goal of AGI safety, actually doing stuff, and deliberately gathering critical feedback. Go Luke!
Criticism > Praise > Nothing. The problem is, people default to “Criticize or stay quiet”, and so I tend to value praise highly, as it’s culturally much rarer.
Also, if it’s a matter of opinion (rather than an actual code bug), praise can actively offset criticism (1 person dislikes it, but the other 99 users all love the new UI… probably not wise to revert!)
Edit: This statement is basically wrong. I was confusing negative instruction with punishment. Comment preserved for continuity.
Interestingly, adding a stimulus that decreases the frequency of a behavior (aka positive punishment) is less effective at changing behavior frequency than positive reinforcement.
That is, reinforcing an alternate behavior is more effective at decreasing a problem behavior than simply punishing the problem behavior. (I think this is even true when there are only two possible behaviors).
Really? I’d love a reference for this. My understanding was always that positive punishment has a stronger effect on behavior frequency than (for example) training an incompatible behavior, but also has lots of other effects that I don’t want to instill, which are often more important than maximizing effect on behavior frequency (e.g., reducing the rate at which novel behaviors are offered).
Let me clarify slightly, because I wasn’t trying to say something earth-shaking. If I did say something earth-shaking, I’m probably wrong.
My statement was made assuming that Bob already has a problem behavior that we would like to decrease the frequency of, and eventually extinguish. To be more concrete, let’s say Bob has bathroom accidents (he voids away from the toilet). All I meant to say was the statement “Good job going pee-pee on the potty” is more effective at reducing the frequency of accidents than “Bob, you shouldn’t go pee-pee in your underwear.”
Yes, I’m toilet training my son—why do you ask? :)
Well… hrm. That might very well be true about toilet-training, as increased anxiety is one of the side-effects of positive punishment, and anxiety interacts exceptionally poorly with bladder control. So, I dunno.
But in general, I’m pretty sure what you’re saying isn’t quite right. If I want to extinguish, say, jumping on the couch, consistently punishing incidents of jumping on the couch will extinguish the behavior much faster than pretty much anything else I can do.
Please don’t misunderstand me; I absolutely don’t endorse this as a training technique. But the reason I reject it isn’t because it doesn’t extinguish the behavior quickly… it does. The reason I reject it is because it creates a host of related side-effects that make subsequent training much more difficult, not to mention make the subsequent relationship with the trainer (and often with everyone else) much more unpleasant for the trainee.
Punishment is a blunt axe, but it’s a powerful blunt axe.
I talked with my wife, the future BCBA, and it appears that my intellectual reach has exceeded my grasp. First, I seem to have confused positive reinforcement vs. punishment with positive vs. negative instruction. It is the case that negative instruction (“Don’t throw your toy car”) is less effective than positive instruction (“We only throw balls”).
Second, there are some interventions, reinforcing and punishing, that could teach in one trial (consider heroin injections as reinforcement and flamethrowers as punishment). Edit: my wife says this point is about salience.
Third, best practices among behavior analysts are to use reinforcement prior to using punishment. My wife says that this is for ethical reasons—her reference book didn’t talk about the relative effectiveness of reinforcement and punishment.
Sorry about that.
Ah! Yes, that makes sense. Negative instruction doesn’t work very well, it’s true.
Mm… yeah, that’s a good point. I was eliding the distinction between salience and reward/punishment, and ought not have.