We didn’t (or rather, shouldn’t) intend to reward or punish those “ancestor nodes”. We should intend to reward or punish the results.
You seem to have interpreted my comments as saying that we’re trying to reward some particular behavior, but we should do this by rewarding the results of that behavior. As you point out, this is not a wise plan.
But it’s also not what I am saying, at all. I am saying that we are (or, again, should be) trying to reward the results. Not the behavior that led to those results, but the results themselves.
I don’t know why you’re assuming that we’re actually trying to encourage some specific behavior. It’s certainly not what I am assuming. Doing so would not be a very good idea at all.
I think with that approach there are a great many results you’d fail to achieve. People can get animals to do remarkable things with shaping and I would wager that you can’t do them at all otherwise.
From the Wikipedia article on Shaping (psychology):
We first give the bird food when it turns slightly in the direction of the spot from any part of the cage. This increases the frequency of such behavior. We then withhold reinforcement until a slight movement is made toward the spot. This again alters the general distribution of behavior without producing a new unit. We continue by reinforcing positions successively closer to the spot, then by reinforcing only when the head is moved slightly forward, and finally only when the beak actually makes contact with the spot. … The original probability of the response in its final form is very low; in some cases it may even be zero. In this way we can build complicated operants which would never appear in the repertoire of the organism otherwise. By reinforcing a series of successive approximations, we bring a rare response to a very high probability in a short time. … The total act of turning toward the spot from any point in the box, walking toward it, raising the head, and striking the spot may seem to be a functionally coherent unit of behavior; but it is constructed by a continual process of differential reinforcement from undifferentiated behavior, just as the sculptor shapes his figure from a lump of clay.
Humans are more sophisticated than birds, but producing highly complex and abstruse truths in a format understandable to others is also a lot more complicated than getting a bird to put its beak in a particular spot. I think all the same mechanics are at work. If you want to get someone (including yourself) to do something as complex and difficult as producing valuable, novel, correct expositions of true things on LessWrong, you’re going to have to reward the predictable intermediate steps.
We don’t go to five-year-olds and say, “The desired result is that you can write fluently; therefore, no positive feedback on your marginal efforts until you can do so. In fact, I’m going to strike your knuckles every time you make a spelling error or do anything that isn’t what we hope to see from you when you’re 12. We will only reward the final desired result, and you can backpropagate from that to figure out what’s good.” That’s only a recipe for children who are unwilling to put any effort into learning to write, not for children who progressively put in effort over years to learn what it even looks like to be a competent writer.
This goes beyond my earlier point that, in our cases, verifying results is often much harder than verifying that good steps are being taken.
We didn’t (or rather, shouldn’t) intend to reward or punish those “ancestor nodes”. We should intend to reward or punish the results.
I’m afraid this sentence doesn’t parse for me. You seem to be speaking of “results” as something to which the concepts of reward and punishment are applicable. However, I’m not aware of any context in which this is a meaningful (rather than nonsensical) thing to say. All theories of behavior I’ve encountered that mention rewards and punishments (e.g. operant conditioning) treat them as a means of influencing behavior. If there’s something else you’re referring to when you say “reward or punish the results”, I would appreciate it if you clarified what exactly that thing is.
I don’t see what could be simpler. Alice does something. That action has some result. We reward Alice, or punish her, based on the results of her action. There is nothing unusual or obscure here; I mean just what I say.
(There are cases where we do not want to take this approach, but they tend to be both controversial and unusual in certain important respects.)
Edit: And if you’re trying to use operant conditioning, of all things, to decide what social norms to have on a forum devoted to the art of rationality, then you’ve already admitted defeat, and this entire project is pointless.
You’re still missing my point.
See the “Edit:” part of this comment, which is my response to your comment also.