You didn’t miss it. Your quote and the later bit about “The question to ask is not ‘how’ to learn, but ‘when’.” seem to contradict each other. I think your quote is just a general allegory to Hebb’s rule and that it’s not meant to be taken as a literal system spec, but I could be wrong. I am confused by the original description.
That actually brings us to the core of it.
The way I phrased that was deliberately ambiguous.
Since 1958, the question the field has been trying to answer is how to transfer the information we get when a sample is presented into the weights, so that the network performs better next time.
BP computes the difference between what was expected and what is measured, and then propagates it back to all intermediate weights according to a set of mathematically derived rules (the generalised delta rule). A lot of work has gone into figuring out the best way to do that. This is what I called ‘how’ to learn.
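For concreteness, the single-layer version of that rule (before it is generalised to hidden layers) looks something like this; a textbook sketch in Python, not code from this system:

```python
import numpy as np

# Plain (single-layer) delta rule: nudge each weight in proportion to
# its input and to the error on this sample.
def delta_rule_step(w, x, target, lr=0.1):
    output = np.dot(w, x)
    error = target - output        # expected minus measured
    return w + lr * error * x      # w_i <- w_i + lr * error * x_i

# One toy update
w = np.zeros(3)
x = np.array([1.0, 0.5, -1.0])
w = delta_rule_step(w, x, target=1.0)
```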
In this system, the method used is just the simplest and most intuitive one possible: INC and DEC of the weights depending on whether or not the answer is correct.
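Schematically, and glossing over which weights are actually touched, it is nothing more than something like this (the names `weights` and `active` and the unit step size are placeholders for illustration, not the actual code):

```python
# Crudest possible 'how': INC the weights that produced the answer when it
# is correct, DEC them when it is not. (Illustrative sketch only.)
def simple_update(weights, active, correct, step=1):
    for i in active:
        weights[i] += step if correct else -step
    return weights
```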
The quantiliser, then, tells the system to apply that simple rule only under certain conditions (the Δ⊤ and Δ⊥i limits). That is ‘when’.
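Putting the two together, the whole learning step is roughly the following (again a sketch: exactly what the Δ⊤ and Δ⊥ limits bound is defined in the original description, and the single `delta` value here is my simplification):

```python
# 'When' to learn: only apply the simple INC/DEC rule when the measured
# delta falls inside the Δ⊤ / Δ⊥ limits; otherwise leave the weights alone.
# (Sketch only; all names here are placeholders.)
def maybe_update(weights, active, correct, delta, delta_top, delta_bot, step=1):
    if delta_bot <= delta <= delta_top:               # the 'when' gate
        for i in active:                              # the 'how' stays trivial:
            weights[i] += step if correct else -step  # INC if correct, DEC if not
    return weights
```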
You can use the delta rule instead of our basic update rule if you want (I tried). The result is not better and it is less stable, so you have to use small gradients. The problem, as I see it, is that the conditions under which Jessica Taylor’s theorem is valid are no longer met, and you have to ‘fix’ that. I did not investigate that extensively.