That is a question of low philosophical value, but of the highest practical importance.
At line 3,000,000 of the full log, with the 98.9% setup, there are these two pieces of information:
‘sp: 1640 734 548’, and ‘nbupd: 7940.54 6894.20’ (and the test accuracy is exactly 98.9%)
It means that the average spike count per sample is 1640 for the IS-group, 734 for the highest ISNOT-group and 548 for each of the other ones. The number of weight updates is 7940.54 per IS-learn and 6894.20 per ISNOT-learn. With the coding scheme used, the average number of inputs per sample over the four cycles is 2162 (total number of activated pixels in the input matrix) for 784 pixels. There are 7920 neurons per group with 10 connections each (so each neuron sees 10/784 of the pixel matrix), for a total of 79,200 neurons.
From those numbers:
The average number of integer additions done across all neurons when a sample is presented is: 79,200 * 2162 * 10/784 ≈ 2,184,061 integer additions in total.
And for the spike counts: 1640 + 734 + 8*548 = 6758 INCrements (counting the spikes).
When learning, there are 7940 weight updates for each IS-learn and 6894 for each ISNOT-learn. Those are integer additions. So an average of (7940 + 9*6894) / 10 ≈ 7000 integer additions per learning event.
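For anyone who wants to check the arithmetic, here is a minimal Python sketch that reproduces those per-sample counts from the logged figures (the variable names are mine, not from the code base):

```python
# Reproduce the per-sample operation counts from the logged figures above.
total_neurons = 79_200      # 10 groups of 7,920 neurons
fan_in        = 10          # connections per neuron
input_pixels  = 784         # size of the pixel matrix
active_inputs = 2_162       # avg. activated pixels per sample over the four cycles

# Each active input reaches, on average, fan_in/input_pixels of the neurons,
# and every hit costs one integer addition.
integer_adds = total_neurons * active_inputs * fan_in / input_pixels
print(f"integer additions per sample: {integer_adds:,.0f}")      # ~2,184,061

# Spike-count increments: IS group + best ISNOT group + 8 remaining groups.
spike_incs = 1_640 + 734 + 8 * 548
print(f"spike increments per sample:  {spike_incs:,}")           # 6,758

# Weight updates, averaged over 1 IS-learn and 9 ISNOT-learns.
avg_updates = (7_940 + 9 * 6_894) / 10
print(f"avg integer additions per learn: {avg_updates:,.1f}")     # ~7,000
```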
That is to be compared with three fully connected layers of, say, 800 units each (to make up for the difference between 98.90% and 98.94%).
That would be at least 800*800*2 + 800*784 = 1,907,200 floating-point multiplications, plus whatever is used for max-norm, ReLU, etc., which I am not qualified to evaluate, but might roughly double that?
And the same for each update (low estimate).
Even with recent work on sparse updates that reduces this by 93%, it is still more than 133,000 floating-point multiplications (against 7000 integer additions).
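A quick sketch of that baseline estimate, under the same assumptions as in the text (three hidden layers of 800 units on 784 inputs; the output layer and the non-linearities are left out):

```python
# Multiply count for the fully connected baseline: 784 -> 800 -> 800 -> 800.
layers = [(784, 800), (800, 800), (800, 800)]
mults_forward = sum(n_in * n_out for n_in, n_out in layers)
print(f"multiplications per forward pass: {mults_forward:,}")            # 1,907,200

# Even if sparse-update techniques removed 93% of that work, the remaining
# cost is still far above the ~7,000 integer additions per learning event.
mults_sparse_update = mults_forward * (1 - 0.93)
print(f"multiplications per sparse update: {mults_sparse_update:,.0f}")  # ~133,504
```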
I have managed to get over 98.5% with 2000 neurons (20,000 connections). I would like to know if BP/SGD can perform at that level with such a small number of parameters (that would be one fully connected layer of 25 units)? And, as I said in the roadmap, that is what will matter for full real systems.
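As a rough sanity check of that parameter comparison (my own back-of-the-envelope, ignoring biases and the output layer):

```python
# 2,000 neurons with 10 connections each vs. one fully connected hidden
# layer of 25 units on the 784 MNIST pixels.
spiking_connections = 2_000 * 10   # 20,000
fc_hidden_weights   = 784 * 25     # 19,600
print(spiking_connections, fc_hidden_weights)
```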
That is the basic building block, the 1x1x1 Lego brick. A 1.5/1.1 = 36% improvement (in error rate) with 40 times the resources is useless in practice.
And that is missing the real point laid out in the Roadmap: this system CAN and MUST be implemented in analog (hybrid until we get practical memristors), whereas BP/SGD CAN NOT.
There is, at least, another order of magnitude in efficiency to be gained there.
There is a lot of effort being invested in industry right now to implement AI at the IC level. Now is the time.