I’m still thinking about this idea. We could try to do the same thing but on CIFAR-10. I do not know if it would be possible to construct the layers by hand.
On MNIST, for a network (LeNet, 60k parameters) with 99% accuracy, the cross-entropy is about 0.05.
If we take the formula: score = CE + lambda * log(number of non-null parameters),
a good lambda would be around 100 (equalizing the cross-entropy term and the regularization term).
In the MNIST minimal-number-of-weights competition, 99% accuracy is reached with about 2000 weights, which would give a lambda of about 80.
If we want to stress the importance of sparsity, we could instead choose a lambda of around 300.
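A minimal sketch of how this score could be computed, assuming "non-null parameters" means weights whose magnitude exceeds some small threshold (the threshold value and the counting function are my assumptions, not part of the idea above):

import numpy as np

def sparsity_score(cross_entropy, weights, lam, eps=1e-8):
    # score = CE + lambda * log(number of non-null parameters)
    # cross_entropy : validation cross-entropy of the model
    # weights       : flat array of all model parameters
    # lam           : regularization strength (e.g. the 80-300 range discussed above)
    # eps           : threshold below which a weight is treated as null (assumption)
    nb_nonzero = np.sum(np.abs(weights) > eps)
    return cross_entropy + lam * np.log(nb_nonzero)

# Example with the numbers quoted above and a hypothetical LeNet-sized weight vector:
weights = np.random.randn(60_000)
print(sparsity_score(0.05, weights, lam=100))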
Even if not done completely by hand, MNIST is simple enough that hybrid human-machine optimization could be possible, perhaps with a UI where you can see the effect on the validation loss in (almost) real time as you change a particular weight with a slider. I do not know whether the final score could be improved by changing the weights one by one. Alternatively, a human could use intuitive visual knowledge to improve the convolutional filters directly.
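A toy sketch of what such a slider UI could look like, using matplotlib's Slider widget and a tiny synthetic logistic-regression problem as a stand-in for a real MNIST network (the model, data, and weight being edited are all placeholders, just to show the interaction loop):

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider

# Tiny synthetic "validation set" and a 2-weight logistic model (stand-in for a real net).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

w = np.array([0.2, 0.2])  # current weights; the slider edits w[0]

def val_loss(w):
    # Cross-entropy of the toy model on the toy validation set.
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

fig, ax = plt.subplots()
plt.subplots_adjust(bottom=0.25)
text = ax.text(0.5, 0.5, f"val loss = {val_loss(w):.4f}",
               ha="center", va="center", fontsize=14)
ax.set_axis_off()

slider_ax = plt.axes([0.2, 0.1, 0.6, 0.05])
slider = Slider(slider_ax, "w[0]", -3.0, 3.0, valinit=w[0])

def update(val):
    # Recompute and display the validation loss every time the slider moves.
    w[0] = val
    text.set_text(f"val loss = {val_loss(w):.4f}")
    fig.canvas.draw_idle()

slider.on_changed(update)
plt.show()

For a real network one slider per weight would obviously not scale, so the interesting question is which small subset of weights (or which filters) to expose to the human.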
On CIFAR-10 this looks very hard to do manually, given that the dataset is much harder than MNIST.
I think choosing a lambda that is too large is better than one that is too small: if lambda is too big, the results will still be interesting (which model architecture is best under extremely strong regularization?), while if it is too small you will just get a normal architecture that is slightly more regularized.