Increasing regularization (weight decay in this instance) might rescue the ones which don’t work.
I tried increasing weight decay and increasing the batch size, but so far with no real success compared to 5x lr. I'm not going to investigate this further at the moment.
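For anyone who wants to pick this up, here is a rough sketch of how such a comparison could be laid out. It assumes "5x lr" means the base learning rate multiplied by five. The base values, the multipliers, and the `build_sweep` helper are all hypothetical placeholders, not the settings actually used above.

```python
from itertools import product

# Hypothetical placeholder values, not the ones used in the experiments above.
BASE_LR = 1e-3
BASE_WEIGHT_DECAY = 1e-2
BASE_BATCH_SIZE = 32


def build_sweep(wd_multipliers=(1, 3, 10), batch_multipliers=(1, 2, 4)):
    """Return the 5x-lr reference run plus a grid over weight decay and batch size."""
    # Reference run: only the learning rate is scaled.
    runs = [{
        "name": "lr_x5",
        "lr": 5 * BASE_LR,
        "weight_decay": BASE_WEIGHT_DECAY,
        "batch_size": BASE_BATCH_SIZE,
    }]
    # Candidate runs: base learning rate, scaled weight decay and/or batch size.
    for wd_m, bs_m in product(wd_multipliers, batch_multipliers):
        if wd_m == 1 and bs_m == 1:
            continue  # skip the unchanged baseline
        runs.append({
            "name": f"wd_x{wd_m}_bs_x{bs_m}",
            "lr": BASE_LR,
            "weight_decay": wd_m * BASE_WEIGHT_DECAY,
            "batch_size": bs_m * BASE_BATCH_SIZE,
        })
    return runs


if __name__ == "__main__":
    for run in build_sweep():
        print(run)
```

Each dictionary would then be handed to whatever training script is in use; the point is only to keep the 5x lr run in the same table as the weight decay and batch size variants so they can be compared directly.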