Bit late, but running the same experiment with 1000 dimensions instead of 16, and 10k steps instead of 5k gives
Which appears to be on the way to a minima. though I’m unsure if I should tweak hparams when scaling up this much. Trying with other optimizers would be interesting too, but I think I’ve got nerdsniped by this too much already… Code is here.
Bit late, but running the same experiment with 1000 dimensions instead of 16, and 10k steps instead of 5k gives
Which appears to be on the way to a minima. though I’m unsure if I should tweak hparams when scaling up this much. Trying with other optimizers would be interesting too, but I think I’ve got nerdsniped by this too much already… Code is here.