Charlie Steiner comments on Investigating the learning coefficient of modular addition: hackathon project

Charlie Steiner 17 Oct 2023 22:47 UTC
LW: 3 AF: 1
I’m curious whether you have guesses about how many of the singular dimensions came from dead neurons (or neurons that are “mostly dead,” activating on only a tiny fraction of the training set), versus how much the zero-gradient directions depended dynamically on the training example.
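
For concreteness, here is a rough sketch of how the first half of that question could be checked. It assumes ReLU hidden units and access to the post-activation values over the whole training set. The function name `classify_neurons` and the `rare_threshold` cutoff are hypothetical choices for illustration, not anything taken from the post.

```python
def classify_neurons(activations, rare_threshold=0.01):
    """Label each hidden neuron by how often it fires on the training set.

    activations: list of rows, one row per training example; each row holds
    the post-ReLU activation of every hidden neuron (assumed layout).
    rare_threshold: hypothetical cutoff below which a neuron that fires at
    least once is still called "mostly_dead".
    """
    n_examples = len(activations)
    n_neurons = len(activations[0])
    labels = []
    for j in range(n_neurons):
        # Count the training examples on which neuron j is strictly positive.
        n_active = sum(1 for row in activations if row[j] > 0.0)
        frac_active = n_active / n_examples
        if n_active == 0:
            labels.append("dead")
        elif frac_active < rare_threshold:
            labels.append("mostly_dead")
        else:
            labels.append("active")
    return labels


# Toy data: 4 training examples, 3 hidden neurons.
toy = [
    [0.0, 0.5, 0.0],
    [0.0, 1.2, 0.0],
    [0.0, 0.0, 0.7],
    [0.0, 0.3, 0.0],
]
labels = classify_neurons(toy, rare_threshold=0.3)
```

This only covers the dead and mostly-dead side. Separating out zero-gradient directions that change from one training example to the next would need per-example gradients, which this sketch does not attempt.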