24,576 prompts of length 128 = 3,145,728 tokens.
With features that fire less frequently this won't be enough, but for these we seemed to find some activations (though not necessarily strong ones) for all features.
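A rough back-of-the-envelope check of why rare features are the problem. This is a sketch, not the code used to build the dashboards. It assumes that "density" means the fraction of tokens on which a feature fires, and that tokens fire independently at that rate. Under those assumptions, the expected number of activating tokens is simply density × total tokens. The densities in the loop are illustrative values, not measurements.

```python
# Token budget used for the dashboards (numbers from the comment above).
N_PROMPTS = 24_576
PROMPT_LEN = 128
TOTAL_TOKENS = N_PROMPTS * PROMPT_LEN  # 3,145,728 tokens


def expected_activations(density: float, total_tokens: int = TOTAL_TOKENS) -> float:
    """Expected count of tokens on which a feature fires, assuming it fires
    independently on each token with probability `density` (an assumption)."""
    return density * total_tokens


# Illustrative densities only: a 1e-7 feature is expected to appear well
# under once in ~3.1M tokens, so its dashboard could easily be empty.
for log10_density in (-3, -5, -7):
    density = 10.0 ** log10_density
    print(f"density 1e{log10_density}: ~{expected_activations(density):.1f} activating tokens")
```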
For the dashboards, did you filter out the features that fire less frequently? I looked through a few and didn’t notice any super low density ones.
On wandb, the dashboards were for a random sample of features, but we've since uploaded all features to Neuronpedia https://www.neuronpedia.org/gpt2-small/res-jb. The log sparsity is stored in the huggingface repo, so you can look for the most sparse features and check whether their dashboards are empty (anecdotally most dashboards seem good, besides the dead neurons in the first 4 layers).
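A minimal sketch of the "look for the most sparse features" step. It assumes the per-feature log sparsity has already been loaded from the huggingface repo into a plain list, with one value per feature. The loading step, file name and format are not shown here. It also assumes the values are log10 firing frequencies, where -inf means the feature never fired. The `threshold` cutoff for calling a feature dead is hypothetical, and the `example` values are toy numbers, not taken from the repo.

```python
def sparsest_features(log_sparsity, k=10):
    """Return (feature_index, log_sparsity) pairs for the k features that fire least often."""
    ranked = sorted(enumerate(log_sparsity), key=lambda pair: pair[1])
    return ranked[:k]


def dead_features(log_sparsity, threshold=-8.0):
    """Indices of features whose log sparsity falls below a hypothetical cutoff.
    Features that never fired (-inf) are included, since -inf < threshold."""
    return [i for i, value in enumerate(log_sparsity) if value < threshold]


# Toy values for illustration only.
example = [-2.1, -4.7, float("-inf"), -3.3, -9.2, -5.0]
print(sparsest_features(example, k=3))  # indices worth checking on Neuronpedia
print(dead_features(example))
```

The returned indices can then be looked up on Neuronpedia to see whether their dashboards are empty.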