Hello! A great write-up and fascinating investigation. Well done with such a great result from a hackathon.
I’m trying to understand your plot titled ‘Proportion of Top Predictions that are “ an” by Layer 31 Neuron 892 Activation’. Can you explain what the y-axis is in this plot? It’s not clear what the y-axis is a proportion of.
I read through the code, but couldn’t quite follow the logic for this plot. It seems that the y-axis is computed with these lines:
neuron_act_top_pred_proportions = [
    dict(sorted([(k / bin_granularity, v["top_pred"] / v["count"])
                 for k, v in logit_bins.items()]))
    for logit_bins in logit_diff_bins.values()
]
But I’m not sure what the denominator v["count"] from within logit_bins corresponds to.
Thank you :)
Aaron
Hi!
For each token prediction we record the activation of the neuron and whether or not “ an” has a greater logit than any other token (i.e. whether it was the top prediction).
We group the activations into buckets of width 0.2. For each bucket we plot
$$\frac{\text{Number of times `` an'' was the top prediction (for activations in this bucket)}}{\text{Number of activations in this bucket}}$$
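To make the bucketing concrete, here is a minimal sketch of that computation. It is not the notebook code: the function name `top_pred_proportions`, the `bin_width` parameter, and the toy inputs are made up for illustration, and it assumes each bucket is labelled by its lower edge.

```python
import math
from collections import defaultdict


def top_pred_proportions(activations, is_top_pred, bin_width=0.2):
    """Per activation bucket, the fraction of predictions where the token was top.

    activations: neuron activation recorded for each token prediction.
    is_top_pred: matching booleans, True where the token had the largest logit.
    Hypothetical helper for illustration only.
    """
    bins = defaultdict(lambda: {"top_pred": 0, "count": 0})
    for act, top in zip(activations, is_top_pred):
        k = math.floor(act / bin_width)   # integer bucket index
        bins[k]["count"] += 1             # every activation in the bucket
        bins[k]["top_pred"] += int(top)   # only those where the token was top
    # Label each bucket by its lower edge and divide top_pred by count.
    return {
        round(k * bin_width, 10): v["top_pred"] / v["count"]
        for k, v in sorted(bins.items())
    }


# Toy data (made up): seven token predictions.
toy_activations = [0.05, 0.1, 0.15, 0.25, 0.35, 1.1, 1.15]
toy_is_top_pred = [False, False, True, False, True, True, True]
proportions = top_pred_proportions(toy_activations, toy_is_top_pred)
```

In the toy data, the bucket starting at 0.2 contains two activations, one of which coincided with the token being the top prediction, so its plotted value would be 0.5. Here `count` plays the role of the denominator and `top_pred` the numerator.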
Does that clarify things for you?