“Just take a love-minus-hate activation and add that to the prompt activation” sounds like an absolute newb idea. I like that idea, but if I were trying to find the expert in a room, that statement would’ve disqualified them.
Very true.
I was a literal mechint newb when I had that idea in January. Luckily transformer_lens doesn’t work on convolutional networks, and so I had to come up with my own ideas.
I also had never done much ML engineering. I had just learned PyTorch and was only then getting into the weeds of ML empirics after MLAB in January.
I remember telling another researcher about this activation addition idea, before we had any results. They (AFAICT dryly) replied, “That does sound like a first thing you could try...”. I remember feeling flustered and thinking “yeah, whatever man, this is going to work, I don’t feel like flustering about this so I’ll just go run the experiment and show it to you later.”