TurnTrout comments on Work dumber not smarter

TurnTrout 7 Jun 2023 2:06 UTC
16 points
“Just take a love-minus-hate activation and add that to the prompt activation” sounds like an absolute newb idea. I like that idea but if I were trying to find the expert in a room then that statement would’ve disqualified them.
Very true.
1. I was a literal mechint newb when I had that idea in January. Luckily transformer_lens doesn’t work on convolutional networks, and so I had to come up with my own ideas.
2. I also had never done much ML engineering. I had just learned PyTorch, and was only then getting into the weeds of ML empirics after MLAB in January.
3. I remember telling another researcher about this activation addition idea before we had any results. They (AFAICT dryly) replied, “That does sound like a first thing you could try...” I remember feeling flustered and thinking, “yeah, whatever man, this is going to work, I don’t feel like flustering about this so I’ll just go run the experiment and show it to you later.”