Thanks for writing this post! (And man, if this is you deliberately writing fast and below your standards, you should lower your standards way more!). I very strongly agree with this within mechanistic interpretability and within pure maths (and it seems probably true in ML and in life generally, but those are the two areas I feel vaguely qualified to comment on).
Aversion to Schlepping
Man, I strongly relate to this one… There have been multiple instances of me having an experiment idea that I put off for days to weeks, only to do it in 1-3 hours and get really useful results. I’ve had some success experimenting with things like speedrunning afternoons, where I drop all of my ongoing tasks, try to pick a self-contained thing that seems high priority, and sprint on getting it done ASAP (this doesn’t work well for schleppy tasks that take days to weeks, but I’m more OK with sucking at those).
Under why touch reality, IMO the most important reason is that it’ll help you form ideas that are good! It’s much, much easier to do this when you have a lot of surface area on what’s actually going on, and enough experience and loose threads to spark curiosities and new ideas.
Under why don’t people touch reality, honestly the strongest reason for me is just procrastination/lacking urgency (which is somewhat aversion to schlepping, but less central) - even if I know exactly what it’d be sensible to do, there’s rarely a reason to do it right now rather than later.
Some more strategies I like for touching reality faster (there’s some overlap with yours):
Try explaining your understanding to other people. Notice when you’re confused about a concept, and go and try to figure out what’s going on (ideally by building some kind of toy model and coding something yourself).
Meta strategy: learn how to use good tooling, debug issues in your workflow, and just practice running a lot of quick experiments. I find that being able to test a hypothesis about GPT-2 Small in a few minutes makes it much easier to touch reality; I just wouldn’t do it if it took hours to days. Even if the difference in time isn’t that stark, the more you have the right muscle memory, the lower the activation energy.
Try to Murphyjitsu your ideas: assume things will go wrong, or that there’s some crucial flaw in your beliefs, and use your intuition to fill in the blanks on why. Use this to generate ideas for falsifying your plan.
This is probably true in general, to be honest. However, it’s an explanation for why people don’t do anything, and I’m not sure it differentially leads to delaying contact with reality more than, say, delaying writing up your ideas in a Google doc.
Some more strategies I like for touching reality faster
I like the “explain your ideas to other people” point; it seems like an important caveat/improvement to the “have good collaborators” strategy I describe above. I also think the meta strategy point of building a good workflow is super important!
I like the “explain your ideas to other people” point; it seems like an important caveat/improvement to the “have good collaborators” strategy I describe above
Importantly, the bar for “good person to explain ideas to” is much lower than the bar for “is a good collaborator”. Finding good collaborators is hard!
Thanks!