I’m currently ruminating on the idea of doing a video series in which I review code repositories that are highly relevant to alignment research to make them more accessible.
I want to pick out repos that are still useful despite perhaps having bad documentation, then hop on a call with the author to go over the repo and record it. That way, people at least have something basic to use when navigating the repo.
This means there would be two levels: 1) an overview with the author sharing at least the basics, and 2) a deep dive going over most of the code. The former likely contains most of the value (lower effort for me, so it still gets done; better than nothing; pointing to the repo acts as a selection mechanism; people can at least get started).
I am thinking of doing this because I think there may be repositories that are highly useful for new people but would benefit from some direction. For example, I think Karpathy’s and Neel Nanda’s videos have been useful in getting people started. In particular, Karpathy saw an order of magnitude more stars on his repos (e.g. nanoGPT) after the release of his videos (though, to be fair, he’s famous, and the number of stars is definitely not a perfect proxy for usage).
I’m interested in any feedback (“you should do it like x”, “this seems low value for x, y, z reasons so you shouldn’t do it”, “this seems especially valuable only if x”, etc.).
Here are some of the repos I have in mind so far:
Release Ordering
Evalugator
Sleeper Agents
How to remove Sleeper Agents
Open Source replication
Weak-to-Strong Generalization
Neuronpedia (I think they have GIFs right now)
ELK
ELK Generalization
Potential Videos
Localizing Lying in Llama
Representation Engineering
Sparse Autoencoders
nnsight
TransformerLens
Anthropic
Model-Written Evals
Ethan’s faithfulness approach
Influence Functions
DeepMind
None seem interesting so far.
EleutherAI
LM Harness
Semantic Memorization
How to use Pythia
Weak-to-Strong replication
Concept Erasure
Features Across Time
OpenAI
Evals
Triton introduction
Transformer Debugger (OpenAI has videos)
Influence Functions (when there is a legitimate repo)
Recommended by Garrett Baker
Devinterp
procgenAISC and procgen-tools
I love this idea! I don’t actually like videos, preferring searchable, excerptable text, but I may not be typical and there’s room for all. At first glance, I agree with your guess that the overview/intro is more value per effort (for you and for consumers, IMO) than a deep dive into the code. There IS probably a section of code or core modeling idea for each repo that would be worth going half-deep into (algorithm and usage, not necessarily line-by-line).
Note that this list is itself incredibly valuable, and you might start with an intro video (and associated text) that spends 1 minute on each repo, covering why you’re planning to do it and what you currently think will be the most important intro concept(s) for each.