Apply to the “Team Shard” mentorship program at MATS
In the shard theory stream, we create qualitatively new methods and fields of inquiry, from steering vectors to gradient routing[1] to unsupervised capability elicitation. If you’re theory-minded, maybe you’ll help us formalize shard theory itself.
Research areas
Discovering qualitatively new techniques
Steering GPT-2-XL by adding an activation vector opened up a new way to cheaply steer LLMs at runtime. Additionalwork has reinforced the promise of this technique, and steering vectors have become a small research subfield of their own. Unsupervised discovery of model behaviors may now be possible thanks to Andrew Mack’s method for unsupervised steering vector discovery. Gradient routing (forthcoming) potentially unlocks the ability to isolate undesired circuits to known parts of the network, after which point they can be ablated or studied.
What other subfields can we find together?
Formalizing shard theory
Shard theory has helped unlock a range of empirical insights, including steering vectors. The time seems ripe to put the theory on firmer mathematical footing. For initial thoughts, see this comment.
Apply to the “Team Shard” mentorship program at MATS
Research areas
Apply here. Applications due by October 13th!
Paper available soon.