Because of superposition, I would be very surprised if complex high-level behavior were strongly mediated by a single neuron. Engineering polysemanticity (“making it depend on many different neurons”) feels like the flip side of engineering monosemanticity, so you might want to read Adam Jermyn’s post on the topic.
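To make the superposition point concrete, here is a rough toy sketch, not anything taken from Jermyn's post. It assumes a hypothetical layer of 3 neurons storing 4 features along tetrahedron directions; the matrix `W` and the helper `readout` are made up for illustration.

```python
import numpy as np

# Hypothetical toy layer: 4 features packed into 3 neurons.
# Each column is one feature's direction in neuron space (unit length).
W = np.array([
    [1,  1, -1, -1],
    [1, -1,  1, -1],
    [1, -1, -1,  1],
], dtype=float) / np.sqrt(3)  # shape: (n_neurons, n_features)


def readout(feature, ablate=None):
    """Activate one feature, optionally zero a neuron, then project back onto that feature."""
    acts = W[:, feature].copy()      # neuron activations when only this feature is on
    if ablate is not None:
        acts[ablate] = 0.0           # knock out a single neuron
    return float(W[:, feature] @ acts)


# How many features does each neuron carry a nonzero weight for?
features_per_neuron = (np.abs(W) > 1e-9).sum(axis=1)

full = readout(0)               # feature 0 with all neurons intact
ablated = readout(0, ablate=0)  # feature 0 with neuron 0 zeroed

print("features per neuron:", features_per_neuron)
print("readout intact vs. one neuron ablated:", full, ablated)
```

In this toy case every neuron participates in all four features, and zeroing one neuron only lowers the feature's readout from 1 to about 2/3, so no single neuron strongly mediates it. Real networks are messier, but the same intuition is why I'd expect single-neuron mediation to be rare.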
Yep, this sounds like a promising idea. It may be connected to Christiano’s ELK (Eliciting Latent Knowledge).