I am interested in Alignment, Mechanistic Interpretability, Agents, and the theory of how neural networks work.
Less Wrong