For what it’s worth, I am not doing (and have never done) any research remotely similar to your text “maybe we can get really high-quality alignment labels from brain data, maybe we can steer models by training humans to do activation engineering fast and intuitively”.
I have a concise and self-contained summary of my main research project here (Section 2).
I care a lot! Will probably make a section for this in the main post under “Getting the model to learn what we want”, thanks for the correction.