I’d be interested in a comparison with the Latest tab.
Transformers Represent Belief State Geometry in their Residual Stream: 6
D&D.Sci: −5
Open Thread Spring 2024: 3
Introducing AI Lab Watch: −3
An explanation of evil in an organized world: −3
Mechanistically Eliciting Latent Behaviors in Language Models: 3
Shane Legg’s necessary properties for every AGI Safety plan: −1
LessWrong Community Weekend 2024, open for applications: −6
Ironing Out the Squiggles: 5
ACX Covid Origins Post convinced readers: −7
Why I’m doing PauseAI: −2
Manifund Q1 Retro: Learnings from impact certs: −1
Questions for labs: −3
Refusal in LLMs is mediated by a single direction: 5
Take SCIFs, it’s dangerous to go alone: 4