Two other projects I would find interesting to work on:
Using causal scrubbing to remove specific capabilities from a model. For example, train a language model on The Pile and a code dataset, then apply causal scrubbing to try to remove the model’s ability to generate code while still achieving similar loss on The Pile.
Extending the work from the paper “Discovering Latent Knowledge in Language Models Without Supervision”, which a few people have already started doing. I think this work could potentially evolve into a median-case solution for avoiding x-risk from AI.