AI-alignment-assistant-model tasks
Thinking about the sorts of tasks current models are good at, translation and interpolation / remixing seem like pretty solid areas. If I were to design an AI assistant to help with alignment research, I think I’d focus on questions of these sorts to start with.
Translation: take this ML interpretability paper on CNNs and make it work for Transformers instead.
Interpolation: take these two (or more) ML interpretability papers and give me a technique that does something like a cross between them.