Quick question: could we use the Gato trick of multi-task → single-model in reverse, such that we exclude tasks?
The idea is that we would deliberately curate training for “bad” tasks, like connecting to the internet or writing code, and then build a single model that includes the “good” tasks but excludes the “bad” ones.
Based on my understanding of how these things work, there’s no sense in which the tasks would be rejected outright; rather, what I imagine is a kind of pathological underperformance. An analogy would be giving GPT-3 catastrophic dyslexia on purpose.
The natural downside is that we would deliberately be building components that are pretty good at bad tasks, which is dangerous. But our bigger problem is failing to screen out bad capabilities, so that they show up accidentally, and this feels at first blush like an option for making incremental progress, at least.
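For concreteness, here is a minimal sketch of what “include the good tasks, cripple the bad ones” might look like in a multi-task training loop. Everything here is an illustrative assumption, not Gato’s actual setup: the tiny model, the task names, and the crude sign-flipped loss on bad-task batches (gradient ascent, aiming at the pathological underperformance described above).

```python
# Minimal sketch: one model trained on many tasks, where batches from
# "bad" tasks get their loss sign flipped so ordinary gradient descent
# actively degrades performance on them. Toy PyTorch code; all names
# and the architecture are hypothetical.
import torch
import torch.nn as nn

BAD_TASKS = {"connect_to_internet", "write_code"}  # tasks to cripple

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 8))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def training_step(task_name: str, inputs: torch.Tensor, targets: torch.Tensor) -> float:
    """One multi-task step: descend on good-task loss, ascend on bad-task loss."""
    logits = model(inputs)
    loss = loss_fn(logits, targets)
    if task_name in BAD_TASKS:
        # Flipping the sign makes the optimizer *maximize* loss on bad tasks.
        loss = -loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: interleave good- and bad-task batches, as in multi-task training.
for task in ["summarize_text", "write_code", "answer_questions"]:
    x = torch.randn(16, 32)
    y = torch.randint(0, 8, (16,))
    training_step(task, x, y)
```

In practice you would presumably want something less blunt than unbounded gradient ascent (e.g. capping the penalty, or a targeted unlearning objective), but the sketch captures the shape of the idea: the bad tasks are present in training, just optimized in the wrong direction.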