And vice versa: transfer Gato to the new task, then finetune and sparsify/distill it (e.g. turn the Transformer into an RNN, or train with Transformer-XL rather than only using it at runtime) once a task becomes common enough to justify the amortized expense.
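To make "common enough to justify the amortized expense" concrete, here is a rough break-even sketch. The function name, the cost parameters, and the numbers are all hypothetical and only illustrate the arithmetic; nothing here comes from the Gato paper. The assumption is that specializing costs a one-time amount and then each query is cheaper than running the generalist model.

```python
import math


def break_even_queries(one_time_cost, generalist_cost_per_query, specialist_cost_per_query):
    """Return the number of queries at which specializing pays for itself.

    All costs are in the same arbitrary units (hypothetical values).
    Returns None if the specialist is not cheaper per query, in which
    case the one-time expense is never recovered.
    """
    saving_per_query = generalist_cost_per_query - specialist_cost_per_query
    if saving_per_query <= 0:
        return None
    # Smallest whole number of queries whose total saving covers the one-time cost.
    return math.ceil(one_time_cost / saving_per_query)


# Hypothetical: finetune + distill costs 1000 units, the generalist costs
# 10 units per query, and the distilled specialist costs 2 units per query.
print(break_even_queries(1000, 10, 2))
```

Below that query count, just prompting or transferring the generalist is cheaper; above it, the finetuned and distilled specialist wins.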