That sounds safer, but is it competitive? Would AlphaStar be close to as good as it is, if it had been simultaneously trained to answer questions?
We could also ask: “Would AlphaStar remain as good as it is, if fine-tuned to answer questions?”
In either case it’s an empirical question. I think the answer is probably yes if you do it carefully.
You could imagine separating this into two questions:
1. Is there a policy that plays StarCraft and answers questions, and that is only slightly larger than a policy for playing StarCraft alone? This is a key premise for the whole project. I think it's reasonably likely; the goal is only to answer questions whose answers the model "already knows," so it seems realistic to hope that only a constant amount of extra work is needed to use that knowledge to answer questions. I think most of the uncertainty here is about the details of "know," question-answering, and so on.
2. Can you use joint optimization to find that policy with only slightly more training time? I think probably yes (a minimal sketch of what this could look like follows below).
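To make these two premises concrete, here is a minimal sketch in PyTorch of a jointly optimized policy. This is not AlphaStar's actual architecture or training setup; the `JointPolicy` class, all layer sizes, and the loss weight `lam` are illustrative assumptions. The point is just that a question-answering head on a shared trunk adds only a constant number of extra parameters, and that the two objectives can be combined into a single loss.

```python
# Illustrative sketch only: not AlphaStar's real architecture or losses.
import torch
import torch.nn as nn

class JointPolicy(nn.Module):
    def __init__(self, obs_dim=128, hidden_dim=256, n_actions=10,
                 vocab_size=1000):
        super().__init__()
        # Shared trunk: does almost all the work for both tasks.
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Task heads: each is small relative to the trunk, so the joint
        # model is "only slightly larger" than the game-playing policy.
        self.action_head = nn.Linear(hidden_dim, n_actions)
        self.answer_head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, obs):
        h = self.trunk(obs)
        return self.action_head(h), self.answer_head(h)

model = JointPolicy()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# One joint optimization step on toy data (stand-ins for game states,
# target actions, and question/answer supervision).
obs = torch.randn(32, 128)
target_actions = torch.randint(0, 10, (32,))
target_answers = torch.randint(0, 1000, (32,))

opt.zero_grad()
action_logits, answer_logits = model(obs)
game_loss = nn.functional.cross_entropy(action_logits, target_actions)
qa_loss = nn.functional.cross_entropy(answer_logits, target_answers)

# lam trades off the two objectives; whether some lam preserves game
# performance is exactly the empirical question at issue.
lam = 0.1
loss = game_loss + lam * qa_loss
loss.backward()
opt.step()
```

In a real RL setting the game objective would be a policy-gradient or actor-critic loss rather than supervised cross-entropy, and the fine-tuning variant would instead start from a trained game policy and attach a fresh answer head (trained at a low learning rate, or with the trunk frozen, to avoid degrading play), but the joint structure would be the same.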
OK, thanks! I'm pleased to see this and other empirical premises explicitly laid out. It means we as a community are making predictions about the future based on models that can be tested before it's too late, and perhaps even now.