paulfchristiano comments on A dilemma for prosaic AI alignment

paulfchristiano 19 Dec 2019 20:26 UTC
LW: 7 AF: 4
We could also ask: “Would AlphaStar remain as good as it is, if fine-tuned to answer questions?”
In either case it’s an empirical question. I think the answer is probably yes if you do it carefully.
You could imagine separating this into two questions:
- Is there a policy that plays StarCraft and answers questions that is only slightly larger than a policy for playing StarCraft alone? This is a key premise for the whole project. I think it’s reasonably likely; the goal is only to answer questions the model “already knows,” so it seems realistic to hope that only a constant amount of extra work is needed to use that knowledge to answer questions. I think most of the uncertainty here is about the details of “know,” question-answering, and so on.
- Can you use joint optimization to find that policy with only slightly more training time? I think probably yes.
- Daniel Kokotajlo 19 Dec 2019 21:05 UTC
  LW: 1 AF: 1
  OK, thanks! I’m pleased to see this and other empirical premises explicitly laid out. It means we as a community are making predictions about the future based on models which can be tested before it’s too late, and perhaps even now.