That sounds safer, but is it competitive? Would AlphaStar be close to as good as it is, if it had been simultaneously trained to answer questions?
We could also ask: “Would AlphaStar remain as good as it is, if fine-tuned to answer questions?”
In either case it’s an empirical question. I think the answer is probably yes if you do it carefully.
You could imagine separating this into two questions:
1. Is there a policy that plays StarCraft and answers questions, and that is only slightly larger than a policy for playing StarCraft alone? This is a key premise for the whole project. I think it's reasonably likely; the goal is only to answer questions whose answers the model "already knows," so it seems realistic to hope that only a constant amount of extra work is needed to use that knowledge to answer questions. I think most of the uncertainty here is about the details of "know," question-answering, and so on.
2. Can you use joint optimization to find that policy with only slightly more training time? I think probably yes (a minimal sketch of what this could look like follows below).
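To make these two premises concrete, here is a minimal sketch in PyTorch of a jointly optimized policy. This is not AlphaStar's actual architecture or training setup; the `JointPolicy` class, all layer sizes, and the loss weight `lam` are illustrative assumptions. The point is just that a question-answering head on a shared trunk adds only a constant number of extra parameters, and that the two objectives can be combined into a single loss.

```python
# Illustrative sketch only: not AlphaStar's real architecture or losses.
import torch
import torch.nn as nn

class JointPolicy(nn.Module):
    def __init__(self, obs_dim=128, hidden_dim=256, n_actions=10,
                 vocab_size=1000):
        super().__init__()
        # Shared trunk: does almost all the work for both tasks.
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Task heads: each is small relative to the trunk, so the joint
        # model is "only slightly larger" than the game-playing policy.
        self.action_head = nn.Linear(hidden_dim, n_actions)
        self.answer_head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, obs):
        h = self.trunk(obs)
        return self.action_head(h), self.answer_head(h)

model = JointPolicy()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# One joint optimization step on toy data (stand-ins for game states,
# target actions, and question/answer supervision).
obs = torch.randn(32, 128)
target_actions = torch.randint(0, 10, (32,))
target_answers = torch.randint(0, 1000, (32,))

opt.zero_grad()
action_logits, answer_logits = model(obs)
game_loss = nn.functional.cross_entropy(action_logits, target_actions)
qa_loss = nn.functional.cross_entropy(answer_logits, target_answers)

# lam trades off the two objectives; whether some lam preserves game
# performance is exactly the empirical question at issue.
lam = 0.1
loss = game_loss + lam * qa_loss
loss.backward()
opt.step()
```

In a real RL setting the game objective would be a policy-gradient or actor-critic loss rather than supervised cross-entropy, and the fine-tuning variant would instead start from a trained game policy and attach a fresh answer head (trained at a low learning rate, or with the trunk frozen, to avoid degrading play), but the joint structure would be the same.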
OK, thanks! I'm pleased to see this and other empirical premises explicitly laid out. It means we as a community are making predictions about the future based on models that can be tested before it's too late, and perhaps even now.