This is fascinating, because I took the exact same section to mean almost the opposite thing. I took him to be focused on making it not a black-box process, and not about training but about the design of a review process in which the model explicitly states its reasoning and is subject to external human review.
He states elsewhere in the interview that RLHF might be slightly helpful, but isn’t enough to pin alignment hopes on.
One reason I’m taking this interpretation is that I think DeepMind’s core beliefs about intelligence are very different from OpenAI’s, even though they’ve done, and are probably still doing, similar work focused on large training runs. DeepMind initially set out to build an artificial brain, and they pivoted to large training runs in simulated (game) environments as a practical move to demonstrate advances and get funding. I think at least Legg and Hassabis still believe that loosely emulating the brain is an interesting and productive thing to do.