At one point he echoes concerns about future systems based on deep learning that sound faintly similar to those expressed in the Rocket Alignment Problem.
The quoted paragraph does not sound like the Rocket Alignment Problem to me. It seems to me that the quoted paragraph is arguing that you need systems that are robust, whereas the Rocket Alignment Problem argues that you need a deep understanding of the systems you build. These are very different: I suspect the vast majority of AI safety researchers would agree that you need robustness, but you can get robustness without understanding. For example, I feel pretty confident that AlphaZero robustly beats humans at Go, even though I don’t understand what sort of reasoning AlphaZero is doing.
(A counterargument is that we understand how the AlphaZero training algorithm incentivizes robust gameplay, which is the kind of understanding the Rocket Alignment Problem is talking about. But then it’s not clear to me why the Rocket Alignment analogy implies that we couldn’t ever build aligned AI systems out of deep learning.)
To clarify, I had first read the “whole point of having knowledge” sentence in light of the fact that he wants to hardcode knowledge into our systems, and from that point of view it made more sense. On re-reading, it’s admittedly not the best comparison. The rest of the paper still echoes the general vibe of not doing random searches for answers, and of leveraging our human understanding to yield some sort of robustness.