Two things that strike me:
The claim that “There are three kinds of genies: Genies to whom you can safely say ‘I wish for you to do what I should wish for’; genies for which no wish is safe; and genies that aren’t very powerful or intelligent.” only seems true under a very conservative notion of what it means for a wish to be “safe” (which may be appropriate in some cases). It’s a very black-and-white account; surely there is a continuum of genies with different safety/performance trade-offs resulting from their varying capabilities and alignment properties.
The final three paragraphs of the linked post on Artificial Addition seem to suggest that deep-learning-style approaches to teaching AI systems arithmetic are not promising. I also recall that EY and others thought deep learning wouldn’t work for capabilities either; that argument has mostly been falsified. The same argument seems to be used in this post to illustrate a core alignment difficulty, though I’m not entirely sure that was the intent.
I think a big part of why Eliezer was wrong about deep learning is that inductive biases turned out to matter less than we thought; the central story of the deep learning era is that data and compute mattered far more than hand-crafted inductive biases.
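To make the arithmetic case concrete, here is a minimal sketch (my own toy example, not anything from the post, with illustrative hyperparameters I chose): a generic MLP with no hand-crafted notion of digits or carrying can fit two-number addition purely from input/output pairs.

```python
# Hedged toy sketch: a generic network learns addition from data alone.
# Nothing in the architecture encodes arithmetic; it only sees examples.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Random training pairs (a, b) with target a + b.
x = torch.randint(0, 20, (5_000, 2)).float()
y = x.sum(dim=1, keepdim=True)

# A generic architecture with no built-in inductive bias toward addition.
model = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for step in range(2_000):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

# Quick check on fresh samples: predictions land close to the true sums.
test = torch.randint(0, 20, (5, 2)).float()
print(test.tolist(), model(test).round().tolist())
```

This is obviously not the hand-coded "artificial addition" machinery the linked post discusses; the point is just that the data-driven route works at all, which is the direction the deep learning era went.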
Heck, even the statistical learning community mostly got it wrong.
This update is a central crux of why I have become much more optimistic than the LW community that AI alignment is achievable in a relatively easy way, and why AI alignment failure is no longer my modal instance of catastrophe.
I also agree that genies should be ranked on a continuum rather than sorted into discrete categories.