AI happening through deep learning at all is a huge update against alignment success, because deep learning is incredibly opaque
This is wrong, and this disagreement is at a very deep level why I think on the object level that LW was wrong.
AIs are white boxes, not black boxes, because we have full read-write access to their internals, which is partially why AI is so effective today. We are the innate reward system, which already aligns our brain to survival and critically doing all of this with almost no missteps, and the missteps aren’t very severe.
The meme of AI as black box needs to die.
These posts can help you get better intuitions, at least:
The fact that we have access to AI internals does not mean we understand them. We refer to them as black boxes because we do not understand how their internals produce their answers; this is, so to speak, opaque to us.
I want to note that this part:
This is wrong, and this disagreement is at a very deep level why I think on the object level that LW was wrong.
AIs are white boxes, not black boxes, because we have full read-write access to their internals, which is partially why AI is so effective today. We are the innate reward system, which already aligns our brain to survival and critically doing all of this with almost no missteps, and the missteps aren’t very severe.
The meme of AI as black box needs to die.
These posts can help you get better intuitions, at least:
https://forum.effectivealtruism.org/posts/JYEAL8g7ArqGoTaX6/ai-pause-will-likely-backfire#White_box_alignment_in_nature
The fact that we have access to AI internals does not mean we understand them. We refer to them as black boxes because we do not understand how their internals produce their answers; this is, so to speak, opaque to us.