For 2, I think a lot of it is finding the “sharp left turn” idea unlikely. I think trying to get agreement on that question would be valuable.
For 4, some of the arguments for it in this post (and comments) may help.
For 3, I’d be interested in there being some more investigation into and explanation of what “interpretability” is supposed to achieve (ideally with some technical desiderata). I think this might end up looking like agency foundations if done right.
For example, I’m particularly interested in how “interpretability” is supposed to work if, in some sense, much of the action of planning and achieving some outcome occurs far away from the code or neural network that played some role in precipitating it. E.g., one NN-based system convinces another more capable system to do something (including figuring out how); or an AI builds some successor AIs that go on to do most of the thinking required to get something done. What should “interpretability” do for us in these cases, assuming we only have access to the local system?
For 2, I think a lot of it is finding the “sharp left turn” idea unlikely. I think trying to get agreement on that question would be valuable.
For 4, some of the arguments for it in this post (and comments) may help.
For 3, I’d be interested in there being some more investigation into and explanation of what “interpretability” is supposed to achieve (ideally with some technical desiderata). I think this might end up looking like agency foundations if done right.
For example, I’m particularly interested in how “interpretability” is supposed to work if, in some sense, much of the action of planning and achieving some outcome occurs far away from the code or neural network that played some role in precipitating it. E.g., one NN-based system convinces another more capable system to do something (including figuring out how); or an AI builds some successor AIs that go on to do most of the thinking required to get something done. What should “interpretability” do for us in these cases, assuming we only have access to the local system?