There is a case that aligned AI doesn’t have to be competitive with unaligned AI; it just has to be much better than humans at alignment research. If that holds, we can delegate the rest of the problem to the AI.
I don’t find this at all reassuring, because by the same construction your aligned alignment-researcher AI may now be racing an unaligned capabilities-researcher AI, and my intuition is that the latter faces an easier problem: it doesn’t have to worry about complexity of value or anything like that, it just has to make the loss per TFLOP go down.
If equally advanced unaligned AI is deployed earlier than aligned AI, then we might be screwed anyway. My point is that if aligned AI is deployed earlier by a sufficient margin, it can bootstrap itself into an effective anti-unaligned-AI shield in time.