This paints a bleak picture for the possibility of aligning mindless AGI, since behavioral methods of alignment are likely to result in divergence from human values, and algorithmic methods are too complex for us to implement successfully.
To me it appears that the terms cancel out: assuming we are able to overcome the difficulties of more symbolic AI design, the prospect of aligning such an AI seems less hard.
In other words, the main risk is wasting effort on alignment strategies that turn out to be mismatched to the AI that eventually gets implemented.
This is actually the opposite of what I argue elsewhere in the paper, where I prefer to trade off more false negatives for fewer false positives. That is, I would rather waste effort than withhold effort from something that has a higher chance of killing us. None of that line of argument appears here, though, so I agree that’s a reasonable alternative conclusion to draw outside the context of what I’m trying to optimize for.