You’re right: I think this is more useful as an unscientific way for (probably less technical) governance and strategy people to orient towards AI alignment than as a way of actually carving up reality, and I wrote the post with that audience and that framing in mind. By the same logic, your chart of how difficult various injuries and diseases are to fix would be very useful, e.g. as a poster in a military triage tent, even if it isn’t useful for biologists or trained doctors.
However, while I didn’t explore the idea much, I do think it is possible to cash this scale out as an actual variable related to system behavior, something along the lines of ‘how adversarial are the systems / how many extra bits of optimization, over and above behavioral feedback, are needed’. See here for further discussion of that. Evan Hubinger also talked in a bit more detail about what might be computationally different about ML models in low- vs. high-adversarialness worlds here.
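To gesture at how the ‘extra bits of optimization’ reading could be made concrete (this is a toy formalization of my own, not something spelled out in the post): treat behavioral feedback as narrowing the space of candidate models to the set $S$ of models consistent with that feedback, and let $A \subseteq S$ be the aligned subset. Then the extra optimization pressure needed beyond behavioral feedback is roughly

$$\Delta_{\text{bits}} \;=\; \log_2 \frac{|S|}{|A|}.$$

On this toy picture, low-adversarialness worlds are ones where $\Delta_{\text{bits}}$ is small (behavioral feedback alone nearly pins down aligned behavior), while high-adversarialness worlds are ones where models that game the feedback dominate $S$ and $\Delta_{\text{bits}}$ is large.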