I’ve seen Eliezer Yudkowsky claim that we don’t need to worry about s-risks from AI, since the Alignment Problem would need to be something like 95% solved before s-risks crop up in a worryingly large number of a TAI’s failure modes: a threshold he thinks we are nowhere near crossing. If this is true, it seems to carry the troubling implication that alignment research could be net-negative: partial progress might carry us past that threshold, shifting failure modes from extinction-like outcomes toward outcomes involving astronomical suffering, so whether the research helps or harms is conditional on how difficult it will be for us to conquer that remaining 5% of the Alignment Problem in the time we have.
So is there any work being done on figuring out where that threshold might lie, past which we do need to worry about s-risks from TAI? Should this line of reasoning have policy implications, and is this argument about an “s-risk threshold” widely accepted?