1. AI alignment is a problem with bimodal outcomes, i.e. most of the probability mass is clustered around success and failure, with very little in between these outcomes.
2. Thus, all else equal, we would rather be extra cautious and miss some paths to success than be insufficiently cautious and hit a path to failure.
At first I didn’t understand the connection between these two points, but I think I get it now. (And so I’ll spell it out for others who might be like me.)
If there are really only two outcomes and one is much better than the other, then when you’re choosing which path to take you want to maximize the probability that it leads you to the good outcome. (Aka the maxipok rule.)
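To spell out the arithmetic behind that (the notation here is mine, not the original post's): suppose the good outcome has value V, the bad outcome has value v with V much greater than v, and a given path reaches the good outcome with probability p. Then:

```latex
% Expected value of a path when there are only two outcomes
% (V = value of success, v = value of failure, p = P(success); illustrative notation)
\mathbb{E}[\text{value}] = pV + (1-p)v = v + p(V - v)
% Since v and (V - v) are the same constants for every path, expected
% value is strictly increasing in p, so choosing the best path reduces
% to maximizing p, the probability of an OK outcome -- i.e. maxipok.
```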
This would be different from a situation where there’s many different possible outcomes of varying quality. In that case, if you think a path leads you to the best outcome, and you’re wrong, maybe that’s not so bad, because it might still lead you somewhere pretty good. And also if the difference between outcomes is not so great, then maybe you’re more okay selecting your path based in part on how easy or convenient it is.
Whereas back in the bimodal world, even if you have a path that you're pretty sure leads you to the good outcome, if you've got the time you might as well look for a path you can be more confident in.
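To put toy numbers on the contrast between the two worlds (the paths and payoffs below are invented purely for illustration):

```python
# Toy contrast between a graded world and a bimodal world.
# All probabilities and payoffs are made up for illustration.

def expected_value(outcomes):
    """Expected value of a path given (probability, value) pairs."""
    return sum(p * v for p, v in outcomes)

# Graded world: path A is likelier to hit the best outcome,
# but path B's fallback outcome is still pretty good.
graded_a = [(0.6, 100), (0.4, 10)]   # best vs. mediocre fallback
graded_b = [(0.5, 100), (0.5, 80)]   # best vs. pretty-good fallback
print(expected_value(graded_a))  # 64.0
print(expected_value(graded_b))  # 90.0 -> B wins despite lower P(best)

# Bimodal world: the only fallback is failure, worth ~0, so expected
# value reduces to P(success) * 100 and only the probability matters.
bimodal_a = [(0.6, 100), (0.4, 0)]
bimodal_b = [(0.5, 100), (0.5, 0)]
print(expected_value(bimodal_a))  # 60.0 -> A wins; maximize p
print(expected_value(bimodal_b))  # 50.0
```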
(Is that a fair summary, @gworley?)
Thanks, that’s a useful clarification of my reasoning that I did not spell out!
In other words, there's an asymmetry in the penalties for type 1 vs. type 2 errors: wrongly accepting a path to failure costs far more than wrongly rejecting a path to success.