I agree well-calibrated uncertainties are quite valuable, but I’m not convinced they are essential for this sort of application. For example, suppose my assistant tells me a story about how my proposed FAI could fail. If my assistant is overconfident in its pessimism, the worst case is that I spend a lot of time thinking about the failure mode without seeing how it could happen (not that bad). If my assistant is underconfident, and tells me a failure mode is 5% likely when it’s really 95% likely, it still feels like my assistant is being overall helpful, provided the failure case is one I wasn’t previously aware of. To put it another way, if my assistant isn’t calibrated, it feels like I should just be able to ignore its probability estimates and still get good use out of it.
but eventually we want to switch over to a more scalable approach that will use few of the same tools.
I actually think the advisor approach might be scalable, if advisor_1 has been hand-verified, and advisor_1 verifies advisor_2, who verifies advisor_3, etc.
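To make the bootstrapping structure concrete, here is a minimal sketch in Python. Everything in it is hypothetical and not from the original comments: the `Advisor` class, the `verifies` predicate, and `trusted_chain` are just stand-ins for whatever real verification process would actually be used. The point is only the shape of the argument: advisor_1 is trusted because we hand-verified it, and each later advisor inherits trust only if the previous trusted advisor verifies it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Advisor:
    name: str
    # Placeholder for whatever "advisor_n verifies advisor_{n+1}" really means;
    # here it is just an arbitrary boolean predicate supplied at construction time.
    check: Callable[["Advisor"], bool]

    def verifies(self, candidate: "Advisor") -> bool:
        return self.check(candidate)


def trusted_chain(hand_verified: Advisor, candidates: List[Advisor]) -> List[Advisor]:
    """Trust advisor_1 because we hand-verified it, then extend trust link by link.

    advisor_{n+1} becomes trusted only if the already-trusted advisor_n verifies
    it; the chain stops at the first failed verification.
    """
    trusted = [hand_verified]
    for candidate in candidates:
        if not trusted[-1].verifies(candidate):
            break  # chain of trust is broken; stop extending it
        trusted.append(candidate)
    return trusted


# Toy usage: every advisor endorses every candidate, so the whole chain ends up trusted.
advisors = [Advisor(f"advisor_{i}", check=lambda c: True) for i in range(2, 5)]
chain = trusted_chain(Advisor("advisor_1", check=lambda c: True), advisors)
print([a.name for a in chain])  # ['advisor_1', 'advisor_2', 'advisor_3', 'advisor_4']
```

Under this picture, the scalability question reduces to whether each verification step preserves trust well enough that errors don’t compound as the chain grows.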