I basically agree with your main point (and I didn’t mean to suggest that it “[makes] sense to just decide whether CRT seems more true or false and then go from there”).
But I think it’s also suggestive of an underlying view that I disagree with, namely: (1) “we should aim for high-confidence solutions to AI-Xrisk”. I think this is a good heuristic, but from a strategic point of view, I think what we should be doing is closer to: (2) “aim to maximize the rate of Xrisk reduction”.
Practically speaking, a big implication of favoring (2) over (1) is giving relatively higher priority to research aimed at making unsafe-looking approaches (e.g. reward modelling + DRL) safer (in expectation).