A downside of the portfolio approach to AI safety research
Given typical human biases, researchers of each AI safety approach are likely to be overconfident about the difficulty and safety of the approach they’re personally advocating and pursuing, which exacerbates the problem of the unilateralist’s curse in AI. This should be highlighted and kept in mind by practitioners of the portfolio approach to AI safety research (e.g., grant makers). In particular, it may be a good idea to make sure researchers who are being funded have a good understanding of the overconfidence effect and other relevant biases, as well as the unilateralist’s curse.
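One way to see why overconfidence worsens the unilateralist’s curse is a toy simulation (a minimal sketch, not from the original post; the actor counts, noise level, and the `p_unilateral_action` helper are illustrative assumptions): several researchers independently estimate the value of pursuing an approach that is in fact net-negative, and the approach gets pursued if any one of them judges it positive. Shifting everyone’s estimates upward by a fixed optimism bias raises the chance that at least one estimate crosses zero.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_unilateral_action(n_actors=10, true_value=-1.0, noise_sd=1.0,
                        optimism_bias=0.0, n_trials=100_000):
    """Fraction of trials in which at least one actor's (possibly biased)
    noisy estimate of a harmful action's value is positive, so someone
    acts unilaterally."""
    estimates = true_value + optimism_bias + rng.normal(
        0.0, noise_sd, size=(n_trials, n_actors))
    return (estimates > 0).any(axis=1).mean()

print(p_unilateral_action(optimism_bias=0.0))  # noisy but unbiased actors
print(p_unilateral_action(optimism_bias=0.5))  # same actors, plus overconfidence
```

Under these assumed numbers (ten actors, unit noise, true value of −1), the unbiased case already triggers action in most trials, and a modest optimism bias pushes the probability close to 1.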
If “AI safety” refers here only to AI alignment, I’d be happy to read about how overconfidence about the difficulty/safety of one’s approach might exacerbate the unilateralist’s curse.
These biases seem very important to keep in mind!