I’m not sure why Rohin thinks the arguments against CIRL are bad, but I wrote a post today on why I think the argument from fully updated deference / corrigibility is weak. I also found Paul Christiano’s response very helpful as an outline of objections to the utility uncertainty agenda.
Also relevant is this old comment from Rohin on difficulties with utility uncertainty.
I also just remembered this comment, which is more recent and has more details. Also, I agree with Paul’s response.