CIRL, or similar procedures, rely on having a satisfactory model of how the human’s preferences ultimately relate to real-world observations. We do not have this. Also, the inference process scales impractically as you make the environment bigger and longer-running. So even if you like CIRL (which I do), it’s not a solution, it’s a first step in a direction that has lots of unsolved problems.
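To make the scaling worry concrete, here is a minimal toy sketch (my own illustration, not anything from the CIRL paper) of the inference step: the robot keeps a posterior over candidate reward parameters and updates it from observed human behavior under an assumed noisy-rationality model. The feature count, the Boltzmann likelihood, and the particular actions are all assumptions for the example; the point is that the hypothesis space the robot must do inference (and then planning) over grows combinatorially with the richness of the environment.

```python
import itertools
import numpy as np

# Hypothetical toy setup: the human's reward is a linear function of 3 binary
# feature weights, giving 2**3 candidate reward parameters theta. In a richer,
# longer-running environment this hypothesis space (and the planning over it)
# blows up combinatorially.
FEATURES = 3
thetas = np.array(list(itertools.product([0.0, 1.0], repeat=FEATURES)))

# Candidate human actions, each described by the features it produces.
actions = np.array([[1, 0, 0],
                    [0, 1, 0],
                    [0, 0, 1],
                    [1, 1, 1]], dtype=float)

def boltzmann_likelihood(action_idx, theta, beta=2.0):
    """P(human takes this action | theta), assuming noisy-rational behavior."""
    utilities = actions @ theta
    probs = np.exp(beta * utilities)
    probs /= probs.sum()
    return probs[action_idx]

# Uniform prior over reward hypotheses; update on each observed human action.
posterior = np.full(len(thetas), 1.0 / len(thetas))
observed_actions = [3, 3, 0]  # indices into `actions`, e.g. logged demonstrations

for a in observed_actions:
    likelihoods = np.array([boltzmann_likelihood(a, th) for th in thetas])
    posterior *= likelihoods
    posterior /= posterior.sum()

for th, p in zip(thetas, posterior):
    print(f"theta={th}  posterior={p:.3f}")
```

Even in this cartoon version, the hard part isn't the Bayesian update itself; it's that a realistic environment gives you no satisfactory likelihood model linking observations to preferences, and the exact belief-state planning CIRL calls for becomes intractable as the state space and horizon grow.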
CIRL lacks many properties that have been proposed as corrigibility goals. But I just want an AI that does good things and not bad things. Fully updated deference is not a sine qua non. (Though other people are probably more attached to it than I.)