I’m sorry it sounded like a dig at CHAI’s work, and you’re right that “typically described” is at best a generalization over too many people, and at worst wrong. It would be more accurate to say that when people describe IRL, I get the feeling that it’s nearly complete—I don’t think I’ve seen anyone presenting an idea about IRL flag the concern that the issue of recognizing the demonstrator’s action might jeopardize the whole thing.
I did intend to cast some doubt on whether the IRL research agenda is promising, and on whether inferring a utility function from a human’s actions instead of from a reward signal gets us any closer to safety, but I’m sorry to have misrepresented anyone’s views. (And maybe it’s worth mentioning that I’m fiddling with something that bears a strong resemblance to Inverse Reward Design, so I’m definitely not that bearish on the whole idea.)