One important feature of ACE is that it can overcome simplicity bias—even quite strong simplicity bias. In the following example, the labelled data consisted of smiling faces with a red bar under them, and frowning faces with a blue bar under them.
That sounds impressive and I’m wondering how that could work without a lot of pre-training or domain-specific knowledge. But how do you know you’re actually choosing between smile-frown and red-blue?
Also, this method seems superficially related to CIRL. How does it avoid the associated problems?
*Goodhart
Thanks! Corrected (though it is indeed a good hard problem).
Pre-training and domain-specific knowledge are not needed.
Run them on examples such as frown-with-red-bar and smile-with-blue-bar.
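To make that check concrete, here is a minimal sketch of the idea, not ACE itself. The two classifiers are hand-written stand-ins for the two hypotheses, and the names (`classify_by_expression`, `classify_by_bar`, `probe`) and the "happy"/"sad" labels are assumptions chosen for illustration. Both agree on the labelled data and only come apart on the crossed examples.

```python
from typing import Callable, Dict, List, Tuple

# An example is (facial expression, bar colour). Hypothetical encoding.
Example = Tuple[str, str]
Classifier = Callable[[Example], str]


def classify_by_expression(x: Example) -> str:
    """Stand-in for a classifier that learned the smile/frown feature."""
    return "happy" if x[0] == "smile" else "sad"


def classify_by_bar(x: Example) -> str:
    """Stand-in for a classifier that learned the red/blue feature."""
    return "happy" if x[1] == "red" else "sad"


def probe(classifiers: Dict[str, Classifier],
          examples: List[Example]) -> Dict[str, List[str]]:
    """Run every classifier on every example and collect the predictions."""
    return {name: [clf(x) for x in examples] for name, clf in classifiers.items()}


# Labelled data: the two features are perfectly correlated.
TRAINING = [("smile", "red"), ("frown", "blue")]
# Crossed examples: the two features disagree.
CROSSED = [("frown", "red"), ("smile", "blue")]

classifiers = {"A": classify_by_expression, "B": classify_by_bar}
on_training = probe(classifiers, TRAINING)
on_crossed = probe(classifiers, CROSSED)

for name in classifiers:
    print(name, "training:", on_training[name], "crossed:", on_crossed[name])
```

If two learned classifiers behave like A and B here, identical on the labelled data and opposite on the crossed examples, that is evidence they have picked up the two different features.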
Which problems are you thinking of?
That sounds like a black-box approach.
Humans not knowing what goals we want AI to have, and the riggability of the reward learning process, both of which you stated were problems for CIRL in 2020.