Algon comments on Alignment can improve generalisation through more robustly doing what a human wants—CoinRun example

Algon 28 Nov 2023 19:05 UTC
2 points
0
Run them on examples such as frown-with-red-bar and smile-with-blue-bar.
That sounds like a black-box approach.
Which problems are you thinking of?
Humans not knowing what goals we want AI to have, and the riggability of the reward learning process. You stated in 2020 that both were problems for CIRL.