I see this issue as a fundamental problem:
https://www.lesswrong.com/posts/CSEdLLEkap2pubjof/research-agenda-v0-9-synthesising-a-human-s-preferences-into
https://www.lesswrong.com/posts/k54rgSg7GcjtXnMHX/model-splintering-moving-from-one-imperfect-model-to-another-1
https://www.lesswrong.com/posts/3e6pmovj6EJ729M2i/general-alignment-plus-human-values-or-alignment-via-human