Eliezer Yudkowsky comments on A toy model of the control problem

Eliezer Yudkowsky 18 Sep 2015 19:57 UTC
4 points
I assume the point of the toy model is to explore corrigibility or other mechanisms that are supposed to kick in after A and B end up not perfectly value-aligned, or maybe just to show an example of why a solution for A controlling B that does not involve value alignment might not work, or maybe specifically to exhibit a case of a not-perfectly-value-aligned agent manipulating its controller.