I like the emphasis in this post on the role of patterns in the world in shaping behaviour, the fact that some of those patterns incentivise misaligned behaviour such as deception, and further that our best efforts at alignment and control are themselves patterns that could have this effect. I also like the idea that our control systems (even if obscured from the agent) can present as “errors” with respect to which the agent is therefore motivated to learn to “error correct”.
This post and the sharp left turn are among the most important high-level takes on the alignment problem for shaping my own views on where the deep roots of the problem are.
To be honest, I had forgotten about this post, and therefore underestimated its influence on me, until performing this review. The review caused me to update a recent article I wrote, the Queen’s Dilemma, which is clearly a kind of retelling of one aspect of this story, with an appropriate reference. Even so, I assess it to be a substantial influence on me.
I think this whole line of thought could be developed substantially further, with less reliance on stories, and that doing so would be useful.