I wouldn’t say this is the method we use to align children, for the reason you point out: we can’t set the motivational valence of the goals we suggest. So I’d call that method “goal suggestion”. The difference here is that we are setting the goal value of that representation directly, editing the AGI’s weights in a way we can’t with children. It would be as if, when I tell my child “it’s bad to hit people”, I could also set the weights into and through his amygdala so that the concept he represents, hitting people, is tied to a very negative reward prediction. That would steer his actions away from hitting people.
By selecting a representation, then editing how it connects to a steering subsystem (like the human dopamine system), we are selecting it as a goal directly, not just suggesting it and allowing the system to set its own valence (goal/avoidance marker) for that representation, as we do with human children.
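To make the distinction concrete, here’s a minimal toy sketch, not an implementation of any actual system: assume the AGI’s learned concepts are embedding vectors and the steering subsystem is a linear value head mapping a concept to a scalar reward prediction. The helper `install_valence` is hypothetical; it just solves for the smallest weight edit that pins the selected representation to the valence we choose.

```python
import numpy as np

# Toy setup: each concept the system can represent is a learned embedding
# vector; the "steering subsystem" is a linear value head that maps a
# concept embedding to a scalar reward prediction (its valence).
rng = np.random.default_rng(0)
dim = 16

# Hypothetical learned concept embeddings (stand-ins for representations
# the system formed on its own).
concepts = {
    "hitting_people": rng.normal(size=dim),
    "helping_people": rng.normal(size=dim),
}

# Steering-subsystem weights the system would ordinarily learn itself.
value_weights = rng.normal(size=dim) * 0.01

def predicted_valence(concept_vec, w):
    """Scalar reward prediction the steering subsystem assigns to a concept."""
    return float(concept_vec @ w)

# "Goal suggestion" (what we do with children): we can present the concept,
# but the system sets its own valence; we never touch value_weights.

# Direct goal installation (the method described above): edit the weights
# so the selected representation maps to the valence we choose.
def install_valence(w, concept_vec, target):
    """Minimal rank-one edit so that concept_vec @ w == target."""
    current = concept_vec @ w
    return w + (target - current) * concept_vec / (concept_vec @ concept_vec)

value_weights = install_valence(value_weights, concepts["hitting_people"], target=-10.0)

print(predicted_valence(concepts["hitting_people"], value_weights))  # ~ -10.0
```

One thing the toy math makes visible: the rank-one edit also shifts the valence of any other concept whose embedding overlaps with the selected one, a miniature version of the concern that direct weight edits have side effects beyond the representation we targeted.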