It’s not that my actions were wrong; it’s that I did them for the wrong reasons, and that really does matter. Under my model, the cognitive causes (e.g. I want to be like EY) of externally visible actions (study math) are very important, because I think that the responsible cognition gets reinforced into my future action-generators.
For example, since I wanted to be like EY, I learned math; since I learned math, I got praised on LessWrong; since I got praised, my social-reward circuitry activated; since the social-reward circuitry activated, credit assignment kicked in and strengthened all of the antecedent thoughts which I just listed, thereby making me more of the kind of person who does things because he wants to be like EY.
I can write a similar story for doing things because they are predicted to make me more respected. Therefore, over time, I became more of the kind of person who cares about being respected, and not so much about succeeding at alignment or truly becoming stronger.