I think where I disagree is that value learning, and learning about human values, seems quite clearly important for alignment, both because many alignment approaches depend on value learning working, and because the training data heavily influences what the model tries to optimize.
Another way to put it: the data strongly shapes the optimization target, since a large portion of both capabilities and alignment is downstream of the data the model is trained on, so I don't see why learning what human values are is unrelated to this:
(as describing an agent’s tendency to optimize for states of the world that humans would find good).