Primarily because we're not even close to that goal right now; we're still trying to figure out how to avoid deceptive alignment.
If we’re nowhere close to solving alignment well enough that even a coarse-grained description of actual human values is relevant yet, then I don’t understand why anyone is advocating further AI research at this point.
Also, ‘avoiding deceptive alignment’ doesn’t really mean anything if we don’t have a relatively rich and detailed description of what ‘authentic alignment’ with human values would look like.
I'm truly puzzled by the resistance the AI alignment community has to learning a bit more about the human values we're allegedly aligning with.
I agree with this. What if it were actually possible to formalize morality? (Cf. «Boundaries» for formalizing an MVP morality.) Inner alignment seems like it would be a lot easier with a good outer alignment function!
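To make "formalize morality" slightly more concrete, here is a minimal toy sketch, not the «Boundaries» formalization itself, of what a boundary-respecting outer objective could look like: task reward only counts when no other agent's declared boundary is crossed without consent. Every name and constant here (`Agent`, `violates_boundary`, the penalty size) is invented for illustration.

```python
# Toy sketch only: NOT the «Boundaries» formalization, just an illustration of a
# boundary-respecting outer objective. All names and constants are hypothetical.
from dataclasses import dataclass
import math

@dataclass
class Agent:
    name: str
    x: float
    y: float
    boundary_radius: float   # region this agent claims as "theirs"
    consented: bool = False  # has this agent consented to being approached?

def violates_boundary(actor_pos, other: Agent) -> bool:
    """An action violates a boundary if it enters another agent's region without consent."""
    return math.dist(actor_pos, (other.x, other.y)) < other.boundary_radius and not other.consented

def outer_objective(task_reward, actor_pos, others, violation_penalty=1e6) -> float:
    """Task reward minus a large penalty per boundary violation.

    The 'morality' part is a constraint-like term that ordinary task reward
    cannot realistically outbid."""
    violations = sum(violates_boundary(actor_pos, a) for a in others)
    return task_reward - violation_penalty * violations

if __name__ == "__main__":
    bystander = Agent("bystander", x=0.0, y=0.0, boundary_radius=1.0)
    # High task reward, but the plan passes through the bystander's boundary:
    print(outer_objective(10.0, (0.5, 0.0), [bystander]))  # -> -999990.0
    # Same task reward with the boundary respected:
    print(outer_objective(10.0, (2.0, 0.0), [bystander]))  # -> 10.0
```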
Mostly because ambitious value learning is really fucking hard, and this proposal runs into all the problems that ambitious or narrow value learning has.
You're right, though, that AI capabilities will need to slow down, and I am not hopeful here.
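To make the "value learning is really hard" point a bit more concrete, here is a minimal sketch, assuming a Bradley-Terry-style preference model over a hand-made two-feature toy dataset. The features, data, and hyperparameters are all invented for illustration; the only takeaway is how underdetermined the learned "values" are when the preference data never probes a dimension (here, deception).

```python
# Minimal sketch of narrow value learning: fit a linear reward r(x) = w . x
# from pairwise preferences (Bradley-Terry style). Everything here (features,
# data, hyperparameters) is invented for illustration.
import math, random

def fit_reward(prefs, dim, steps=2000, lr=0.1):
    """prefs: list of (preferred_features, rejected_features) pairs."""
    w = [0.0] * dim
    for _ in range(steps):
        xa, xb = random.choice(prefs)
        margin = sum(wi * (a - b) for wi, a, b in zip(w, xa, xb))
        grad_scale = -1.0 / (1.0 + math.exp(margin))  # d(-log sigmoid(margin))/d(margin)
        for i in range(dim):
            w[i] -= lr * grad_scale * (xa[i] - xb[i])  # gradient descent step
    return w

if __name__ == "__main__":
    random.seed(0)
    # Features: [task_done, user_deceived]. The labeller prefers completed tasks,
    # but the comparisons never separate honest from deceptive completions.
    prefs = [((1.0, 0.0), (0.0, 0.0))] * 20 + [((1.0, 1.0), (0.0, 0.0))] * 20
    print(fit_reward(prefs, dim=2))  # w[1] > 0: the data never penalizes deception, so the fitted reward rewards it
```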