Not to derail on details, but what would it mean to solve alignment?
To me “solve” feels overly binary and final compared to the true challenge of alignment. Like, would solving alignment mean:
- someone invents and implements a system that causes all AIs to do what their developers want 100% of the time?
- someone invents and implements a system that causes a single AI to do what its developer wants 100% of the time?
- someone invents and implements a system that causes a single AI to do what its developer wants 100% of the time, and that AI and its descendants are always more powerful than other AIs for the rest of history?
- ditto, but 99.999%?
- ditto, but 99%?
And is there any distinction between an AI that is misaligned by mistake (e.g., it thinks I'll want vanilla but I really want chocolate) vs. one that is knowingly misaligned (e.g., it gives me vanilla knowing I want chocolate so it can achieve its own ends)?
I’m really not sure which you mean, which makes it hard for me to engage with your question.