Possibly related, though from a slightly different angle: you may have missed my work on trying to formally specify the alignment problem, which points to something similar but arrives at somewhat different results.
Thanks for pointing that out to me; I had not come across your work before! I’ve had a look through your post and I agree that we’re saying similar things. I would say that my ‘Value Definition Problem’ is an (intentionally) vaguer and broader question about what our research program should be—as I argued in the article, this is mostly an axiological question. Your final statement of the Alignment Problem (informally) is:
A must learn the values of H and H must know enough about A to believe A shares H’s values
while my Value Definition Problem is
“Given that we are trying to solve the Intent Alignment problem for our AI, what should we aim to get our AI to want/target/decide/do, to have the best chance of a positive outcome?”
I would say the VDP is about what our ‘guiding principle’ or ‘target’ should be in order to have the best chance of solving the alignment problem. I used Christiano’s ‘intent alignment’ formulation, but yours actually fits better with the VDP, I think.