Possibly related, though from a slightly different angle: you may have missed my work on trying to formally specify the alignment problem, which points to something similar but arrives at somewhat different results.
Thanks for pointing that out to me; I had not come across your work before! I’ve had a look through your post and I agree that we’re saying similar things. I would say that my ‘Value Definition Problem’ is an (intentionally) vaguer and broader question about what our research program should be—as I argued in the article, this is mostly an axiological question. Your final statement of the Alignment Problem (informally) is:
A must learn the values of H and H must know enough about A to believe A shares H’s values
while my Value Definition Problem is
“Given that we are trying to solve the Intent Alignment problem for our AI, what should we aim to get our AI to want/target/decide/do, to have the best chance of a positive outcome?”
I would say the VDP is about what our ‘guiding principle’ or ‘target’ should be in order to have the best chance of solving the alignment problem. I used Christiano’s ‘intent alignment’ formulation, but yours actually fits better with the VDP, I think.