I agree and yet I think it’s not actually that hard to make progress.
There is no canonical way to pick out human values,[1] and yet using an AI to make clever long-term plans implicitly makes some choice. You can't dodge choosing how to interpret humans; if you think you're dodging it, you're just making the choice in an unexamined way.
Yes, humans are bad at philosophy and are capable of making things worse rather than better by examining them. I don't have much to say to that other than "get good." Just kludging together how the AI interprets humans seems to me likely to lead to problems, especially in a possible multipolar future where there's more incentive for people to start using AI to make clever plans to steer the world.
This absolutely means disposing of appealing notions like a unique CEV, or even an objectively best choice of AI to build, even as we make progress on developing standards for good AIs to build.
See the Reducing Goodhart sequence for my thoughts on this; it starts sketching some ways to deal with humans not being agents.
I agree, and I think this is critical. The standard of getting >90% of the possible value from our lightcone, or similar, seems ridiculously high given the very real possibility of ending up with zero or negative value.
And it seems certain that there’s no absolute standard for achieving human values. What they are is path dependent.
But we can still reach an unimaginably good future by building an ASI that does any of the things humans roughly want.