A very recent post that might add some concreteness to my own views: Human wanting
I think many of the bullets in that post describe current AI systems poorly or not at all. So current AI systems are either doing something entirely different from human wanting, or imitating human wanting rather poorly.
I lean towards the former, but I think some of the critical points about prosaic alignment apply in either case.
You might object that “having preferences” or “caring at all” are a lot simpler than the concept of human wanting that Tsvi is gesturing at in that post, and that current AI systems are actually doing these simpler things pretty well. If so, I’d ask what exactly those simpler concepts are, and why you expect prosaic alignment techniques to hold up once AI systems are capable of more complicated kinds of wanting.