Sorry, the broken link is fixed now.
The problem with "understanding the concept of intent" is that intent and goal formation are among the most complex notions in the universe, involving genetics, development, psychology, culture, and everything in between. We have been arguing about what intent (and correlates like "well-being") means for the entire history of our civilization. We do seem to have a good set of no-nos (e.g. read the UN Declaration of Human Rights), but when it comes to positive descriptions of good long-term outcomes, things get fuzzy. There we have much less guidance, though I suppose trans- and post-humanism seem to be desirable goals to many.
Interesting read; it would be great to see more work in this direction. However, it seems that mind-body dualism is still the prevalent (dare I say "dominant") mode of understanding human will and consciousness in CS and AI safety. In my opinion, the best picture we have of human value creation comes from the social and psychological sciences, not from metaphysics and mathematics, and it would be great to see more interaction with those fields.
For what it's worth, I've written a fair amount on agency loss as an attractor in AI/AGI-human interactions.
https://www.lesswrong.com/posts/dDDi9bZm6ELSXTJd9/intent-aligned-ai-systems-deplete-human-agency-the-need-for
And a shorter paper/poster on this topic at ICML last week: https://icml.cc/virtual/2024/poster/32943