Even when talking about how humans shouldn’t always be thought of as having some “true goal” that we just need to communicate, it’s so difficult to avoid talking in that way :) We naturally phrase alignment as alignment to something—and if it’s not humans, well, it must be “alignment with something bigger than humans.” We don’t have the words to be more specific than “good” or “good for humans,” without jumping straight back to aligning outcomes to something specific like “the goals endorsed by humans under reflective equilibrium” or whatever.
We need a good linguistic-science fiction story about a language with no such issues.
Yes, I agree, it’s difficult to find explicit and specific language for what it is that we would really like to align AI systems with. Thank you for the reply. I would love to read such a story!