But I also sense a privileging of a particular worldview, namely a human one, that may artificially limit the sorts of useful categories we are willing to consider.
This is deliberate: a lot of what I’m trying to do is figure out human values, so human worldviews and interpretations will generally be the most relevant.
So maybe this points to what I want to push back against. If we focus on figuring out human values but not the broader category of things to which human values naturally belong, a category shared with AI, we’re not setting ourselves up to solve the problem of alignment so much as to build a better model of human values. Having that better model is fine as far as it goes, but so long as we keep humans as our primary frame of reference, it invites us to be overly specific about what we think “values” are, in ways that may inhibit our ability to understand how an AI with a mind very alien to a human one would reason about them. This might help explain why I’ve preferred to look for a more general concept (which I ended up calling axias) that generalizes over minds-in-general, rather than a concept of values that only makes sense when we look at humans, and why I think that approach is necessary (so we have something we can reason about in common between humans and AIs).