So I don’t exactly disagree with your conclusion or the points you make, but I feel like there is something in this framing that sets up the possibility of creating confusion down the line. It’s a little hard to pin down where this is coming from, but I’ll try anyway by looking at this paragraph that I think may illustrate the crux:
A definition of chair is wrong if it doesn’t approximate the boundaries of the mental set of examples of chairs that we have in our minds. If your definition answers “yes” to the question “is your mobile phone a chair if you sit on it?”, then this is a wrong/bad/incorrect definition.
So there’s a certain perspective from which your framing of a-definition-of-chair-is-wrong-if-it-includes-mobile-phones makes sense, because such a definition isn’t useful for the purpose of asking “can I sit on it such that I’m happier about sitting?” But this is a deeply teleological approach, and as such it depends thoroughly on the worldview from which chairness is being assessed. I think that’s fair as far as chairness goes, and ultimately fair about any question of ontology we might ask, since from where else can we construct ontology but from where we are perceiving the world? But I also sense a privileging of a particular worldview, namely a human one, that may artificially limit the sorts of useful categories we are willing to consider. After all, to a biped the size of a hamster a mobile phone might make an excellent chair.
I’m interested to see where you go with this, but having not read the subsequent posts I’ll register now my suspicion that there is some deep philosophical disagreement I’m going to have with where you end up. I can’t quite put my finger on what it is right now, but I suspect it’s something like making assumptions that are pragmatically reasonable under normal circumstances but don’t hold when considering minds in general.
But I also sense a privileging of a particular worldview, namely a human one, that may artificially limit the sorts of useful categories we are willing to consider.
This is deliberate: a lot of what I’m trying to do is figure out human values, so the human worldviews and interpretations will generally be the most relevant.
So maybe this points to what I’m wanting to push back against. If we focus on figuring out human values, but not the broader category of things to which human values naturally belong and which is shared with AI, we’re not setting ourselves up to solve the problem of alignment so much as to build a better model of human values. Having that better model is fine as far as it goes, but so long as we keep humans as our primary frame of reference, it invites us to be overly specific about what we think “values” are, in ways that may inhibit our ability to understand how an AI with a mind very alien to a human one would be able to reason about them. This might help explain why I’ve preferred to go in the direction of looking for a more general concept (which I ended up calling axias) that generalizes over minds-in-general, rather than looking for a concept of values that only makes sense when we look at humans, and why I think that’s a necessary approach (so we have something we can reason about in common between humans and AIs).