I agree with your concerns. I’m glad Stuart is doing this work as it moves us much further along than we have been, but it also falls short in a number of ways.
I’m trying to find time to write up all of my own current thinking on this, but the short version is that I suspect modeling human preferences the way Stuart does is too leaky an abstraction to work. Humans don’t have “preferences” per se; they instead have valences over mental actions, and it’s how those valences interact with the actions we take, observable to outsiders or not, that produces events we can reason about using the preference model (that is, by treating those phenomena as if they were preferences).
It would help if we had better neuroscience than we do today, but I guess we’ll have to make do with what we’ve got for the time being, which unfortunately means our models can’t (yet) be fully grounded in what’s happening physically.
I’d definitely be interested in your thoughts about preferences when you get them into a shareable shape.
In some sense, what humans “really” have is just atoms moving around; all talk of mental states and so on is some level of convenient approximation. So when you say you want to talk about a different sort of approximation than Stuart’s, the immediate thing I’m curious about is “how can you make your way of talking about humans convenient for getting an AI to behave well?”
I think you can get some clues to my thinking from what I’ve already written. I used to take an approach much like Stuart’s, but I now think that’s the wrong abstraction. The thing I’ve written recently that most points toward my current thinking is “Let Values Drift”, which I wrote mostly because it was the first topic that really started to catalyze my thinking about human values.