This was definitely an interesting and persuasive presentation of the idea. I think this goes to the same place as learning from behavior in the end, though.
For behavior: In the ancestral environment, we behaved like we wanted nourishing food and reproduction. In the modern environment we behave like we want tasty food and sex. Given a button that pumps heroin into our brain, we might behave like we want heroin pumped into our brains.
For valence: the set of preferences that optimizing valence cashes out to depends on the environment. We, in the modern environment, don’t want to be drugged to maximize some neural signal. But if we were raised on super-heroin, we’d probably just want super-heroin. Even assuming this single-neurological-signal hypothesis, we aren’t valence-optimizers; we are the learned behavior of a system whose training procedure relies on the valence signal.
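To make that last distinction concrete, here is a minimal sketch, entirely my own toy construction rather than anything from the post: a bandit-style learner trained on a reward ("valence") signal in one environment, whose learned behavior is then frozen and dropped into a shifted environment where the signal itself can be hijacked. All of the action names, reward numbers, and the update rule are invented for illustration.

```python
import random

# Toy illustration (my own sketch, not from the post): a policy trained on a
# reward ("valence") signal in one environment, then frozen and deployed in a
# shifted environment. Environments and action names are invented.

ACTIONS = ["nourishing_food", "sweet_food", "press_heroin_button"]

def ancestral_reward(action):
    # In the "ancestral" training environment, sweetness tracks nourishment
    # and the heroin button does nothing.
    return {"nourishing_food": 1.0, "sweet_food": 1.0, "press_heroin_button": 0.0}[action]

def modern_reward(action):
    # In the "modern" environment the valence signal itself can be hijacked:
    # the button maxes out the signal directly.
    return {"nourishing_food": 0.5, "sweet_food": 1.0, "press_heroin_button": 10.0}[action]

def train(reward_fn, episodes=5000, lr=0.1, eps=0.1):
    """Simple epsilon-greedy value learning driven by the reward signal."""
    q = {a: 0.0 for a in ACTIONS}
    for _ in range(episodes):
        a = random.choice(ACTIONS) if random.random() < eps else max(q, key=q.get)
        q[a] += lr * (reward_fn(a) - q[a])
    return q

random.seed(0)
q = train(ancestral_reward)     # learning happens under the old environment
policy = max(q, key=q.get)      # then the learned behavior is frozen

print("learned policy picks:", policy)   # one of the food actions
print("what a valence-optimizer would pick in the modern env:",
      max(ACTIONS, key=modern_reward))   # the heroin button
```

The frozen policy keeps reaching for food even though a genuine valence-optimizer in the new environment would press the button; the optimization pressure lived in the training procedure, not in the deployed behavior.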
Ex hypothesi, we’re going to have learned preferences that won’t optimize valence, but might still be understandable in terms of a preference maturation process that is “trying” to optimize valence but has run into distributional shift or adversarial optimization or something. These preferences (like refusing the heroin) are still fully valid human preferences, and you’re going to need to look at human behavior to figure out what they are (barring big godlike a priori reasoning), which entails basically the same philosophical problems as getting all values from behavior without this framework.
I’m hopeful that this won’t be true, in a certain limited way. Scanning brains and observing how neurons operate to determine a human’s behavior is a very different sort of operation from observing their behavior “from the outside”, the way we observe people’s behavior today. Much of the difficulty is that observing behavior with only our unaided senses, and without a deep model of the brain, forces us to make very large normative assumptions to get the inferential power needed to work out how a human values things. But if we have a model like this, and it appears to be correct, then we can, practically speaking, make “smaller”, less powerful normative assumptions, because we understand and can work out the details of more of the gears of the mind.
The result is that, in a certain sense, we are still concerned with behavior, but because the level of detail is so much higher and the model so much richer, we are less likely to make the kind of mistakes that come from taking large inferential leaps, as we would if we observed behavior in the ordinary sense.
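Here’s a toy model of the “smaller normative assumptions” point, again entirely my own invention: an agent whose outward choices are warped by a bias term the outside observer can’t see. A behaviorist has to assume choices directly reveal preference (a large normative assumption); an observer with a gears-level model only has to assume the internal signal means what the model says it means.

```python
import random

# Hypothetical toy (my construction): an agent whose true taste parameter
# drives an internal valence signal, but whose outward choices are also
# warped by a bias the outside observer doesn't know about.

random.seed(1)
TRUE_TASTE = 0.3   # true weight on option B over option A
BIAS = 0.5         # habit/impulsivity term the behaviorist can't see

def internal_valence(choice):
    # gears-level signal: depends only on true taste
    return TRUE_TASTE if choice == "B" else 1 - TRUE_TASTE

def observed_choice():
    # behavior: taste plus bias; the agent picks B more than its taste warrants
    p_b = min(1.0, TRUE_TASTE + BIAS)
    return "B" if random.random() < p_b else "A"

choices = [observed_choice() for _ in range(10000)]

# Behaviorist inference: must assume choices directly reveal preference
# ("large" normative assumption: no bias, noiseless revealed preference).
behaviorist_estimate = choices.count("B") / len(choices)

# Gears-level inference: read taste off the internal signal itself
# ("smaller" assumption: the signal means what the model says it means).
gears_estimate = internal_valence("B")

print(f"true taste:            {TRUE_TASTE}")
print(f"behaviorist estimate:  {behaviorist_estimate:.2f}")  # ~0.80, wrong
print(f"gears-level estimate:  {gears_estimate:.2f}")        # 0.30, right
```

Both observers are still, in a sense, studying behavior; the second one just gets to make a much weaker assumption to recover the same quantity.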