First, I don’t think cultural evolution influences our terminal values.
It seems straightforwardly obvious that it does for reasonable definitions of terminal values: for example, a dedicated jihadi who sacrifices their mortal life for reward in the afterlife is rather obviously pursuing complex cultural (linguistically programmed) terminal values.
But apart from your similarity to Hanson: I think the goal of AI alignment really is ethics. Not some specific human values shaped by millennia of biological or even cultural evolution.
I don’t believe in ethics in that sense. Everything we do as humans is inevitably and unavoidably determined by our true values, shaped by millennia of cultural evolution on top of eons of biological evolution. Ethics is simply a system of simplified, low-complexity inter-agent negotiation protocols/standards, not our true values at all. Our true individual values can only be understood through deep neuroscience/DL, à la shard theory.
Individual humans working on AGI are motivated by their individual values, but the aggregate behavior of the entire tech economy is best understood on its own terms, at the system level, as an inhuman optimization process. It optimizes for some utility function that is ultimately related to individual human values and their interactions, but not in obvious ways, and not really related to ethics.
for example, a dedicated jihadi who sacrifices their mortal life for reward in the afterlife is rather obviously pursuing complex cultural (linguistically programmed) terminal values.
But that’s clearly an instrumental value. The expected utility of sacrificing his life may be high if he believes he will have an afterlife as a result. But if he simply stops believing that, his beliefs (his epistemics) have changed, so the expected utility calculation changes, and with it the instrumental value of sacrificing his life. His value for the afterlife doesn’t need to change at all.
You can imagine a jihadist who reads some epistemology and science books and, as a result, comes to believe that the statements in the Quran weren’t those of God but of some ordinary human without any prophetic abilities, which reduces their credibility to that of claims found in an arbitrary work of fiction. He may then stop believing in God altogether. Even though only his beliefs have changed, it’s very unlikely he will continue to see any instrumental value in being a jihadist.
It’s like drinking a glass of clear liquid because you want to quench your thirst. Unfortunately the liquid contains poison, though you assumed it was pure water. Drinking the poison doesn’t mean you terminally value poison; it just means you had a false belief.
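A toy expected-utility calculation makes this belief-versus-value distinction concrete. This is a minimal sketch; the probabilities and utility numbers are invented purely for illustration:

```python
# Toy sketch (numbers are made up): the instrumental value of an act is its
# expected utility, which depends on both terminal utilities and beliefs
# (probabilities). Changing only the belief changes the expected utility,
# while the terminal utility itself stays fixed.

def expected_utility(p_outcome: float, u_outcome: float, u_no_outcome: float) -> float:
    """EU(act) = P(outcome | act) * U(outcome) + (1 - P) * U(no outcome)."""
    return p_outcome * u_outcome + (1 - p_outcome) * u_no_outcome

U_AFTERLIFE = 1000.0   # terminal value placed on the promised reward (never revised below)
U_DEATH_ONLY = -100.0  # terminal disvalue of simply losing one's mortal life

# Before reading the epistemology books: high credence in the afterlife reward.
print(expected_utility(0.95, U_AFTERLIFE, U_DEATH_ONLY))  # 945.0 -> sacrifice looks worthwhile

# After: the credence collapses, but U_AFTERLIFE itself was never changed.
print(expected_utility(0.01, U_AFTERLIFE, U_DEATH_ONLY))  # -89.0 -> sacrifice no longer worthwhile
```

Only the probability changes between the two calls; the terminal utilities are identical, yet the instrumental value of the sacrifice flips sign.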
Regarding ethics/alignment: you talk about what people are actually motivated by. But this is arguably a hodgepodge of mostly provisional instrumental values that we would significantly revise if we changed our beliefs about the facts, as we have done over past decades and centuries. We may first believe doing X is harmful, and so instrumentally disvalue X, then learn more about the consequences of doing X and come to believe X is beneficial or merely harmless, or the other way round. It would be no good to build an AI that is locked in to increasingly outdated instrumental values.
And an AI that always blindly follows our current instrumental values is also suboptimal: it may be much smarter than us, so it would avoid many epistemic mistakes we make. A child may value some amount of autonomy, but it also values being protected from bad choices by its smarter parents, choices which would harm rather than benefit it. An aligned ASI should act like such a parent: it should optimize for what we would want if we were better informed. (We may even terminally value some autonomy at the cost of making some avoidable instrumental mistakes, though only to some extent.) For this the AI has to find out what our terminal values are, and it has to use some method of aggregating them, since our values may conflict in some cases. That’s what normative ethics is about.
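To illustrate the "better informed" idea, here is a minimal sketch under strong simplifying assumptions: the option names, utility numbers, and the crude sum-of-utilities aggregation rule are hypothetical stand-ins, not a claim about how an actual ASI would represent or aggregate values:

```python
# Toy sketch: evaluate each option with the humans' terminal utilities but with
# the AI's corrected probability estimates, then aggregate across people (here
# by simply summing, one crude aggregation rule among many possible ones).

from typing import Dict

Option = str
Outcome = str

def idealized_value(option_outcomes: Dict[Outcome, float],   # corrected P(outcome | option)
                    terminal_utils: Dict[Outcome, float]) -> float:
    """Expected utility under corrected beliefs but the person's own terminal values.
    Outcomes not listed in terminal_utils are treated as utility 0."""
    return sum(p * terminal_utils.get(outcome, 0.0) for outcome, p in option_outcomes.items())

def aggregate(options: Dict[Option, Dict[Outcome, float]],
              people: Dict[str, Dict[Outcome, float]]) -> Option:
    """Pick the option with the highest summed idealized value across people."""
    return max(options, key=lambda o: sum(idealized_value(options[o], utils) for utils in people.values()))

# A person may instrumentally disvalue X because they falsely believe it is harmful;
# under the AI's corrected model, X is in fact mostly beneficial.
options = {
    "do_X":   {"benefit": 0.9, "harm": 0.1},
    "skip_X": {"benefit": 0.1, "harm": 0.0},
}
people = {
    "alice": {"benefit": 10.0, "harm": -50.0},
    "bob":   {"benefit": 5.0,  "harm": -20.0},
}
print(aggregate(options, people))  # "do_X": what they would want if better informed
```

The point is only structural: the AI keeps the humans’ terminal utilities fixed, substitutes its own better probability estimates for theirs, and then needs some aggregation rule for cases where those utilities conflict.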