for example, a dedicated jihadi who sacrifices their mortal life for reward in the afterlife is rather obviously pursuing complex cultural (linguistically programmed) terminal values.
But that’s clearly an instrumental value. The expected utility of sacrificing his life may be high if he believes he will have an afterlife as a result. But if he just stops believing that, his beliefs (his epistemics) have changed, and so the expected utility calculation changes, and so the instrumental value of sacrificing his life has changed. His value for the afterlife doesn’t need to change at all.
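To make the distinction concrete, here is a toy expected-utility calculation in Python (a minimal sketch; the numbers and the names `afterlife_utility` and `p_afterlife_given_sacrifice` are purely illustrative, not claims about anyone's actual utilities). The terminal value stays fixed throughout; only the belief term changes, and with it the instrumental value of the act.

```python
# Toy decision model: terminal value vs. belief-dependent instrumental value.
# All numbers are illustrative assumptions, not a model of any real person.

def expected_utility_of_sacrifice(p_afterlife_given_sacrifice: float,
                                  afterlife_utility: float,
                                  cost_of_dying: float) -> float:
    """Instrumental value of the act = belief-weighted terminal value minus cost."""
    return p_afterlife_given_sacrifice * afterlife_utility - cost_of_dying

AFTERLIFE_UTILITY = 1000.0  # terminal value: unchanged in both scenarios
COST_OF_DYING = 100.0

# Before the epistemic update: strong belief that sacrifice leads to an afterlife.
print(expected_utility_of_sacrifice(0.9, AFTERLIFE_UTILITY, COST_OF_DYING))    # 800.0  -> act looks worthwhile

# After the update: the belief collapses, the terminal value does not.
print(expected_utility_of_sacrifice(0.001, AFTERLIFE_UTILITY, COST_OF_DYING))  # -99.0  -> act looks pointless
```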
You can imagine a jihadist who reads some epistemology and science books and as a result comes to believe that the statements in the Quran weren't those of God but rather of some ordinary human without any prophetic abilities, which reduces their credibility to that of statements found in an arbitrary work of fiction. He may then stop believing in God altogether. Even though only his beliefs have changed, it's very unlikely he will continue to see any instrumental value in being a jihadist.
It's like drinking a glass of clear liquid because you want to quench your thirst. Unfortunately the liquid, which you assumed was pure water, contains poison. Drinking the poison doesn't mean you terminally value poison; it just means you had a false belief.
Regarding ethics/alignment: You talk about what people are actually motivated by. But this is arguably a hodgepodge of mostly provisional instrumental values that we would significantly modify if we changed our beliefs about the facts, as we have done over the past decades and centuries. First we may believe doing X is harmful, and so instrumentally disvalue X; then we may learn more about the consequences of doing X and come to believe X is beneficial or merely harmless, or the other way round. It would be no good to optimize an AI that is locked into increasingly outdated instrumental values.
And an AI that always blindly follows our current instrumental values is also suboptimal: It may be much smarter than us, so it would avoid many epistemic mistakes we make. A child may value some amount of autonomy, but it also values being protected by its smarter parents from bad choices that would harm rather than benefit it. An aligned ASI should act like such a parent: It should optimize for what we would want if we were better informed. (We may even terminally value some autonomy at the cost of making some avoidable instrumental mistakes, though only to some extent.) For this, the AI has to find out what our terminal values are, and it has to use some method of aggregating them, since our values might conflict in some cases. That's what normative ethics is about.
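As a rough illustration of what "aggregating" could mean, here is a minimal sketch in Python. It assumes each person's informed terminal values can be scored per outcome and then combined with a plain average; both the scores and the averaging rule are illustrative assumptions, not a claim about how an actual aligned ASI would represent or aggregate values.

```python
# Toy aggregation of (idealized) terminal values across people.
# The scores and the unweighted mean are illustrative assumptions only.

from statistics import mean

# Hypothetical scores: how well each outcome satisfies each person's
# *informed* preferences (what they would want if better informed).
informed_value_scores = {
    "outcome_A": {"alice": 0.9, "bob": 0.2, "carol": 0.6},
    "outcome_B": {"alice": 0.5, "bob": 0.7, "carol": 0.6},
}

def aggregate(scores_by_person: dict) -> float:
    """Combine conflicting values; here, an unweighted mean over people."""
    return mean(scores_by_person.values())

best = max(informed_value_scores, key=lambda o: aggregate(informed_value_scores[o]))
print(best)  # "outcome_B": the outcome with the highest aggregate score
```

Different aggregation rules (weighted sums, bargaining solutions, veto rules) would pick different outcomes in conflict cases, which is exactly the kind of question normative ethics is concerned with.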