I have a lot of disagreements with section 6. Not sure where the main crux is, so I’ll just write down a couple of things.
> One intuition pump here is: in the current, everyday world, basically no one goes around with much of a sense of what people’s “values on reflection” are, or where they lead.
This only works because, at present, we’re rarely in danger of subjecting other people to major distributional shifts; advanced AI would create exactly such shifts, so the everyday situation isn’t much evidence about what we’ll need once they become common. See “Two Neglected Problems in Human-AI Safety”.
> That is, ultimately, there is just the empirical pattern of: what you would think/feel/value given a zillion different hypothetical processes; what you would think/feel/value about those processes given a zillion different other hypothetical processes; and so on. And you need to choose, now, in your actual concrete circumstance, which of those hypotheticals to give authority to.
I notice that in order to argue that solving AI alignment does not require a “very sophisticated philosophical achievement”, you’ve proposed a solution to metaethics, which would itself constitute a “very sophisticated philosophical achievement” if it’s correct!
Personally I’m very uncertain about metaethics (see also previous discussion on this topic between Joe and me), and don’t want to see humanity bet the universe on any particular metaethical theory in our current epistemic state.