I meant more that there has been a lot of effort put into figuring out how to talk normatively about what “good” is, and it seems necessary for us to figure that out if we want to write down code that we can know beforehand will correctly extrapolate our values, even though it cannot rely on us avoiding corrupted values.
I don’t think it’s necessary for us to figure out what “good” is. Instead, my “white-box metaphilosophical approach” is to figure out what “doing philosophy” is, then program/teach an AI to “do philosophy”, which would let it figure out what “good” is on its own. I think there has not been a lot of effort invested in that problem, so there’s a somewhat reasonable chance that it might be tractable. It seems worth investigating, especially given that it might be the only way to fully solve the human safety problem.
It’s possible that this is mostly me misunderstanding terminology. Previously it sounded like you were against letting humans discover their own values through experience, and instead wanted them to figure their values out through deliberation and reflection from afar. Now it sounds like you actually do want humans to discover their own values through experience, but you want them to be able to control those experiences and take them at their own pace.
To clarify, this is a separate approach from the white-box metaphilosophical approach. And I’m not sure whether I want the humans to discover their values through experience or through deliberation. I guess they should first deliberate on that choice and then do whatever they decide is best.