This doesn’t seem to follow logically either. Thousands of years of intellectual effort amounts to a tiny fraction of the computation that the universe could do. Why is that a reason to think it very hard to outperform humans?
I meant more that there has been a lot of effort into figuring out how to normatively talk about what “good” is, and it seems necessary for us to figure that out if you want to write down code that we can know beforehand will correctly extrapolate our values even though it cannot rely on us avoiding corrupted values. Now admittedly in the past people weren’t specifically thinking about the problem of how to use a lot of computation to do this, but many of the thought experiments they propose seem to be of a similar nature (e.g., suppose that we were logically omniscient and knew the consequences of our beliefs).
It’s possible that this is mostly me misunderstanding terminology. Previously it sounded like you were against letting humans discover their own values through experience and instead having them figure it out through deliberation and reflection from afar. Now it sounds like you actually do want humans to discover their own values through experience, but you want them to be able to control the experiences and take them at their own pace.
But the idealized humans are free to broaden their “distribution” (e.g., experience waterboarding), when they figure out how to do that in a safe way. The difference is that unlike the real humans, they won’t be forced to deal with strange new inputs before they are ready.
Thanks, that clarifies things a lot. I take back the statement about trusting real humans more.
And these AIs/supervisors also act as a cabal to stop anyone else from running an AI without going through the same training, right?
Probably? I haven’t thought about it much.
don’t you think putting humans in such positions of power in itself carries a high risk of corruption, and that it will be hard to come up with training that can reliably prevent such corruption?
I agree this seems potentially problematic, but I’m not sure that it’s a deal-breaker.
I meant more that there has been a lot of effort into figuring out how to normatively talk about what “good” is, and it seems necessary for us to figure that out if you want to write down code that we can know beforehand will correctly extrapolate our values even though it cannot rely on us avoiding corrupted values.
I don’t think it’s necessary for us to figure out what “good” is. Instead, my “white-box metaphilosophical approach” is to figure out what “doing philosophy” is and then program/teach an AI to “do philosophy”, which would let it figure out what “good” is on its own. I think there has not been a lot of effort invested in that problem, so there’s a somewhat reasonable chance that it might be tractable. It seems worth investigating, especially given that it might be the only way to fully solve the human safety problem.
It’s possible that this is mostly me misunderstanding terminology. Previously it sounded like you were against letting humans discover their own values through experience and instead having them figure it out through deliberation and reflection from afar. Now it sounds like you actually do want humans to discover their own values through experience, but you want them to be able to control the experiences and take them at their own pace.
To clarify, this is a separate approach from the white-box metaphilosophical approach. And I’m not sure whether I want the humans to discover their values through experience or through deliberation. I guess they should first deliberate on that choice and then do whatever they decide is best.