Wei Dai comments on Deliberation as a method to find the “actual preferences” of humans

Wei Dai 30 Oct 2019 1:54 UTC
LW: 7 AF: 4

> I think my main confusion is that Paul talks about many different ways deliberation could work (e.g. RL-based IDA and human-in-the-counterfactual-loop seem pretty different), and it’s not clear what approach he thinks is most plausible.

I have similar questions, and I’m not sure how much of this reflects Paul being uncertain himself and how much reflects Paul not having communicated his thinking yet. One thing to keep in mind is that different forms of deliberation could be used at different levels of the system: for example, one method could be used to model/emulate/extrapolate the overseer’s deliberation and another for the end-user’s.

On a more general note, I’m really worried that we don’t have much understanding of how or why human deliberation can lead to good outcomes in the long run. It seems clear that an individual human deliberating in isolation is highly likely to get stuck or go off the rails, and groups of humans often do so as well. To the extent that we as a global civilization seemingly are able to make progress in the very long run, it seems at best a fragile process, which we don’t know how to reliably preserve, or reproduce in an artificial setting.