Failure to learn how to deal with alignment in the many-humans, many-AIs case, even if single-human, single-AI alignment is solved (which I think Andrew Critch has talked about).
Good point, I’ll add this to the list.
For example, AIs negotiating on behalf of humans might take the stance described in https://arxiv.org/abs/1711.00363 of agreeing to split control of the future according to whose priors turn out to be most accurate (possibly on entirely irrelevant issues), even if this isn't what the humans actually want.
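To make the worry concrete, here is a minimal toy sketch (my own illustration with made-up numbers, not code from Critch's paper) of the mechanism I have in mind: under a Pareto-optimal policy serving two principals with different priors, each principal's effective bargaining weight gets rescaled by how well their prior predicted what was actually observed, so control over future decisions drifts toward whoever "wins the bet", whether or not the question they bet on matters to either of them.

```python
# Toy sketch (my own illustration, not from the paper) of belief-dependent
# bargaining weights: each principal's weight is multiplied by the likelihood
# their prior assigned to the observed outcome, then renormalized.
import numpy as np

rng = np.random.default_rng(0)

# Two principals disagree about the bias of a coin that will be flipped repeatedly.
# The coin may be irrelevant to what either of them ultimately cares about.
prior_heads = {"Alice": 0.8, "Bob": 0.3}   # each principal's subjective P(heads)
true_heads = 0.3                           # Bob happens to be right

weights = {"Alice": 0.5, "Bob": 0.5}       # initial (equal) bargaining weights

for t in range(20):
    flip_is_heads = rng.random() < true_heads
    # Scale each weight by the probability that principal's prior gave
    # to the observed flip, then renormalize.
    for name, p in prior_heads.items():
        likelihood = p if flip_is_heads else 1.0 - p
        weights[name] *= likelihood
    total = sum(weights.values())
    weights = {name: w / total for name, w in weights.items()}

print(weights)  # Bob's weight approaches 1: he has effectively "won the bet"
```

Running this, Bob's weight converges toward 1 simply because his prior about the coin was more accurate, regardless of how either principal values the outcomes actually being decided.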
Thanks, I hadn’t noticed that paper until now. Under “Related Works” it cites Social Choice Theory but doesn’t actually mention any recent research from that field. Here is one paper that criticizes the Pareto principle Critch’s result is based on, in the context of aggregating the preferences of people with different priors: Spurious Unanimity and the Pareto Principle
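For a concrete sense of the critique, here is a minimal numerical sketch (my own illustration with made-up numbers, not taken from either paper) of “spurious unanimity”: two agents with contradictory priors can both strictly prefer a zero-sum bet ex ante, so the ex-ante Pareto principle endorses it even though, under any single probability, one of them must lose in expectation.

```python
# Hypothetical numbers, chosen only to illustrate the spurious-unanimity point.
p_alice = 0.9   # Alice's subjective P(event E)
p_bob   = 0.1   # Bob's subjective P(event E)

# The bet: Bob pays Alice 1 if E occurs, Alice pays Bob 1 if E fails.
def expected_gain_alice(p_event): return p_event * 1 + (1 - p_event) * (-1)
def expected_gain_bob(p_event):   return p_event * (-1) + (1 - p_event) * 1

print(expected_gain_alice(p_alice))  # +0.8: Alice expects to gain
print(expected_gain_bob(p_bob))      # +0.8: Bob also expects to gain
# Both prefer the bet, yet for any common probability q the gains sum to zero:
# expected_gain_alice(q) + expected_gain_bob(q) == 0 for every q.
```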