Planned summary for the Alignment Newsletter:

This post argues that, by default, human preferences are strong views built on poorly defined concepts, which may not have any coherent extrapolation to new situations. Put another way, humans build mental maps of the world and define their preferences on those maps, so in new situations where the map no longer reflects the world accurately, it is unclear how those preferences should be extended. As a result, anyone interested in preference learning should find some incoherent moral intuition that other people hold and figure out how to make it coherent, as practice for the situation we will eventually face, in which our own values are incoherent in the face of new situations.
Planned opinion:
This seems right to me. We can also see this in the various paradoxes found in the philosophy of ethics, which take everyday moral intuitions and find extreme situations in which they conflict, where it is unclear which intuition should “win”.