Planned summary for the Alignment Newsletter:

This post argues that, by default, human preferences are strong views built on poorly defined concepts, which may not have any coherent extrapolation to new situations. Put another way, humans build mental maps of the world and define their preferences on those maps, so in new situations where the map no longer reflects the world accurately, it is unclear how those preferences should be extended. As a result, anyone interested in preference learning should find some incoherent moral intuition that other people hold and figure out how to make it coherent, as practice for the situation we will eventually face, in which our own values are incoherent in the face of new situations.
Planned opinion:
This seems right to me. We can also see this in the various paradoxes found in the philosophy of ethics, which take everyday moral intuitions and find extreme situations in which they conflict, where it is unclear which intuition should “win”.