luosha@gmail.com comments on AGI Ruin: A List of Lethalities

luosha@gmail.com 16 Jun 2022 18:51 UTC
1 point
0
On instrumental convergence: humans would seem to be a prominent counterexample to “most agents don’t let you edit their utility functions”—at least in the sense that our goals/interests etc are quite sensitive to those of people around us. So maybe not explicit editing, but lots of being influenced by and converging to the goals and interests of those around us. (and maybe this suggests another tool for alignment, which is building in this same kind of sensitivity to artificial agents’ utility functions)