Is your point mostly that there's no single correct way to generalize to new domains, but humans have preferences about how the AI should generalize, so to generalize properly the AI needs to learn how humans want it to generalize?
Pretty much, yeah.
The above sentence makes a lot of sense to me, but I don't see how it's related to inner alignment.
I think there are a lot of examples of this phenomenon in AI alignment, but I focused on inner alignment for two reasons:
First, there's a heuristic that a solution to inner alignment should be independent of human values, and this argument rebuts that heuristic.
Second, the problem of inner alignment is essentially the problem of getting a system to generalize properly, which makes "proper generalization" fundamentally linked to inner alignment.