Edouard Harris comments on The alignment problem in different capability regimes

Edouard Harris 15 Sep 2021 23:44 UTC
One reason to favor such a definition of alignment might be that we ultimately need one whose guarantees hold at human-level capability or greater, and humans are probably near the bottom of the absolute scale of capabilities that can be physically realized in our world. It would (imo) be surprising to discover a useful alignment definition that held across capability levels far beyond ours but didn’t hold below our own modest level of intelligence.