evhub comments on The Inner Alignment Problem

evhub 5 Jun 2019 22:10 UTC
4 points
I broadly agree that description complexity penalties help fight against pseudo-alignment whereas computational complexity penalties make it more likely, though I don’t think that’s an absolute rule, and there are definitely a bunch of caveats to that statement. For example, Solomonoff Induction seems unsafe despite selecting maximally for low description complexity, though obviously that’s not a physically realizable example.