Ronny Fernandez on Twitter:

I think I don’t like AI safety analogies with human evolution except as illustrations. I don’t think they’re what convinced the people who use those analogies, and they’re not what convinced me. You can convince yourself of the same things just by knowing some stuff about agency.
Corrigibility, human values, and "figure out human values while aiming for them" do not have short description lengths. I know because I’ve practiced finding the shortest descriptions of things a lot, and they just don’t seem like the right sort of thing.

Also, if you get to the level where you can recognize when you’ve failed, and you try over and over again, you will find that it is very hard to find a short description of any of these nice things we want.
And so this tells us that a general intelligence we are happy we built is a small target within the wide basin of general intelligence.
Ideal agency has a short description length. I don’t think any particular tractable agency does, and ML cares about runtime, but there are heuristic approximations to ideal agency, and there are many different ones precisely because ideal agency itself has a short description length (a toy sketch follows below).
So this tells us that there is a wide basin of attraction for general intelligence.
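To make the "short description, many approximations" point concrete, here is a minimal toy sketch in Python. Everything in it is an illustrative assumption of mine rather than anything from the thread: the environment, the `utility` function, and the agent names are all made up. The point it illustrates is that an ideal expected-utility maximizer can be written in a few lines (short description, exponential runtime), while tractable agents approximate that same short ideal in many different ways.

```python
import random
from itertools import product

ACTIONS = [-1, 0, 1]   # toy action space (assumption, not from the thread)
HORIZON = 4            # short planning horizon so brute force stays feasible

def step(state, action):
    """Toy stochastic dynamics: move by the action, plus noise."""
    return state + action + random.choice([-1, 0, 1])

def utility(state):
    """Toy utility: prefer states near 10 (purely illustrative)."""
    return -abs(state - 10)

def ideal_agent(state, samples=200):
    """Ideal agency: exhaustively maximize expected utility over all plans.
    The description is a few lines; the runtime is exponential in HORIZON."""
    def expected_value(plan):
        total = 0.0
        for _ in range(samples):
            s = state
            for a in plan:
                s = step(s, a)
            total += utility(s)
        return total / samples
    best_plan = max(product(ACTIONS, repeat=HORIZON), key=expected_value)
    return best_plan[0]

def greedy_agent(state, samples=50):
    """Tractable approximation #1: one-step lookahead on expected utility."""
    def one_step_value(a):
        return sum(utility(step(state, a)) for _ in range(samples)) / samples
    return max(ACTIONS, key=one_step_value)

def shooting_agent(state, rollouts=100):
    """Tractable approximation #2: random-shooting planner that samples
    whole plans and keeps the first action of the best one found."""
    best_action, best_value = None, float("-inf")
    for _ in range(rollouts):
        plan = [random.choice(ACTIONS) for _ in range(HORIZON)]
        s = state
        for a in plan:
            s = step(s, a)
        if utility(s) > best_value:
            best_action, best_value = plan[0], utility(s)
    return best_action

print(ideal_agent(0), greedy_agent(0), shooting_agent(0))
```

The two approximations share nothing in their implementations, yet both are recognizably aiming at the same few-line ideal, which is one way to picture the "wide basin of attraction" claim: many different tractable processes converge toward the same short target.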