Summary: John describes the problems of inner and outer alignment. He also describes the concept of True Names—mathematical formalisations that hold up under optimisation pressure. He suggests that having a “True Name” for optimizers would be useful if we wanted to inspect a trained system for an inner optimiser and not risk missing something.
He further suggests that the concept of agency breaks down into lower-level components like “optimisation”, “goals”, “world models”, etc. Further arguments could be made about how these lower-level concepts matter for AI safety.