Re outer alignment: I conceptualize it as the case where we don’t have to worry about optimizers spawning new optimizers in a recursive sequence, we don’t have to worry about mesa-optimizers, and so on. Essentially, it’s the base-case alignment scenario.
And a lot of X-risk worries only really go through if inner misalignment happens, and inner alignment is likely the harder problem to solve. If, as Janus suspects, self-supervised learners like GPT remain pure simulators even at a superhuman level and do not become agentic, then inner alignment problems never come up, which means AI risk fears should deflate a lot.