Re outer alignment: I conceptualize it as the case where we don’t have to worry about optimizers spawning new optimizers in a recursive sequence, we don’t have to worry about mesa-optimizers, and so on. Essentially, it’s the base-case alignment scenario.
And a lot of X-risk worries only really go through if inner misalignment happens, and inner alignment is likely the harder problem to solve. If, as Janus suspects, self-supervised learners like GPT remain pure simulators even at a superhuman level and do not become agentic, then inner alignment problems never come up, which means AI risk fears should deflate a lot.