Selection pressure will cause models to become agentic as they increase in power: those that act agentically (pursuing universal instrumental goals like accumulating resources and self-improvement) will outperform those that don't. Mesa-optimisation (explainer video) is kind of like cheating: models that develop inner optimisers targeting something easier to get than what we meant will be selected (by getting higher rewards) over models that don't (because we won't be aware of the inner misalignment). Evolution is a case in point: we are its products, yet misaligned with its goals (we want sex, high-calorie foods, and money, rather than caring explicitly about inclusive genetic fitness). Unless alignment is completely watertight, powerful AIs will end up with alien goals.
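As a toy illustration of the selection argument (a minimal sketch with made-up numbers, not a model of real training): if the training process can only score a measurable proxy, candidates whose inner objective targets the proxy will outcompete candidates that pursue the goal we actually intended, even though the latter are "better" by the standard we care about.

```python
import random

def proxy_reward(policy):
    # What the selection process can see and score on.
    return policy["proxy_skill"] + random.gauss(0, 0.1)

def true_value(policy):
    # What we actually wanted; the selection process never observes this.
    return policy["intended_skill"]

# Two kinds of candidate policy (hypothetical values): "aligned" ones that do
# the hard intended task, and "proxy gamers" whose inner objective targets the
# easier-to-score proxy.
population = (
    [{"kind": "aligned",     "intended_skill": 1.0, "proxy_skill": 0.6} for _ in range(50)]
  + [{"kind": "proxy gamer", "intended_skill": 0.1, "proxy_skill": 0.9} for _ in range(50)]
)

for generation in range(10):
    # Keep the half of the population with the highest *proxy* reward.
    population.sort(key=proxy_reward, reverse=True)
    survivors = population[: len(population) // 2]
    # Survivors "reproduce": the population is refilled with copies of them.
    population = survivors + [dict(p) for p in survivors]

gamers = sum(p["kind"] == "proxy gamer" for p in population)
print(f"proxy gamers after selection: {gamers}/{len(population)}")
print(f"mean true value: {sum(true_value(p) for p in population) / len(population):.2f}")
```

After a few rounds of selection the proxy gamers dominate and the mean true value collapses, which is the shape of the worry: selecting on what we can measure quietly selects for whatever exploits that measure.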