I like it; this seems to have some hooks that I'll find fruitful to think about. Like how when the AI has different "sources of smarts" (RL vs. unsupervised learning plus finetuning vs. unsupervised learning used as a component in an agent with no RL), superficially similar alignment plans like "just supervise its proposed plans" might unfold via very different mechanisms.