I think it’s an example of an AI that completely lacks the notion of in-world goals. Its goal is restricted to a purely symbolic system; that system happens to map onto parts of the world, but the AI lacks the self-reflection to realise which symbols map to itself and its immediate environment, and how manipulating those symbols might make it better at accomplishing its goals. Severing that feedback loop is IMO the key to avoiding instrumental convergence. Without it, all you get is another variety of chess-playing AI: superhumanly good at its own task, but the world within which it optimizes that task is too abstract for its skill to be “portable” to a dangerous domain.