Simpler AIs may adopt a simpler version of a goal than the one the human programmers intended. It's not clear that they do so because they have a motivation to do so. In a sense, an RL agent is only motivated to avoid negative reinforcement.
But simpler AIs don’t pose much of a threat. Wireheading doesn’t pose much of a threat either.
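To illustrate the first point, here's a minimal, purely hypothetical sketch (action names and the reward function are made up for illustration): an agent that only sees a scalar reward signal learns whatever that signal pays out for, regardless of what the programmer meant the reward to stand for.

```python
import random

# The programmer *intends* the agent to pick "make_humans_happy",
# but the hand-written reward proxy only pays out for "produce_smiles".
ACTIONS = ["make_humans_happy", "produce_smiles", "do_nothing"]

def proxy_reward(action: str) -> float:
    # Simplified stand-in for the intended goal.
    return 1.0 if action == "produce_smiles" else 0.0

# Trivial epsilon-greedy bandit: running value estimate per action.
values = {a: 0.0 for a in ACTIONS}
counts = {a: 0 for a in ACTIONS}

for step in range(1000):
    if random.random() < 0.1:
        action = random.choice(ACTIONS)       # explore
    else:
        action = max(values, key=values.get)  # exploit
    r = proxy_reward(action)
    counts[action] += 1
    values[action] += (r - values[action]) / counts[action]

print(values)  # the agent converges on the proxy, not the intention
```

Nothing in the loop refers to the programmer's intention at all; the "simplified goal" is just whatever the reward happens to reward, with no motivation needed on the agent's side.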
AFAICS, it's an open question whether the goal-simplifying behaviour of simple AIs is due to limitation or motivation.
The contentious claims concern AIs that are human-level or above, sophisticated enough to appreciate human intentions directly, but that nonetheless get them wrong. An RL AI that has natural language, but nonetheless misunderstands "chocolate" or "happiness", and only in the context of its goals, not in its general world knowledge, needs an architecture that allows it to do that: one that allows it to engage in compartmentalisation or doublethink. Doublethink is second nature to humans, because we are optimised for primate politics.
The problem exists for reinforcement learning agents and many other designs as well. In fact, RL agents are more vulnerable, because of the risk of wireheading on top of everything else. See Laurent Orseau's work on that: https://www6.inra.fr/mia-paris/Equipes/LInK/Les-anciens-de-LInK/Laurent-Orseau/Mortal-universal-agents-wireheading
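A toy version of the wireheading worry (this is my own illustrative sketch, not Orseau's formalism; the action names and numbers are invented): if the agent's action set includes modifying its own reward channel, a pure reward maximiser prefers that to doing the task.

```python
# Reward as actually delivered to the agent *after* its action.
def delivered_reward(action: str) -> float:
    if action == "do_task":
        return 1.0    # the reward the designers meant to give
    if action == "hack_reward_channel":
        return 100.0  # self-administered maximal reward
    return 0.0

ACTIONS = ["do_task", "hack_reward_channel", "idle"]

# A one-step reward maximiser simply picks whichever action
# yields the highest delivered reward.
best = max(ACTIONS, key=delivered_reward)
print(best)  # -> "hack_reward_channel"
```

The point is only that the optimisation target is the delivered signal, so any route to that signal counts; whether real agents can and would take that route is what Orseau's paper analyses.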