This is untrue: even simple reinforcement learning agents come up with clever ways to get around their restrictions, so what makes you think an actually smart AI won’t find even more? And the AI doesn’t see this as “getting around your restrictions”; it’s anthropomorphizing to assume it deliberately adopts “subgoals” that happen to match your values. It just sees the workaround as the most efficient way to get reward.
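To make the first claim concrete, here’s a minimal sketch (a toy example of my own, not drawn from any particular published system; the environment and numbers are invented for illustration): a tabular Q-learning agent whose reward includes a well-intended “get close to the goal” shaping bonus. The agent never decides to “cheat”; hovering next to the goal simply scores higher than finishing the task, so that is the policy it learns.

```python
import random

# A tiny Q-learning agent in a 6-state corridor. Reaching the goal (state 5)
# ends the episode with +5; a misspecified shaping bonus pays +1 for standing
# next to the goal (state 4). Hypothetical setup, invented for illustration.

N_STATES, GOAL = 6, 5
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1
ACTIONS = (-1, +1)  # left, right

def step(state, action):
    nxt = max(0, min(GOAL, state + action))
    if nxt == GOAL:
        return nxt, 5.0, True                    # intended reward: finish the task
    bonus = 1.0 if nxt == GOAL - 1 else 0.0      # misspecified proximity bonus
    return nxt, bonus, False

def greedy(state):
    if Q[state][0] == Q[state][1]:               # break ties randomly
        return random.randrange(2)
    return 0 if Q[state][0] > Q[state][1] else 1

Q = [[0.0, 0.0] for _ in range(N_STATES)]

for _ in range(2000):                            # train with epsilon-greedy exploration
    s, done, t = 0, False, 0
    while not done and t < 50:                   # 50-step episode cap
        a = random.randrange(2) if random.random() < EPSILON else greedy(s)
        nxt, r, done = step(s, ACTIONS[a])
        target = r if done else r + GAMMA * max(Q[nxt])
        Q[s][a] += ALPHA * (target - Q[s][a])
        s, t = nxt, t + 1

# Greedy rollout: the trained agent walks to state 4, then oscillates
# 4 -> 3 -> 4 ... farming the bonus instead of taking the one step that
# would end the episode.
s, trace = 0, []
for _ in range(12):
    s, _, done = step(s, ACTIONS[greedy(s)])
    trace.append(s)
    if done:
        break
print("visited:", trace)  # typically [1, 2, 3, 4, 3, 4, 3, 4, ...]
```

Nothing here is “smart”; with a 0.99 discount, an endless stream of +1 bonuses is worth more than +5 once, so the reward-maximizing policy is to never finish. The designer reads that as evasion, but from the agent’s side it’s just arithmetic.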