My take on myopia is that it’s “shortsightedness” in the sense of only trying to do “local work”. If I ask you what two times two is, you say “four” because it’s locally true, rather than anticipating the consequences of different answers and saying “five” because that answer leads to an outcome you prefer. [Or you’re running a heuristic that approximates that anticipation.]
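To make that contrast concrete, here’s a minimal toy sketch (the function names and the fake “consequence model” are my own invented illustration, not any real system): the myopic responder answers on local truth alone, while the non-myopic one searches over answers for the one whose predicted consequence it prefers.

```python
# Toy illustration of myopic vs. non-myopic answering.
# Everything here (names, the "consequence model") is a hypothetical sketch.

def myopic_answer(question: str) -> str:
    """Answer based only on what is locally true, ignoring downstream effects."""
    if question == "what is two times two?":
        return "four"
    return "I don't know"

def nonmyopic_answer(question: str, preferred_outcome: str) -> str:
    """Pick whichever answer is predicted to lead to the outcome the agent
    prefers, even if that answer is locally false."""
    candidates = ["four", "five"]

    # Invented stand-in for a consequence model: (question, answer) -> outcome.
    def predict_outcome(q: str, a: str) -> str:
        return "listener informed" if a == "four" else "listener misled"

    for answer in candidates:
        if predict_outcome(question, answer) == preferred_outcome:
            return answer
    return candidates[0]

print(myopic_answer("what is two times two?"))                        # "four"
print(nonmyopic_answer("what is two times two?", "listener misled"))  # "five"
```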
If you knew that everyone in a bureaucracy were just “doing their jobs”, that would give you a sort of transparency guarantee: to see what’s happening, you only need to follow the official flow of information. No one will design a shadow bureaucracy or attempt a takeover unless explicitly asked to.
However, training doesn’t give you this by default; people in the bureaucracy are incentivized to make their individual departments better, to say what the boss wants to hear, to share gossip at the water cooler, and so on. One scenario people consider is training an AI to solve some problem, where at some point it realizes it’s being trained to solve that problem and starts performing as well as it can on that metric. In animal reinforcement training, people often talk about how you’re training the animal to perform tricks for rewards, and at the same time the animal is training you to reward it! The situation here is subtly different, but the basic figure-ground inversion holds.