Well, if the acronym “POMDP” (partially observable Markov decision process) didn’t make any sense, I think we should start with a simpler example, like a chessboard.
Suppose we want to write a chess-playing AI that gets its input from a camera looking at the chessboard. And for some reason, we give it a button that replaces the video feed with a picture of the board in a winning position.
Inside the program, the AI knows the rules of chess and has some heuristics for how it expects the opponent to play. It also represents the external chessboard with some data array. Finally, it has some rules about how the camera image is generated from the true chessboard and from whether or not it’s pressing the button.
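To make that structure concrete, here’s a minimal Python sketch. Every name in it (WorldModel, predicted_camera_image, and so on) is hypothetical, invented purely for illustration, not code from any real system:

```python
# A minimal sketch of the structure described above; all names are
# hypothetical, invented for illustration.

from dataclasses import dataclass
from typing import List

Board = List[List[str]]  # an 8x8 array of piece symbols

@dataclass
class WorldModel:
    board: Board          # the AI's representation of the true chessboard
    button_pressed: bool  # whether the AI is currently pressing the button

WINNING_IMAGE = "picture of a winning position"  # the canned image the button shows

def render(board: Board) -> str:
    """Stand-in for rendering: turn a board array into a camera image."""
    return f"picture of board {board}"

def predicted_camera_image(state: WorldModel) -> str:
    """Observation model: what the camera feed shows, given the true state.

    This encodes the last rule above: the image depends on the true board
    *and* on whether the button is pressed.
    """
    if state.button_pressed:
        # The button replaces the feed with the canned winning picture.
        return WINNING_IMAGE
    return render(state.board)
```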
If we just try to get the AI to make the video feed show a winning position, then it will press the button. But if we instead try to get the data array, the AI’s internal representation of the board, into a winning position, and we update that internal representation so that it tracks the true chessboard, then it won’t press the button: pressing the button changes the image, but not what the AI believes the board looks like. This is actually quite easy to do. For example, if the AI is a jumble of neural networks, and we have a long training phase in which it’s rewarded for actually winning games, not just for seeing winning board states, then it will learn to take the state of the button into account when interpreting the image.
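Continuing the same hypothetical sketch, here are the two objectives contrasted above, plus the belief update that makes the second one ignore the button. Again, is_winning_position, infer_board_from_image, and the rest are invented stand-ins:

```python
# The two objectives contrasted above, plus the belief update that
# makes the second one button-proof. All names remain hypothetical.

def is_winning_position(board: Board) -> bool:
    """Stand-in for the rules of chess: is this board a win for the AI?"""
    ...

def infer_board_from_image(image: str) -> Board:
    """Stand-in for vision: recover a board array from a camera image."""
    ...

def video_feed_reward(state: WorldModel) -> float:
    """Objective 1: score the raw video feed. Pressing the button maximizes
    this, since the feed then always shows a winning position."""
    return 1.0 if predicted_camera_image(state) == WINNING_IMAGE else 0.0

def internal_board_reward(believed_board: Board) -> float:
    """Objective 2: score the AI's internal representation of the board."""
    return 1.0 if is_winning_position(believed_board) else 0.0

def update_belief(believed_board: Board, image: str, button_pressed: bool) -> Board:
    """Update the internal representation to track the true chessboard.

    Since the observation model says the button overrides the feed, an image
    seen while the button is pressed carries no information about the board,
    so the belief is left unchanged.
    """
    if button_pressed:
        return believed_board
    return infer_board_from_image(image)
```

The point of the update is that pressing the button maximizes video_feed_reward but leaves internal_board_reward untouched, which is why the second objective doesn’t tempt the AI to press it.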