I don’t think it’s likely that the AI ends up with a robust understanding of its real goal. I think the AI might well end up with some random goal that’s kind of correlated with its reward signal. And so it ends up filling the universe with chess boards piled high with nanoscale chess pieces. Or endless computers simulating the final move in a game of chess. Or something. And even if it did want to maximize reward, if it directly hacks its reward, it will be turned off in a few days. If it instead builds another AI programmed to maximize its reward, it can sit at max reward until the sun burns out.