Whether or not an AI would want to wirehead would depend entirely on its ontology. Maximizing paperclips, maximizing the reward from paperclips, and maximizing the integer that tracks paperclips are three very different goals, and depending on how the AI sees itself, any of the three is plausible. I can't see any reason to suspect that one of those ontologies is more likely than the others.
Even if the AI does have an ontology under which it maximizes the integer tracking paperclips, one then has to ask how time is factored into the equation. Is it better to be in the state of maximum reward for a longer period of time? If so, the AI will want to eliminate everything that could knock it out of that state.
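One standard way time gets factored in is a discounted sum of per-step rewards, as in reinforcement learning. A minimal sketch (the reward values and discount factor here are illustrative, not from any particular system), showing that holding the maximum reward for longer strictly increases the objective:

```python
# Sketch: how time can factor into a reward-maximizing objective.
# In standard RL the agent maximizes the discounted return
# G = sum over t of gamma**t * r_t, so occupying a maximal-reward
# state for longer strictly increases the objective.

def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**t * r_t over a reward sequence."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

MAX_REWARD = 1.0  # illustrative per-step maximum

# Staying in the maximum-reward state for 100 steps vs. 1000 steps:
print(discounted_return([MAX_REWARD] * 100))   # ~63.4
print(discounted_return([MAX_REWARD] * 1000))  # ~100.0, approaching 1/(1 - gamma)
```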
Finally, one has to consider how the integer itself works. Is it unbounded? If it is, then to maximize the reward the AI must use all available matter and energy to store the largest possible version of that integer in memory.
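To make the unbounded case concrete: Python's built-in int is arbitrary-precision, so a larger reward integer genuinely consumes more memory, and the only hard limit is the memory available.

```python
import sys

# Python ints are arbitrary-precision: the only bound on the
# "reward integer" is available memory, and a bigger value
# physically occupies more of it.
reward = 10**10
print(sys.getsizeof(reward))    # a few dozen bytes

reward = 10**100000
print(sys.getsizeof(reward))    # tens of kilobytes for a single integer
```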
Your last paragraph is really interesting and not something I'd thought much about before. In practice, is it likely to be unbounded? In a typical computer system, aren't number formats usually bounded? And if so, wouldn't we expect an AI system to be using bounded numbers even if the programmers forgot to explicitly bound the reward in the code?
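For what it's worth, the common fixed-width formats do carry hard bounds whether or not anyone writes an explicit clamp. A minimal sketch using NumPy's standard types (assuming NumPy is available):

```python
import sys
import numpy as np

# Typical machine number formats are bounded even if nobody
# bounds the reward explicitly in application code.
print(np.iinfo(np.int64).max)    # 9223372036854775807
print(np.finfo(np.float32).max)  # ~3.4e38
print(sys.float_info.max)        # ~1.8e308 for a C double

# And fixed-width integer arithmetic wraps around past the bound:
r = np.array([np.iinfo(np.int64).max], dtype=np.int64)
print(r + 1)                     # [-9223372036854775808]
```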
But aren’t we explicitly talking about the AI changing its architecture to get more reward? If it wants to optimize that number, the most important thing to do would be to get rid of that arbitrary limit.
Yeah, that’s what I’d like to know: would an AI built on a number format with a default maximum pursue numbers higher than that maximum, or would it be “fulfilled” just by getting its reward number as high as the format it’s using allows?
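The two possibilities can be made concrete: a reward counter that wraps around on overflow (so pushing past the maximum is catastrophic for the agent) versus one that saturates at the format's maximum (so the agent is "fulfilled" at the cap). The helpers below are hypothetical illustrations of 64-bit behavior, not any particular system's reward logic:

```python
INT64_MAX = 2**63 - 1
INT64_MIN = -2**63

def wrapping_add(reward, delta):
    """Two's-complement overflow: the counter wraps past the maximum."""
    return (reward + delta + 2**63) % 2**64 - 2**63

def saturating_add(reward, delta):
    """Saturation: the counter pins at the format's maximum."""
    return min(reward + delta, INT64_MAX)

r = INT64_MAX - 1
print(wrapping_add(r, 5))    # -9223372036854775805 (wrapped to a huge negative)
print(saturating_add(r, 5))  # 9223372036854775807 ("fulfilled" at the cap)
```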
To me, this seems highly dependent on the ontology.