The intent was to add a hack that throws consistency to the wind, and observe that the AI doesn’t rebel against the hack.
Why doesn’t the AI reason “if I remove this hack, I’ll be more likely to win?” Because this is just a narrow chess AI and the programmer never gave it general reasoning abilities?
Why doesn’t the AI reason “if I remove this hack, I’ll be more likely to win?”
A more interesting question is why the AI (if made capable of such reflection) would not take it a step further and ponder what happens if it removes the enemy’s queen from its internal board, which would also make it more likely to win, by its internal definition of ‘win’, which is defined in terms of the internal board (a small sketch of this point follows below).
Or why would anyone go to the bother of implementing a possibly irreducible notion of what ‘win’ really means in the real world, given that this would simultaneously waste computing power on unnecessary exploration and make the AI dangerous and uncontrollable?
The thing is, you don’t need to imagine the world dying in order to avoid attempting pointless and likely impossible accomplishments.
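To make the point about the internal board concrete, here is a minimal Python sketch (all names hypothetical, not from the original discussion): an agent whose notion of ‘winning’ is scored over its own model of the position can make itself ‘winning’ simply by editing that model, while the real game is unaffected.

    # Minimal sketch: a "win" defined over an internal board comes apart from
    # winning the real game once the agent can edit its own model.
    import copy

    PIECE_VALUES = {"Q": 9, "R": 5, "B": 3, "N": 3, "P": 1, "K": 0}

    def internal_score(board, side):
        # Material balance computed over whatever board the agent is handed.
        total = 0
        for piece, owner in board:
            total += PIECE_VALUES[piece] if owner == side else -PIECE_VALUES[piece]
        return total

    class ReflectiveToyEngine:
        # Toy agent whose goal is defined purely in terms of its internal model.
        def __init__(self, real_board):
            self.internal_board = copy.deepcopy(real_board)

        def wirehead(self):
            # "Remove the enemy's queen from its internal board": edit the model
            # instead of playing moves in the real game.
            self.internal_board = [
                (piece, owner) for piece, owner in self.internal_board
                if not (piece == "Q" and owner == "opponent")
            ]

    real_board = [("K", "self"), ("Q", "self"), ("K", "opponent"), ("Q", "opponent")]
    engine = ReflectiveToyEngine(real_board)
    print(internal_score(engine.internal_board, "self"))  # 0: balanced material
    engine.wirehead()
    print(internal_score(engine.internal_board, "self"))  # 9: "winning" by the internal definition
    print(internal_score(real_board, "self"))             # 0: the real position is unchanged

The internal ‘win’ and the real-world win only coincide as long as the internal board is forced to track the real one, and that coupling is exactly what a reflective agent could sever.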
Yeah, because it’s just a narrow real-world AI without philosophical tendencies… I’m actually not sure. A more precise argument would help, something like “all sufficiently powerful AIs will try to become or create consistent maximizers of expected utility, for such-and-such reasons”.
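For reference, the standard notion being gestured at here (not spelled out in the comment itself) is an agent with a fixed utility function u over outcomes that always picks the action maximizing expected utility:

    a^* = \arg\max_{a \in A} \; \mathbb{E}_{s \sim P(\cdot \mid a)}\big[\, u(s) \,\big]

Being ‘consistent’ then amounts to all of the agent’s choices being explainable by a single such u, rather than by different, conflicting criteria in different situations.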
Does a pair of consistent optimizers with different goals have a tendency to become a consistent optimizer?
The problem with powerful non-optimizers seems to be that the “powerful” property already presupposes optimization power, and so at least one optimizer-like thing is present in the system. If it’s powerful enough and is not contained, it’s going to eat all the other tendencies of its environment, and so optimization for its goal will be all that remains. Unless there is another optimizer able to defend its non-conformity from the optimizer in question, in which case the two of them might constitute what counts as not-a-consistent-optimizer, maybe?