The Outcome Pump resets the universe whenever a set period of time passes without an “Accept Outcome” button being pressed.
This creates a universe where the Accept Outcome button gets pressed, not necessarily one that has a positive outcome. For example, if the button were literally a physical button, something might fall onto it; if it were a state in a computer, a cosmic ray might flip a bit.
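The selection effect described above can be made concrete with a toy simulation. This is a hypothetical sketch, not anything from the original post: each "universe" independently has a rare chance of the button ending up pressed and a separate chance of the outcome being good, and the pump simply re-rolls until the button is pressed.

```python
import random

def roll_universe(rng):
    """Sample one candidate universe (toy model; probabilities are made up)."""
    return {
        "button_pressed": rng.random() < 0.05,  # rare: button somehow gets pressed
        "outcome_good":   rng.random() < 0.50,  # independent of how it got pressed
    }

def outcome_pump(rng):
    """Keep resetting the universe until one where the button is pressed."""
    while True:
        u = roll_universe(rng)
        if u["button_pressed"]:
            return u

rng = random.Random(0)
accepted = [outcome_pump(rng) for _ in range(10_000)]

# Every surviving universe has the button pressed...
assert all(u["button_pressed"] for u in accepted)
# ...but only about half are actually good: the pump selects for
# "pressed", not for "good".
frac_good = sum(u["outcome_good"] for u in accepted) / len(accepted)
print(f"fraction of accepted universes that are good: {frac_good:.2f}")
```

The point the sketch makes is purely structural: conditioning on "button pressed" tells you nothing about "outcome good" unless the two are correlated by design.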
True enough, but once we step outside the thought experiment and look at the idea it is intended to represent, “button gets pressed” translates into “humanity gets convinced to accept the machine’s proposal”. Since the AI-analogue device has no motives or desires beyond modelling the universe as perfectly as possible, P(a bit flips in the AI that leads to it convincing a human panel to do something bad) is necessarily lower than P(a bit flips anywhere that leads to a human panel deciding to do something bad), and can be discounted for the same reason we ignore hypotheses like “maybe a cosmic ray flipped a bit to make it do that?” when diagnosing computer errors in general.
P(a bit flips in the AI that leads to it convincing a human panel to do something bad) can be no greater than P(a bit flips anywhere that leads to a human panel deciding to do something bad), since the former event is a subset of the latter.
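The subset claim is just monotonicity of probability. Writing A for the AI-specific bit-flip event and B for the anywhere bit-flip event (labels chosen here for illustration), the step spells out as:

```latex
% If A \subseteq B, then B decomposes as a disjoint union,
% and probabilities of disjoint events add:
A \subseteq B
\;\Longrightarrow\;
P(B) = P(A) + P(B \setminus A) \ge P(A).
```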
The point of the cosmic ray example is not that it is likely to actually happen; it is to demonstrate that the Outcome-Pump-2.0 universe is not necessarily one with a positive outcome, merely one in which the “Outcome” was accepted, and that the Outcome being accepted does not imply the universe is one we like.