I stupidly didn’t consider that kind of situation for some reason… Back to the drawing board I guess.
Though to be fair, it would still come out ahead 51% of the time, and in a real-world application it would probably choose to spend the penny anyway, since it would expect to make similar choices in the future, and doing so would help it come out ahead an even higher percentage of the time.
But yes, a 51% chance of losing a penny for nothing probably shouldn’t be worth more than a 49% chance of saving a life for a penny. However, allowing a large enough reward to outweigh an arbitrarily small probability means the system will get stuck in situations where it is almost guaranteed to lose, holding out for the slim, slim chance of a huge reward.
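Here’s a minimal sketch (not from anyone’s actual system, and the utility numbers are invented purely for illustration) of how the two rules disagree on the penny-vs-life gamble:

```python
# Hypothetical comparison of "maximize expected utility" vs.
# "maximize the chance of coming out ahead" on the penny-vs-life gamble.
# LIFE and PENNY are made-up utility values; only their relative sizes matter.

LIFE = 1_000_000   # assumed utility of saving a life
PENNY = 0.01       # assumed utility of a penny

# Two possible states of the world:
#   51% of the time the penny accomplishes nothing,
#   49% of the time spending the penny saves a life.
states = [
    (0.51, {"spend": -PENNY,       "keep": 0.0}),
    (0.49, {"spend": LIFE - PENNY, "keep": 0.0}),
]

def expected_utility(action):
    return sum(p * payoff[action] for p, payoff in states)

def win_probability(action, other):
    # Probability that `action` comes out strictly ahead of `other`.
    return sum(p for p, payoff in states if payoff[action] > payoff[other])

print("EU(spend) =", expected_utility("spend"))                    # ~489999.99
print("EU(keep)  =", expected_utility("keep"))                     # 0.0
print("P(spend beats keep) =", win_probability("spend", "keep"))   # 0.49
print("P(keep beats spend) =", win_probability("keep", "spend"))   # 0.51
```

Expected utility prefers spending the penny, while the win-percentage rule prefers keeping it, since keeping comes out ahead 51% of the time.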
Caring only about the percentage of the time you “win” seemed like a more rational solution, but I guess not.
Though another benefit of this system could be that you could have weird utility functions, like a rule that says any outcome where a life is saved is worth more than any amount of money lost, or Asimov’s three laws of robotics, which wouldn’t work under an Expected Utility function since it would only ever care about the first law. This is allowed because, in the end, all that matters is which outcomes you prefer to which other outcomes; you don’t have to turn utilities into numbers and do math on them.
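For what it’s worth, that kind of “any life beats any amount of money” rule is just a lexicographic ordering, and you can write it down without ever assigning a single number to an outcome. A hypothetical sketch (the `Outcome` class and its fields are my own illustration, not anything from the post):

```python
# Hypothetical sketch of a lexicographic preference: lives saved always
# dominate money, and money only breaks ties between equal lives-saved counts.
from functools import total_ordering

@total_ordering
class Outcome:
    def __init__(self, lives_saved, money):
        self.lives_saved = lives_saved
        self.money = money

    def __eq__(self, other):
        return (self.lives_saved, self.money) == (other.lives_saved, other.money)

    def __lt__(self, other):
        # Tuple comparison is lexicographic: lives first, money only on ties.
        return (self.lives_saved, self.money) < (other.lives_saved, other.money)

# Saving one life at enormous cost still beats any pile of money with no life saved.
assert Outcome(lives_saved=1, money=-10**9) > Outcome(lives_saved=0, money=10**9)
```

No single real-valued utility is ever computed here; the system only ranks pairs of outcomes, which is exactly the point about not having to turn preferences into numbers.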