You could use a bounded utility function with marginal returns that diminish quickly enough. Or use difference-making risk aversion or difference-making ambiguity aversion.
Maybe also just an aversion to Pascal’s mugging itself, but then the utility maximizer needs to be good enough at recognizing Pascal’s muggings.
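A minimal sketch of the bounded-utility option (the functional form and every number here are my own assumptions, purely for illustration): because the utility function is bounded above, the mugger’s promise can contribute at most p · u_max to expected utility, so even a small price for paying dominates.

```python
import math

# Toy numbers, assumed purely for illustration.
P_MUGGER = 1e-10   # credence that the mugger can actually deliver
PAYOFF = 1e30      # outcome value the mugger promises
COST = 5.0         # outcome value lost by paying the mugger

def linear_u(x):
    """Unbounded utility: every extra unit of outcome value counts fully."""
    return x

def bounded_u(x, u_max=100.0, scale=50.0):
    """Bounded above, with quickly diminishing marginal returns:
    no promise, however astronomical, is worth more than u_max."""
    return u_max * (1.0 - math.exp(-x / scale))

for u in (linear_u, bounded_u):
    eu_pay = P_MUGGER * u(PAYOFF) + (1.0 - P_MUGGER) * u(-COST)
    eu_refuse = u(0.0)
    print(u.__name__, "pays" if eu_pay > eu_refuse else "refuses")
# linear_u pays, bounded_u refuses: the bounded agent values the promise
# at no more than P_MUGGER * u_max ≈ 1e-8, which the cost of paying swamps.
```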
Thanks. Could we be sure that a bare utility maximizer won’t modify itself into a mugging-proof version? I think we can: such a modification would drastically decrease its expected utility by its own current lights.
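To spell that step out (a sketch with my own labels: credence p that the mugger can deliver, promised payoff N, price c of paying, everything else normalized to zero):

$$\mathrm{EU}(\text{stay a bare maximizer}) \;\ge\; pN - c \;\gg\; 0 \;=\; \mathrm{EU}(\text{mugging-proof successor})$$

With an unbounded utility function and an astronomical N, the term pN dominates, so the agent judges self-modification to forfeit nearly all of its expected utility.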
It’s a bit of a relief that a sizeable portion of possible intelligences could be stopped by playing god to them, i.e. by Pascal’s-mugging them ourselves.
> Could we be sure that a bare utility maximizer won’t modify itself into a mugging-proof version? I think we can: such a modification would drastically decrease its expected utility by its own current lights.
Maybe for positive muggings, where the mugger offers to make the world much better than it otherwise would be. But it might self-modify to never give in to threats, precisely to discourage anyone from making them.
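That deterrence point as a toy one-shot game (the payoffs and the assumption of an incentive-responsive mugger are mine): the bare maximizer’s willingness to give in is exactly what makes threatening it worthwhile, so a commitment to refuse pays for itself by preventing the threat from ever being made.

```python
# Toy one-shot threat game; all payoffs are assumed for illustration.
GIVE_IN = -5.0               # agent's utility from meeting the demand
THREAT_CARRIED_OUT = -100.0  # agent's utility if the threat is executed

def agent_gives_in(committed):
    if committed:
        return False  # refuses by policy, whatever the payoffs
    return GIVE_IN > THREAT_CARRIED_OUT  # bare maximizer: -5 beats -100

def mugger_threatens(committed):
    # A rational mugger only threatens when it expects compliance,
    # since carrying out a refused threat is pure cost to it.
    return agent_gives_in(committed)

for committed in (False, True):
    if mugger_threatens(committed):
        eu = GIVE_IN if agent_gives_in(committed) else THREAT_CARRIED_OUT
    else:
        eu = 0.0  # no threat is ever made
    print("committed" if committed else "bare", eu)
# bare -5.0, committed 0.0: against muggers who respond to incentives,
# self-modifying to refuse threats raises expected utility by deterring them.
```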