Could we be sure that a bare utility maximizer doesn’t modify itself into a mugging-proof version? I think we can. Such modification drastically decreases expected utility.
Maybe for positive muggings, when the mugger is offering to make the world much better than otherwise. But it might self-modify to not give into threats to discourage threats.
Maybe for positive muggings, when the mugger is offering to make the world much better than otherwise. But it might self-modify to not give into threats to discourage threats.