The sequence on Infra-Bayesianism motivates the min (a.k.a. Murphy) part of its argmax min by wanting to establish lower bounds on utility — that’s a valid viewpoint. My own interest in Infra-Bayesianism comes from a different motivation: Murphy’s min encodes directly into Infra-Bayesian decision making the generally true, interrelated facts that 1) for an optimizer, uncertainty about the true world model injects noise into your optimization process, which almost always makes the outcome worse; 2) the optimizer’s curse usually results in you exploring outcomes whose true utility you had overestimated, so your regret is generally higher than you had expected; and 3) most everyday environments and situations are already highly optimized, so random perturbations of their state almost invariably make things worse. All of this justifies pessimism and conservatism.
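To make the contrast concrete, here is a minimal sketch (in Python, with made-up world names and utilities; this is illustrative only, not Infra-Bayesianism’s actual formalism) of the difference between an ordinary Bayesian argmax of expected utility and the pessimistic argmax min, where Murphy picks the worst world model consistent with your uncertainty for whichever action you choose.

```python
# Illustrative sketch only: two candidate world models, two actions, and
# assumed utilities. Bayesian choice maximizes expected utility under a
# prior; the maximin choice maximizes the worst case ("Murphy" picks the world).

utility = {
    "cautious":   {"world_a": 2.0, "world_b": 1.5},
    "aggressive": {"world_a": 5.0, "world_b": -1.0},
}
prior = {"world_a": 0.5, "world_b": 0.5}  # credence over candidate worlds

def bayes_choice(utility, prior):
    # argmax over actions of expected utility under the prior
    return max(utility, key=lambda a: sum(prior[w] * u for w, u in utility[a].items()))

def maximin_choice(utility):
    # argmax over actions of the worst-case utility across candidate worlds
    return max(utility, key=lambda a: min(utility[a].values()))

print(bayes_choice(utility, prior))  # "aggressive": best on average (2.0 vs 1.75)
print(maximin_choice(utility))       # "cautious": best lower bound (1.5 vs -1.0)
```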
The problem with this argument is that it’s only true when the utility of the current state is higher than the utility of the maximum-entropy equilibrium state of the environment (the state that increasingly randomizing it tends to move it towards, due to the law of large numbers). In everyday situations this is almost always true—making random changes to a human body or to a city will almost invariably make things worse, for example. In most physical environments, randomizing them sufficiently (e.g. by raining meteorites on them, or whatever) will tend to reduce their utility to that of a blasted wasteland (the surface of the moon, for example, has pretty much reached equilibrium under randomization-by-meteorites, and has a very low utility). However, it’s a general feature of human utility functions that there can often be states worse than the maximum-entropy equilibrium. If your environment is a 5-choice multiple-choice test whose utility is the score, the entropic equilibrium is random guessing, which scores 20% on average, and there are choose-wrong-answer policies that score less than that, all the way down to 0%; and partially randomizing away from one of those policies will make its utility increase towards 20%. Similarly, consider a field of anti-personnel mines left over from a war—as an array of death and dismemberment waiting to happen, randomizing it with meteorite impacts will clear some mines and make its utility better—since it starts off actually worse than a blasted wasteland. Or, if a very smart GAI working on alignment research informed you that it had devised an acausal attack that would convert all permanent hellworlds anywhere in the multiverse into blasted wastelands, your initial assumption would probably be that doing so would be a good thing (modulo questions about whether it was pulling your leg, or whether the inhabitants of the hellworld would consider it a hellworld).
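As a quick arithmetic check of the multiple-choice example, here is a small sketch (assumed setup: a 5-choice test and a deterministic always-wrong policy scoring 0%) showing how mixing in randomness pulls the expected score toward the 20% equilibrium.

```python
# Assumed toy setup: 5 answer choices, base policy deliberately scores 0%.
# With probability p_random we answer uniformly at random instead, so the
# expected score interpolates between the base policy and the 20% equilibrium.

def expected_score(p_random, base_score=0.0, n_choices=5):
    return p_random * (1.0 / n_choices) + (1.0 - p_random) * base_score

for p in (0.0, 0.25, 0.5, 1.0):
    print(p, expected_score(p))  # 0.0, 0.05, 0.10, 0.20: rises toward 20%
```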
In general, hellworlds (or at least local hell-landscapes) like this are rare — they have a negligible probability of arising by chance, so creating one requires work by an unaligned optimizer. So currently, with humans the strongest optimizers on the planet, they usually arise only in adversarial situations such as wars between groups of humans (“War is Hell”, as the saying goes). However, Infra-Bayesianism has exactly the wrong intuitions about any hell-environment whose utility is currently lower than that of the entropic equilibrium. If you have an environment that has been carefully optimized by a powerful, very-non-aligned optimizer so as to maximize human suffering, then random sabotage such as throwing monkey wrenches in the works or assassinating the diabolical mastermind is actually very likely to improve things (from a human point of view), at least somewhat. Infra-Bayesianism would predict otherwise. I think basing your GAI’s decision theory on a system that gives exactly the wrong intuitions about hellworlds is likely to be extremely dangerous.
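A toy numerical illustration of this (all numbers assumed, chosen only to make the point): when the current state already sits below the entropic equilibrium, a worst-case Murphy still predicts further deterioration from random perturbation, even though the average effect of that perturbation is an improvement.

```python
# Assumed toy numbers: a hell-landscape with utility -10, where the
# blasted-wasteland equilibrium has utility 0. Random sabotage has several
# possible outcomes; most of them move the state back toward the equilibrium.

current_utility = -10.0
sabotage_outcomes = [-11.0, -6.0, -4.0, -2.0]  # possible post-sabotage utilities

worst_case = min(sabotage_outcomes)                             # -11.0
average_case = sum(sabotage_outcomes) / len(sabotage_outcomes)  # -5.75

# A Murphy-style worst-case rule says sabotage makes things worse (-11 < -10),
# while the expected effect is actually a clear improvement (-5.75 > -10).
print(worst_case, average_case)
```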
The solution to this would be what one might call Meso-Bayesianism—renormalize your utility scores so that the utility of the maximal-entropy state of the environment is by definition zero, and then assume that Murphy minimizes the absolute value of the utility towards the equilibrium utility of zero, not towards a hellworld. (I’m not enough of a pure mathematician to have any idea what this modification does to the network of proofs, other than making the utility renormalization part of a Meso-Bayesian update more complicated.) So then your decision theory understands that any unaligned optimizer trying to create a hellworld is also fighting Murphy, and when fighting them on their home turf Murphy is your ally, since “it’s easier to destroy than create” is also true of hellworlds. [Despite the usual formulation of Murphy’s law, I actually think the name ‘Murphy’ suits this particular metaphysical force better—Infra-Bayesianism’s original ‘Murphy’ might have been better named ‘Satan’, since it is willing to go to any length to create a hellworld, hobbled only by some initially-unknown physical laws.]
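Here is a minimal sketch of that tweak (my own illustrative formalization of the paragraph above, not anything taken from the sequence): utilities are renormalized so the maximum-entropy equilibrium has utility zero, and Murphy is modeled as minimizing |U| rather than U.

```python
# Illustrative sketch of the proposed "Meso-Bayesian" Murphy, with assumed
# renormalized utilities (equilibrium wasteland = 0, negative = worse than it).

def murphy_infra(outcomes):
    # Infra-Bayesian-style Murphy: picks the lowest-utility outcome.
    return min(outcomes)

def murphy_meso(outcomes):
    # Meso-Bayesian Murphy: picks the outcome closest to the equilibrium,
    # i.e. minimizes the absolute value of the renormalized utility.
    return min(outcomes, key=abs)

# Possible successor states of a hellworld, after renormalization.
outcomes = [-9.0, -3.0, -0.5]

print(murphy_infra(outcomes))  # -9.0: expects the hellworld to deepen
print(murphy_meso(outcomes))   # -0.5: expects drift back toward the wasteland
```

Under this rule, an agent in a below-equilibrium state no longer treats further deterioration as the default expectation, which matches the framing above of Murphy as an ally when fighting a hellworld-building optimizer on its home turf.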