Approximately Bayesian Reasoning: Knightian Uncertainty, Goodhart, and the Look-Elsewhere Effect
Approximate Bayesian Reasoning
AIXI is a well-known model of a Bayesian reasoner. However, it’s not a practical one, since both the Solomonoff prior’s basis, Kolmogorov complexity, and the rest of the Solomonoff induction process of Bayesian reasoning as used by AIXI are uncomputable: they require unlimited computational resources. [Even the more carefully limited problem of using polynomial computational resources to approximate the polynomial-time-bounded Kolmogorov complexity of an object (the length of the shortest program that outputs it in polynomial time) turns out to raise some fascinating issues in computational complexity theory.] Any real-world agent will need to figure out how to make do with finite computational resources, just as human practitioners of STEM and rationalism do. This introduces a wide range of concerns into our approximate Bayesianism:
Missing hypotheses: it is quite likely that the true hypothesis is not even in the set of hypotheses that we are currently considering. About the best we can hope for is that one or more of the hypotheses we’re considering is a moderately good fit for reality over some range of world states, including ones in and perhaps close to the distribution of states that we’ve already encountered and used as evidence.
Inaccurate priors: the Solomonoff prior is uncomputable, cannot even be polynomially approximated reliably, and well-known approaches such as uniform priors, maximum entropy priors, and symmetry-group based measure priors are only accurately usable if you actually have the full hypothesis space available. There are ways to make better and worse guesses of priors, but we should reasonably expect that our prior was wrong, too high or too low (more likely too high because we were unaware of part of the hypothesis space). In practice, as we accumulate evidence, this problem gradually gets overcome by the weight of evidence, but in the meantime our posteriors will be wrong too. So a hypothesis in our hypothesis set basically represents not only itself, but also other hypotheses we have not yet considered that are similar enough to it (at least in some region) to not be significantly distinguishable under any evidence we yet have. As and when we come up with such hypotheses, splitting its prior/posterior between them in some principled-looking way might be a reasonable choice of new prior. So for example, when adding Einstein’s Special Relativity to a set of hypotheses consisting only of classical physics, back in 1905, the set of observations on which they gave differing predictions was extremely small, principally the Michelson-Morley experiments.
Inaccurate Bayesian updates: often the likelihood is not accurately computable and we may have to make do with approximate estimates of it, leading to approximate Bayesian updates with increased uncertainties in the posterior. The likelihood may be harder or easier to compute under different hypotheses, so this may have differential effects across our set of hypotheses.
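As a minimal sketch of the "splitting" idea from the inaccurate-priors item above, the following Python moves part of an existing hypothesis's credence to a newly conceived one and then applies an ordinary Bayesian update. The function names, the hypothesis labels, the 50/50 split, and every number are hypothetical illustrations, not values taken from the text.

```python
def split_hypothesis(posteriors, parent, child, share=0.5):
    """Give a newly conceived hypothesis `child` a fraction `share` of the
    credence currently held by `parent` (an assumed, simple splitting rule)."""
    if child in posteriors:
        raise ValueError(f"{child!r} is already in the hypothesis set")
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must lie between 0 and 1")
    updated = dict(posteriors)
    moved = updated[parent] * share
    updated[parent] -= moved
    updated[child] = moved
    return updated


def bayes_update(posteriors, likelihoods):
    """Standard Bayesian update: multiply by likelihoods, then renormalize."""
    unnormalized = {h: p * likelihoods[h] for h, p in posteriors.items()}
    total = sum(unnormalized.values())
    return {h: v / total for h, v in unnormalized.items()}


# Hypothetical credences before anyone had thought of special relativity.
beliefs = {"classical": 0.9, "other": 0.1}
with_sr = split_hypothesis(beliefs, "classical", "special_relativity")

# Hypothetical likelihoods of one observation on which the hypotheses differ.
likelihoods = {"classical": 0.05, "special_relativity": 0.9, "other": 0.5}
updated = bayes_update(with_sr, likelihoods)
print(updated)
```

Before the discriminating observation, the two hypotheses share the parent's credence equally. Only evidence on which they disagree moves them apart.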
At the end of the day, the purpose of a set of approximately Bayesian beliefs is to let us make informed decisions as well as we can given our available computational resources. Generically, we have some number of available options that we need to choose between: these may be some mix of discrete choices or effectively-continuous parameters. To make the choice, we need to compute the utility of each available option, or for continuous parameters estimate a parameters-to-utility function, and then attempt to maximize the utility.
The Look-Elsewhere Effect and Regressional Goodhart
At this point we risk running into the Look-Elsewhere effect. Suppose, for simplicity, that each utility computation has an error distribution, that each of these distributions is exactly a normal distribution, that we have a good enough understanding of the error distribution in our approximately Bayesian reasoning process that we can actually estimate the standard deviation of each utility computation (our estimate of the degree of bias will be zero: if we had a known degree of bias, we would have adjusted our utility computation to account for it), and that the utility computations for the options we are considering are all sufficiently complex and different computations, based on the complexity of our remaining set of hypotheses, for their errors to be independently distributed. (This is a fairly long set of fairly unlikely assumptions, but it makes explaining and calculating the strength of the Look-Elsewhere effect a lot simpler.) Then, if we are comparing around 40 different options, we would expect around 5% of the utility calculations to be off by around two standard deviations or more, i.e. around two of them, each with an even chance of being either high or low. If the standard deviation sizes were all similar and actual differences in utility were very small compared to twice the standard deviation, then most likely we would end up picking an option because of an error in our utility calculation, likely whichever one we happened to be overestimating by around two standard deviations, so that it won. (This might help us learn more about its actual utility, improving our world model, in a form of multi-armed-bandit-like exploration, but generally isn’t a good way to make a decision.) If the standard deviation sizes of the options varied significantly, then we are more likely to pick one with a high standard deviation simply because its larger potential for error makes it more likely to be overestimated enough to stand out unjustifiably.
This is a form of Goodhart’s law: in Scott Garrabrant’s Goodhart Taxonomy it falls in the “regressional Goodhart” category. We are selecting an optimum action based on our estimate of the utility, which is correlated with but not identical to the true utility, so we end up selecting on the sum of the true utility plus our error. Our best estimate of the bias of the error is zero (otherwise we’d have corrected for it by adjusting our posteriors), but we may well have information about the likely size of the error, which is also correlated with our optimum via the Look-Elsewhere effect: an option that has wide error bars on its utility has a higher chance of being sufficiently overestimated to appear to be the optimum when it isn’t than one with narrow error bars does.
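The effect described above can be illustrated with a small Monte Carlo sketch. It assumes the idealised setup from the text (independent, unbiased, normally distributed errors) and adds further hypothetical choices of its own: all 40 options have the same true utility of zero, half have error standard deviation 1 and half have 3, and the function name and trial count are arbitrary.

```python
import random


def simulate_selection(n_options=40, n_trials=2000, seed=0):
    """Pick the option with the highest *estimated* utility many times and
    report (a) the winner's average overestimate and (b) how often a
    wide-error option wins. True utilities are all 0 (an assumption)."""
    rng = random.Random(seed)
    half = n_options // 2
    # Hypothetical error bars: half narrow (sigma = 1), half wide (sigma = 3).
    sigmas = [1.0] * half + [3.0] * (n_options - half)
    total_overestimate = 0.0
    wide_wins = 0
    for _ in range(n_trials):
        estimates = [rng.gauss(0.0, s) for s in sigmas]
        winner = max(range(n_options), key=estimates.__getitem__)
        total_overestimate += estimates[winner]  # true utility is 0
        if sigmas[winner] > 1.0:
            wide_wins += 1
    return total_overestimate / n_trials, wide_wins / n_trials


mean_overestimate, wide_fraction = simulate_selection()
print(f"mean overestimate of the winner: {mean_overestimate:.2f}")
print(f"fraction of wins by wide-error options: {wide_fraction:.2f}")
```

Although every individual estimate is unbiased, the selected option is overestimated on average, and the winner almost always comes from the wide-error group.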
Compensating for Look-Elsewhere Regressional Goodhart
We can compensate for this effect: if we converted our uncertainty on the posteriors of hypotheses into an uncertainty on the utility, then subtracted two standard deviations from the mean estimated utility of each option, that would give us an estimated 95% lower confidence bound on the utility, and would compensate for the tendency to favor options with larger uncertainties on their utility when choosing among 40 options. Similarly, if we were picking from around 200,000 options under the same set of assumptions, likely around two of them would be five standard deviations from the mean, one in an overestimate direction, and the best way to compensate if they were all “in the running” would be to subtract five standard deviations from the estimated utility to estimate a 99.999% lower confidence bound. So we need to add some pessimizing over utility errors to compensate for regressional Goodhart’s law, before optimizing over the resulting lower confidence bound. If we are estimating the optimum of a continuous utility function of some set of continuous parameters, the number of options being picked between is not in fact infinite, as it might at first appear: for sufficiently nearby parameter values, the result and/or error of the utility function calculations will not be sufficiently independent to count as separate draws in this statistical process. While the effective number of draws may be hard to estimate, it is in theory deducible from the complexity of the form of the function and its dependence on the errors on the posteriors of the underlying hypotheses.
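A minimal sketch of this compensation, under the same idealised assumptions (independent, normally distributed errors), might look as follows. The function names and the convention of allowing about one expected overestimate beyond the threshold are assumptions of this sketch, as are the example numbers. Note that the exact normal quantile for 200,000 options comes out somewhat below the round figure of five standard deviations used in the text, which is best read as an order-of-magnitude estimate.

```python
from statistics import NormalDist


def sigmas_for_options(n_options, expected_overestimates=1.0):
    """Number of standard deviations z such that, among n_options independent
    normal errors, about `expected_overestimates` exceed +z sigma."""
    tail = min(expected_overestimates / n_options, 0.5)
    return NormalDist().inv_cdf(1.0 - tail)


def pick_by_lower_bound(means, sigmas):
    """Choose the option with the best lower confidence bound, mean - z*sigma."""
    z = sigmas_for_options(len(means))
    lower_bounds = [m - z * s for m, s in zip(means, sigmas)]
    return max(range(len(means)), key=lower_bounds.__getitem__)


z_40 = sigmas_for_options(40)
z_200k = sigmas_for_options(200_000)
print(f"40 options: subtract about {z_40:.2f} sigma")
print(f"200,000 options: subtract about {z_200k:.2f} sigma")

# Hypothetical example: 39 well-understood options and one uncertain one
# whose estimated utility looks slightly better.
means = [10.0] * 39 + [11.0]
sigmas = [0.2] * 39 + [2.0]
naive_choice = max(range(len(means)), key=means.__getitem__)
cautious_choice = pick_by_lower_bound(means, sigmas)
print(naive_choice, cautious_choice)
```

The naive maximizer picks the uncertain option. The lower-bound rule prefers a well-understood one, because the uncertain option's apparent advantage is smaller than its penalty.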
Note that the number of independent draws here is also effectively capped by the number of independent errors on posteriors of hypotheses in your hypothesis set, which for discrete hypotheses is just the number of hypotheses (whether this is the sum or the product of the number of world model hypotheses and number of utility function hypotheses depends on the complexity of how their errors interact: in general it will be up to the product). Helpfully, this is a number that only changes when you add hypotheses to your hypothesis set, or prune ones whose posterior has got close enough to 0.000… to not be worth including in the calculation. Much as before, if some of your hypotheses have posterior distributions over continuous parameters that form part of the hypothesis (as is very often the case), then the effective number of independent variables may be hard to estimate (and isn’t fixed), but is in theory defined by the form and complexity of the parameters-to-utilities mapping function.
In practice, life is harder than the set of simplifying assumptions we made earlier. The error distributions of utility calculations may well not be normal. By the central limit theorem, distributions converge toward normal when you have a large number of statistically independent effects whose individual sizes are not too abnormally distributed (as could happen when gradually accumulating evidence about a hypothesis), but in practice distributions are often fatter tailed due to unlikely large effects (such as if some pieces of evidence are particularly dispositive, or if a prior was seriously flawed), or sometimes less fat tailed because there are only a small number of independent errors (such as if we don’t yet have much evidence). It’s also fairly unlikely that all of our available options are so different that their utility estimation errors are entirely independent: some causes of utility estimation error may be shared across many of them, and this may vary a lot between hypotheses and sets of options, making it very complex to do the overall calculation accurately.
It’s also very unlikely that all of the options are so similar in estimated utility as to be “in the running” due to possible estimation errors. This issue can actually be approximately compensated for: if we have 200,000 independent options, plot their ±5 sigma error bars, look at the overlaps, and compute the number that are in the running to be the winner. Take that smaller number, convert it to a smaller number of sigmas according to the standard properties of the normal distribution, recompute the now-smaller error bars, and repeat. This process should converge rapidly to a pretty good estimate of the potential number of candidates that are actually “in the running”. If you have a rough estimate of the degree of correlation between their utility estimate errors, then you should combine that with this process to reach a lower number. If the number of candidates gets quite small, you can now redo the estimation of the correlation of their error estimates with a lot more detailed specifics. The way this process converges means that our initial estimate of the number of options isn’t very important: the only hard-to-estimate parameter with much effect is the degree of correlation between the errors of the (generally small) number of options that are close enough to be “in the running”.
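The iteration just described can be sketched roughly as below, assuming independent normal errors and ignoring correlations. The function names, the overlap test (an option stays in the running if its upper bound reaches the best lower bound), and the tail convention of 1/(n + 1) are all assumptions of this sketch rather than details given in the text.

```python
from statistics import NormalDist


def z_for(n_candidates):
    """Sigmas to use for n_candidates independent draws. Hypothetical
    convention: upper-tail probability 1 / (n + 1), which keeps z positive
    even when only two candidates remain."""
    return NormalDist().inv_cdf(1.0 - 1.0 / (n_candidates + 1))


def in_the_running(means, sigmas, max_iters=50):
    """Iteratively shrink the error bars until the set of options whose
    upper bound reaches the best lower bound stops changing in size."""
    n = len(means)
    candidates = list(range(n))
    z = z_for(n)
    for _ in range(max_iters):
        z = z_for(n)
        best_lower = max(m - z * s for m, s in zip(means, sigmas))
        candidates = [i for i, (m, s) in enumerate(zip(means, sigmas))
                      if m + z * s >= best_lower]
        if len(candidates) == n:
            break  # converged: the count no longer shrinks
        n = len(candidates)
    return candidates, z


# Hypothetical example: three close contenders among 1,000 options.
means = [10.0, 9.5, 9.0] + [0.0] * 997
sigmas = [1.0] * 1000
candidates, z = in_the_running(means, sigmas)
print(candidates, round(z, 2))
```

In this example the loop converges after two passes: the first pass rules out the 997 distant options, and the second confirms that the three contenders still overlap once the error bars are narrowed.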
So it is very hard to come up with a simple, exact, elegant algorithm to use here. Nevertheless, this calculation looks likely to be fairly tractable in practice, and several key elements can be observed that suggest a number of useful-looking and hopefully-computable heuristics:
It is useful to maintain not just an estimated posterior for each hypothesis, but some estimate of the error distribution that our only being able to approximate true Bayesianism has induced. A standard deviation would be a good start. Having some idea whether the distribution might be fatter tailed than a normal distribution would be particularly valuable: for example, it would be very helpful if we could maintain a list of separate 90%, 99%, 99.9%… estimated confidence bounds for each posterior, or alternatively confidence bounds for probabilities corresponding to small integer numbers of standard deviations from the mean for a normal distribution. (Using higher statistical moments such as the skewness and kurtosis would be another format for computing this in, but might be rather less convenient for doing the calculations we need to do.)
Over time, as Bayesianism operates, it tends to push posteriors of hypotheses either towards 99.999…% or 0.000…%, and also reduce the error bounds on them. So utility estimation errors will mostly be due to uncertain posteriors for hypotheses that are sufficiently new, or that we otherwise have sufficiently little evidence about, that this hasn’t yet happened, where these hypotheses have a significant effect on the utility estimation compared to alternatives. So this entire mechanism can be very roughly approximated by putting less credence on bets whose value depends strongly on the correctness of hypotheses that we’re not yet very confident of, i.e. by generic skepticism, especially in cases where it looks like a lot of look-elsewhere effect is likely to be at play in this specific decision.
Thus, always be on the lookout for low-cost, low-risk ways to learn more about the way the world works and the way human utilities work, and to apply more evidence to our hypotheses on these things, especially in areas that might (or might not) lead to high utility states.
The harder we are optimizing, i.e. the more genuinely independent options we are calculating the utility of and picking between whose estimated utilities are close enough to the estimated optimum to be “in the running” compared to their error sizes, the worse this form of Look-Elsewhere Regressional Goodhart’s law is. However, for normally distributed errors the strength of this effect, as a multiple of the standard deviation of the utility error, scales as $\sqrt{\log N}$, where $N$ is the number of options that are in the running and have independently distributed errors (or for a fatter-tailed error distribution it might be more like $\log N$), both of which are very slow forms of growth, so even a rough, orders-of-magnitude estimate of the size of $N$ is actually very helpful in telling us how cautious we need to be. So we don’t need a detailed calculation of $N$ down to exact degrees of correlation: even a rough estimate is a big help.
If the suggested optimum world state we just came up with is outside the distribution of world states that we have a lot of evidence about the behavior of and correct utilities of, then we are also at risk of a related form of Goodhart often known as Extremal Goodhart: our chance of errors in our utility estimates is much higher in world-states outside the distribution that most of our Bayesian evidence was drawn from, primarily because we may be missing hypotheses from the set we are considering that produce extremely similar results within that distribution but differ in this new region of the state space. So, now would be an excellent time to see if we can come up with some new hypotheses, plausible under all currently seen evidence because they give similar predictions for it, but that would significantly affect this new region of the world states. In particular, we should be looking out for previously-unconsidered hypotheses that would suggest that either this world state is not reachable by the means we are currently planning, or it does not have as high a utility as we are currently estimating. Models of all kinds tend to break when you go out of distribution, and “here be dragons” hypotheses that only take effect in some other part of the distribution are hard to completely discount a priori if you don’t have much evidence (unless there is a strong argument from Occam’s razor that the form of such a difference would have to be very contrived-looking). We may also want to double-check that we’re calculating estimated utility values and their errors correctly in this new and unfamiliar part of world-state-space: calculational techniques can also have limited domains of applicability. So again, caution is rationally justified.
If several world states look close to being the optimum, and are also easy to reach from each other or can be reached by similar paths, and we expect to be able to determine the true utility rapidly (ideally, even while we’re still approaching them), we may not have to make a final decision of destination now, and may be able to do “terminal guidance” on the way there, or “a fast-follow fine-tuning adjustment” soon after we get to one of them. Or maybe we can figure out a way to do some form of trial run, experiment, or pilot study to reduce risks and gather more information. We may have a lot more options than this simplistic one-step model of the optimization process suggests, some of which involve gathering more information to refine our approximately-Bayesian model of the world’s behavior and utility values.
Most of these heuristics should of course already be familiar to STEM practitioners, and any AI that uses approximate Bayesian reasoning should also incorporate them.
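The slow growth mentioned in the heuristic on optimization pressure above can be made concrete with two standard results about the expected maximum of $N$ independent errors. They are stated here as background and are not taken from the text. The symbols $\varepsilon_i$, $\sigma$, $\lambda$ and $H_N$ (the $N$-th harmonic number) are notation introduced for this illustration, and the exponential distribution is used only as one simple example of a tail heavier than normal.

```latex
% Normal errors: \varepsilon_i \sim \mathcal{N}(0, \sigma^2), independent
\mathbb{E}\Big[\max_{1 \le i \le N} \varepsilon_i\Big] \le \sigma \sqrt{2 \ln N}

% Exponential-tailed errors: \varepsilon_i \sim \mathrm{Exp}(\lambda), independent
\mathbb{E}\Big[\max_{1 \le i \le N} \varepsilon_i\Big] = \frac{H_N}{\lambda} \approx \frac{\ln N + 0.577}{\lambda}
```

For instance, going from 40 to 200,000 options multiplies $N$ by 5,000 but raises $\sqrt{2 \ln N}$ only from about 2.7 to about 4.9, which is why an order-of-magnitude estimate of $N$ is usually enough.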
Why we Need Knightian Uncertainty as well as Risk in Approximate Bayesian Reasoning
Notice that in order to correctly compensate for Look-Elsewhere Regressional Goodhart’s law when doing approximate Bayesian reasoning, we need to keep more than just a Bayesian posterior for each hypothesis: we also need to be able to estimate error bars on it at various confidence levels. So (unlike the theoretical but uncomputable case of true Bayesian reasoning), we actually do need to distinguish between risk and Knightian uncertainty, because when choosing between large numbers of options we need to handle them differently to beat Look-Elsewhere Regressional Goodhart’s law. Risk can be due to either a hypothesis that makes stochastic predictions, and/or a set of hypotheses that isn’t heavily dominated by just one hypothesis with a posterior that is close to 99.999…%. Knightian uncertainty happens when not only do we have a set of hypotheses that isn’t heavily dominated by just one, but also some of those non-extremal posteriors have wide estimated error bars.
Note that the distinction between risk and Knightian uncertainty isn’t important when choosing to accept or reject a single bet (where there are only two options, and we know their probabilities of being the best option sum to 100%, so there is effectively only one independent variable being drawn here, the difference between the two utility estimates, and thus the distinction between risk and Knightian uncertainty disappears).[1] The distinction only becomes important when you have a large number of possibly-viable bets or options and you need to choose between them. It then discourages you from accepting bets whose favorability depends more strongly on things that you are Knightianly uncertain about, compared to ones that are only risky, or not even risky. This is not out of risk minimization: it is because, in the presence of Knightian uncertainty, the look-elsewhere effect gives your utility estimates on favorable-looking options a predictable tendency for some to be overestimates, which needs to be compensated for.
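As a small illustration of this distinction, the sketch below compares two bets with the same mean win probability, one a pure risk (no error on the probability) and one Knightianly uncertain (a nonzero estimated error on the probability). The `Bet` class, its field names, and all numbers are hypothetical. The penalty of `z` standard deviations stands in for the look-elsewhere correction, with `z = 0` corresponding to a single take-it-or-leave-it bet.

```python
from dataclasses import dataclass


@dataclass
class Bet:
    p_win: float    # mean estimated probability of winning
    p_sigma: float  # estimated error on p_win (0.0 means pure risk)
    payoff: float   # utility gained on a win
    stake: float    # utility lost on a loss

    def utility_mean(self):
        """Expected utility at the mean estimated probability."""
        return self.p_win * self.payoff - (1.0 - self.p_win) * self.stake

    def utility_sigma(self):
        """Error on the expected utility. Utility is linear in p_win, so the
        error on p_win simply scales by (payoff + stake)."""
        return self.p_sigma * (self.payoff + self.stake)

    def lower_bound(self, z):
        """Expected utility penalized by z standard deviations of its error."""
        return self.utility_mean() - z * self.utility_sigma()


risky = Bet(p_win=0.6, p_sigma=0.0, payoff=1.0, stake=1.0)
knightian = Bet(p_win=0.6, p_sigma=0.1, payoff=1.0, stake=1.0)

# Single bet (z = 0): the two are indistinguishable.
print(risky.lower_bound(0.0), knightian.lower_bound(0.0))
# One of many options (z = 2, roughly the 40-option case): they differ.
print(risky.lower_bound(2.0), knightian.lower_bound(2.0))
```

With no selection among many options the two bets are valued identically. Once a look-elsewhere penalty applies, only the Knightianly uncertain bet is marked down.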
[1] In “Knightian uncertainty in a Bayesian framework” and “Knightian uncertainty: a rejection of the MMEU rule”, So8res correctly demonstrated that for deciding whether to take a single bet there is no distinction between risk and Knightian uncertainty, and then suggested that this meant that Knightian uncertainty was a useless concept, unnecessary even for computationally-limited approximately Bayesian reasoners, which as I showed above is incorrect. The distinction between them matters only when there are multiple options on the table, and is basically only a second-order effect: risk and Knightian uncertainty with the same mean estimated probability are equivalent to first order, but the non-zero standard deviation of Knightian uncertainty has second-and-higher-order effects that differ from the zero standard deviation of a pure, accurately quantified risk, due to the need to compensate for the Look-Elsewhere Regressional Goodhart effect.
This line of reasoning is interesting and I think it deserves some empirical exploration, which could be done with modern LLMs and agents.
E.g. make a complicated process that generates a distribution of agents via RLHF on a variety of base models and a variety of RLHF datasets, and then test all those agents on some simple tests. Pick the best agents according to their mean average scores on those simple tests, and then vet them with some much more thorough tests.
I think such an experiment could be done more easily than that: simply apply standard Bayesian learning to a test set of observations and a large set of hypotheses, some of which are themselves probabilistic, yielding a situation with both Knightian and statistical uncertainty, in which you would normally expect to be able to observe Regressional Goodhart/the Look-Elsewhere Effect. Repeat this, and confirm that the effect does indeed occur without this statistical adjustment, and then that applying the adjustment makes it go away (at least to second order).
However, I’m a little unclear why you feel the need to experimentally confirm a fairly well-known statistical technique: correctly compensating for the Look-Elsewhere Effect is standard procedure in the statistical analysis of experimental High-Energy Physics — which is of course a Bayesian learning process where you have both statistical uncertainty within individual hypotheses and Knightian uncertainty across alternative hypotheses, so exactly the situation in which this applies.