“Does the distinction between understanding and improving correspond to the distinction between the Law of Probability and the Law of Utility? It sounds like it should.”
“Sensible question, but no, not exactly. Probability is something like a separable core that lies at the heart of Probable Utility. The process of updating our beliefs, once we have the evidence, is something that in principle doesn’t depend at all on what we want—the way reality is is something defined independently of anything we want. The scaffolding we construct between propositions and reality, or probabilities and reality, doesn’t have a term inside it for ‘how much would you value that thing’, just, is the coin showing Queen or Text.”
“But the process of Science, of experimenting on something to understand it, doesn’t belong purely to Probability. You have to plan experiments to find ones that distinguish between the possible hypotheses under consideration, or even just, are effective at probing to uncover surprises and unexpected patterns that give you a first handle on what’s happening. The Law of Probability just says how to update after you get the evidence. Planning an experiment that you then act on, implement, is the domain of Probable Utility and can’t exist apart from it.”
“In fact the influence of the ‘utilityfunction’ on ‘epistemics’, the influence of what we ultimately want on how we map reality, is in-theory-but-not-in-practice much more pervasive. In principle, how we classify things in reality and lump them together—treating all gold pieces as ‘gold pieces’ instead of as uniquely detailed individual elements of reality—reflects how any two gold pieces are usually equally useful to us in carrying out the same kinds of plans, they are plan-interchangeable. In practice, even people who want pretty different things, on a human scale, will often find pretty similar categories useful, once they’ve zoomed into similar levels of overall detail.”
“Dath ilani kids get told to not get fascinated with the fact that, in principle, ‘bounded-agents’ with finite memories and finite thinking speeds, have any considerations about mapping that depend on what they want. It doesn’t mean that you get to draw in whatever you like on your map, because it’s what you want. It doesn’t make reality be what you want.”
“But when it comes to Science, it really does matter in practice that planning an experiment is about wanting to figure something out and doing something you predict will maybe-probably yield some possibly-useful information. And this is an idea you just can’t express at all without some notion of Probable Utility; you’re not just passively updating off information somebody else gave you, you’re trying to steer reality through Time to make it give up information that you want.”
“Even when you do get information passively, figuring out what to think about it reflects which thoughts you expect will be useful. So the separable core of Probability inside of Probable Utility is really more of a Law thing about basic definitions, than anything that corresponds to—there being a sort of separable person who only implements a shadow of Probability and doesn’t shadow any structure cast from Probable Utility, who’s really great at understanding things and unraveling mysteries and answering questions, but never plans anything or tries to improve anything. Because humans are constantly-ubiquitously-in-the-unseen-background choosing which thought to think next, in order to figure things out; usually wordlessly, but in words too when the problems get especially difficult. Just the action of turning your head in a direction, to look at something, because you wordlessly anticipate gaining info that has the consequence of helping you answer some other question, is in theoretical terms an action.”
“Just to check, is that supposed to be some kind of incredibly deep lesson full of meaning about something else important? If so, I didn’t get it.”
“Nah, it’s just an answer to your question. Or at least, if it had some hugely important hidden meaning about how to avoid some dreadful Science!-related catastrophe, I didn’t get it either, when it was emphasized to me as a kid.”
I think this is Eliezer’s response to the view that we can train non-agentic tool AIs that will only understand the world at a superhuman level, without ever doing any superhuman planning. We can’t, Eliezer says, because Science always and everywhere requires Scientists. We have to plan ahead in order to perform our highest-EV experiments, and EV takes as input both a world model and a utility function. There’s no such thing as the objective next experiment you ought to perform, entirely independently of what you care about and how much.
Though, in practice, things in the universe are similarly useful to a wide range of bounded agents, so this doesn’t come up a lot. We can often act as if there is an objective tech chain of experiments that every intelligence everywhere ought to run through. This is because intelligences in our universe are rather more similar than different; it isn’t true for, e.g., very alien intelligences in distant corners of the mathematical multiverse.
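To make that dependence concrete, here is a toy sketch in Python (mine, not from the text). Two agents share exactly the same prior and likelihoods but want different things, and they end up ranking the same candidate experiments differently; every hypothesis, experiment, and action name below is invented for illustration.

```python
# Toy illustration (not from the source): "which experiment has the highest
# expected value" depends on the utility function as well as the world model.

HYPOTHESES = ["A", "B", "C"]
PRIOR = {h: 1 / 3 for h in HYPOTHESES}

# P(outcome | hypothesis) for each candidate experiment (made-up numbers).
LIKELIHOOD = {
    "test_for_A": {"A": {"pos": 0.95, "neg": 0.05},
                   "B": {"pos": 0.05, "neg": 0.95},
                   "C": {"pos": 0.05, "neg": 0.95}},
    "test_for_B": {"A": {"pos": 0.05, "neg": 0.95},
                   "B": {"pos": 0.95, "neg": 0.05},
                   "C": {"pos": 0.05, "neg": 0.95}},
}

def posterior(experiment, outcome):
    unnorm = {h: PRIOR[h] * LIKELIHOOD[experiment][h][outcome] for h in HYPOTHESES}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

def best_action_value(belief, utility):
    # utility[action][hypothesis] -> payoff; act to maximize expected utility.
    return max(sum(belief[h] * utility[action][h] for h in HYPOTHESES)
               for action in utility)

def value_of_experiment(experiment, utility):
    # Expected value of acting on the posterior, averaged over possible outcomes.
    total = 0.0
    for outcome in ("pos", "neg"):
        p_outcome = sum(PRIOR[h] * LIKELIHOOD[experiment][h][outcome]
                        for h in HYPOTHESES)
        total += p_outcome * best_action_value(posterior(experiment, outcome), utility)
    return total

# Same world model, different wants: agent 1 only cares whether A is true,
# agent 2 only cares whether B is true.
utility_1 = {"bet_on_A": {"A": 1, "B": -1, "C": -1}, "pass": {h: 0 for h in HYPOTHESES}}
utility_2 = {"bet_on_B": {"A": -1, "B": 1, "C": -1}, "pass": {h: 0 for h in HYPOTHESES}}

for name, utility in [("agent 1", utility_1), ("agent 2", utility_2)]:
    scores = {e: round(value_of_experiment(e, utility), 3) for e in LIKELIHOOD}
    print(name, scores)  # agent 1's best experiment is test_for_A; agent 2's is test_for_B
```

Nothing here is deep; the point is just that value_of_experiment cannot even be written down without a utility argument.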
#4 - How to specially process the special meta-hypothesis ‘all-other-hypotheses’
Okay so according to his pocketwatch Keltham has two minutes left to tackle this one before Share Language runs out, and that is not really a lot of time for what is actually the deepest question they’ve come across so far.
There are always better hypotheses than the hypotheses you’re using. Even if you could exactly predict the YES and NO outcomes, can you exactly predict timing? Facial expressions?
The space of possible hypotheses is infinite. The human brain is bounded, and can only consider very few possible hypotheses at a time. Infinity into finity does not go.
The thing about all the possible hypotheses you’re not considering, though, is that you are not, in fact, considering them. So even if—in some sense—they ought to occupy almost-1.0 of your probability mass, what good does it do you to know that? What advice does it give you for selecting actions?
And yet there is advice you can derive, if you go sufficiently meta. You could run that test to see if all of your hypotheses are scoring lower than they promised to score, for example. That test is not motivated by any particular hypothesis you already did calculations for. It is motivated by your belief, in full generality, in ‘the set of all hypotheses I’m not considering’.
All that Keltham can really say, in the thirty seconds remaining according to his watch, is that in the end people don’t usually assign an explicit probability there. They steer by the relative odds of those models they actually have of the world. And also put some quantity of effort into searching for better hypotheses, or better languages in which to speak them, proportional to how much everything is currently going horrifyingly wrong and how disastrously confused they are and how much nothing they try is working.
And also you’d maybe adjust some of your probability estimates towards greater ‘entropy’ if anybody here knew what ‘entropy’ was. Or adjust in the direction of general pessimism and gloom about achieving preferred outcomes, if you were navigating a difficult problem where being fundamentally ignorant was not actually going to make your life any easier.
Here, Eliezer seems to be talking about more specified versions of a not-fully-specified hypothesis (case 1):
There are always better hypotheses than the hypotheses you’re using. Even if you could exactly predict the YES and NO outcomes, can you exactly predict timing? Facial expressions?
Here, Eliezer seems to be talking about hypotheses that aren’t subhypotheses of an existing hypothesis (case 2):
You could run that test to see if all of your hypotheses are scoring lower than they promised to score, for example.
Eliezer’s approach is:
in the end people don’t usually assign an explicit probability there. They steer by the relative odds of those models they actually have of the world.
For subhypotheses (case 1), we aren’t actually considering these further features yet, so this seems true but not in a particularly exciting way.
I think it is rare for a hypothesis to truly lie outside all of your existing hypotheses, because you can have very underspecified meta-hypotheses that you will implicitly be taking into account even if you don’t enumerate them (examples of vague meta-hypotheses: supernatural vs. natural, realism vs. solipsism, etc.). And of course there are varying levels of vagueness, from very narrow to very broad.
But, OK, within these vague meta-hypotheses the true hypothesis is still often not a subhypothesis of any of your more specified hypotheses (case 2). A number for the probability of this happening might be hard to pin down, and in order to actually obtain instrumental value from this probability assignment, or to make a Bayesian adjustment of it, you need a prior for what happens in the world where all your specific hypotheses are false.
But, you actually do have such priors and relevant information as to the probability!
Eliezer mentions:
And yet there is advice you can derive, if you go sufficiently meta. You could run that test to see if all of your hypotheses are scoring lower than they promised to score, for example. That test is not motivated by any particular hypothesis you already did calculations for. It is motivated by your belief, in full generality, in ‘the set of all hypotheses I’m not considering’.
This is relevant data. Note also that the expectation that all of your hypotheses will score lower than promised if they are all false is, in itself, a prior on the predictions of the ‘all-other-hypotheses’ hypothesis.
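One way that expectation could be cashed out numerically (my reading of the test, not a procedure stated in the text) is to compare the average surprisal a model actually accumulates on the data with the surprisal it promised, i.e. the entropy of its own predictive distribution. The model and data below are hypothetical.

```python
# Sketch of a "scoring lower than promised" check (my interpretation, not
# Keltham's/Eliezer's stated procedure).

import math

def realized_surprisal(predictive, data):
    # Average surprisal (negative log score per observation) the model actually got.
    return -sum(math.log(predictive[x]) for x in data) / len(data)

def promised_surprisal(predictive):
    # The surprisal the model expects to get if it is true: the entropy of its
    # own predictive distribution.
    return -sum(p * math.log(p) for p in predictive.values() if p > 0)

def misspecification_gap(predictive, data):
    # A large positive gap means the model is scoring noticeably lower than it
    # promised, which is evidence for 'all-other-hypotheses' even with no
    # specific rival in hand.
    return realized_surprisal(predictive, data) - promised_surprisal(predictive)

# Hypothetical model: predicts outcome 0 with probability 0.9.
model = {0: 0.9, 1: 0.05, 2: 0.05}
print(misspecification_gap(model, data=[0, 0, 0, 0, 1, 0, 0, 0, 0, 0]))  # ~0: as promised
print(misspecification_gap(model, data=[2, 1, 2, 2, 1, 2, 2, 1, 2, 2]))  # large: something is off
```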
Likewise, when you do the adjustments mentioned in Eliezer’s last paragraph, you will do some specific amount of adjustment, and that specific adjustment amount will depend on an implicit value for the probability of the ‘all-other-hypotheses’ hypothesis and an implicit prior on its predictions.
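A minimal sketch of that, with my own made-up numbers: read the ‘adjust toward greater entropy’ move as mixing your models’ prediction with the catch-all’s predictive distribution (here assumed uniform), weighted by whatever probability you give to ‘all-other-hypotheses’.

```python
# Making the implicit quantities explicit (assumptions mine): the catch-all
# weight and its assumed-uniform predictive are what the adjustment depends on.

def adjusted_prediction(model_prediction, p_all_other, outcomes):
    # Mix the models' prediction with the catch-all's prior predictive
    # (uniform here), weighted by the probability of 'all-other-hypotheses'.
    uniform = 1 / len(outcomes)
    return {o: (1 - p_all_other) * model_prediction.get(o, 0.0) + p_all_other * uniform
            for o in outcomes}

# Hypothetical numbers: a confident model, hedged by a 10% catch-all weight.
outcomes = ["heads", "tails", "edge"]
confident = {"heads": 0.98, "tails": 0.02}  # assigns 0 to "edge"
print(adjusted_prediction(confident, p_all_other=0.10, outcomes=outcomes))
# {'heads': ~0.915, 'tails': ~0.051, 'edge': ~0.033}: higher entropy than before.
# The 0.10 and the uniform distribution are the "implicit value" and the
# "implicit prior on its predictions" that the adjustment depends on.
```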
In my view, there is no reason in principle that these priors and probabilities cannot be quantified.
To be sure, people don’t usually quantify their beliefs in the ‘all-other-hypotheses’ hypothesis. But, I see this as a special case of the general rule that people don’t usually quantify beliefs in hypotheses with poorly specified predictions. And the predictions of ‘all-other-hypotheses’ are not infinitely poorly specified, since we do have priors about them.
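As a proof of concept that the quantification is possible (again a toy of mine, not anything from the text), you can give the ‘all-other-hypotheses’ bucket an explicit prior probability and a deliberately vague prior predictive, say a uniform distribution over outcomes, and update it alongside your specific hypotheses; this is exactly the kind of prior the earlier paragraph says you need for the world where all your specific hypotheses are false.

```python
# Toy Bayesian update with an explicit catch-all hypothesis (made-up numbers).

OUTCOMES = list(range(10))

def uniform():
    return {o: 1 / len(OUTCOMES) for o in OUTCOMES}

def peaked_at(k, peak=0.9):
    rest = (1 - peak) / (len(OUTCOMES) - 1)
    return {o: (peak if o == k else rest) for o in OUTCOMES}

# Predictive distributions: two specific hypotheses plus the catch-all,
# which deliberately predicts as little as possible (maximum entropy).
predictive = {
    "peaked_at_0": peaked_at(0),
    "peaked_at_9": peaked_at(9),
    "all_other_hypotheses": uniform(),
}
prior = {"peaked_at_0": 0.45, "peaked_at_9": 0.45, "all_other_hypotheses": 0.10}

def update(belief, data):
    for x in data:
        belief = {h: belief[h] * predictive[h][x] for h in belief}
        z = sum(belief.values())
        belief = {h: p / z for h, p in belief.items()}
    return belief

# Data that neither specific hypothesis predicted well: mass flows to the catch-all.
print(update(prior, data=[3, 5, 2, 7, 4]))
```

When the data surprises every specific hypothesis, the posterior mass flows to the catch-all, which is the quantified version of ‘I am disastrously confused and should go searching for better hypotheses.’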