This post contains an excerpt from a dialogue by Eliezer about why Solomonoff induction is a good answer to “how to do epistemology using infinite computing power”. I’m link-posting it from Arbital because I found it really useful and wish I’d seen it on Less Wrong earlier. (Edit: it’s now been cross-posted in full to LessWrong.) Eliezer covers a wide range of arguments and objections in a very accessible and engaging way. It’s particularly interesting that, near the end of the dialogue, the two characters discuss an objection which feels very similar to what I was trying to convey in my post Against Strong Bayesianism—specifically my argument that
An ideal bayesian is not thinking in any reasonable sense of the word—instead, it’s simulating every logically possible universe. By default, we should not expect to learn much about thinking based on analysing a different type of operation that just happens to look the same in the infinite limit.
Given that both of the characters agree with the version of this argument made in the Arbital dialogue, I guess that my position is closer to Eliezer’s than I previously thought. I suspect that the remaining disagreement is something like: given these problems, is it better to aim for “some future formalism that’s better than Solomonoff induction”, or instead to focus on thinking about how intelligence actually functions in practice? Reading this Arbital post has moved me slightly towards MIRI’s position, mainly because it’s evidence that Eliezer had considered this specific argument several years ago. However, I’m still more excited about the latter—in part because it seems that logical inductors are vulnerable to a similar type of objection as Solomonoff induction. The following excerpt is the (relatively small) section of the original dialogue which focuses on this type of objection.
Excerpt starts here, with Ashley (a fictional computer scientist) explaining one source of her skepticism about Solomonoff induction.
Ashley: The ‘language of thought’ or ‘language of epistemology’ seems to be different in some sense from the ‘language of computer programs’. Like, when I think about the laws of Newtonian gravity, or when I think about my Mom, it’s not just one more line of code tacked onto a big black-box computer program. It’s more like I’m crafting an explanation with modular parts—if it contains a part that looks like Newtonian mechanics, I step back and reason that it might contain other parts with differential equations. If it has a line of code for a Mom, it might have a line of code for a Dad. I’m worried that if I understood how humans think like that, maybe I’d look at Solomonoff induction and see how it doesn’t incorporate some further key insight that’s needed to do good epistemology.
Blaine: Solomonoff induction literally incorporates a copy of you thinking about whatever you’re thinking right now.
Ashley: Okay, great, but that’s inside the system. If Solomonoff learns to promote computer programs containing good epistemology, but is not itself good epistemology, then it’s not the best possible answer to “How do you compute epistemology?” Like, natural selection produced humans but population genetics is not an answer to “How does intelligence work?” because the intelligence is in the inner content rather than the outer system. In that sense, it seems like a reasonable worry that Solomonoff induction might incorporate only some principles of good epistemology rather than all the principles, even if the internal content rather than the outer system might bootstrap the rest of the way.
Blaine: Hm. If you put it that way...
(long pause)
Blaine: …then, I guess I have to agree. I mean, Solomonoff induction doesn’t explicitly say anything about, say, the distinction between analytic propositions and empirical propositions, and knowing that is part of good epistemology on my view. So if you want to say that Solomonoff induction is something that bootstraps to good epistemology rather than being all of good epistemology by itself, I guess I have no choice but to agree. I do think the outer system already contains a lot of good epistemology and inspires a lot of good advice all on its own. Especially if you give it credit for formally reproducing principles that are “common sense”, because correctly formalizing common sense is no small feat.
Ashley: Got a list of the good advice you think is derivable?
Blaine: Um. Not really, but off the top of my head:
1. The best explanation is the one with the best mixture of simplicity and matching the evidence.
2. “Simplicity” and “matching the evidence” can both be measured in bits, so they’re commensurable.
3. The simplicity of a hypothesis is the number of bits required to formally specify it, for example as a computer program.
4. When a hypothesis assigns twice as much probability to the exact observations seen so far as some other hypothesis, that’s one bit’s worth of relatively better matching the evidence.
5. You should actually be making your predictions using all the explanations, not just the single best one, but explanations that poorly match the evidence will drop down to tiny contributions very quickly.
6. Good explanations let you compress lots of data into compact reasons which strongly predict seeing just that data and no other data.
7. Logic can’t dictate prior probabilities absolutely, but if you assign probability less than 2^(-1,000,000) to the prior that mechanisms constructed using a small number of objects from your universe might be able to well predict that universe, you’re being unreasonable.
8. So long as you don’t assign infinitesimal prior probability to hypotheses that let you do induction, they will very rapidly overtake hypotheses that don’t.
9. It is a logical truth, not a contingent one, that more complex hypotheses must in the limit be less probable than simple ones.
10. Epistemic rationality is a precise art with no user-controlled degrees of freedom in how much probability you ideally ought to assign to a belief. If you think you can tweak the probability depending on what you want the answer to be, you’re doing something wrong.
11. Things that you’ve seen in one place might reappear somewhere else.
12. Once you’ve learned a new language for your explanations, like differential equations, you can use it to describe other things, because your best hypotheses will now already encode that language.
13. We can learn meta-reasoning procedures as well as object-level facts by looking at which meta-reasoning rules are simple and have done well on the evidence so far.
14. So far, we seem to have no a priori reason to believe that universes which are more expensive to compute are less probable.
15. People were wrong about galaxies being a priori improbable because that’s not how Occam’s Razor works. Today, other people are equally wrong about other parts of a continuous wavefunction counting as extra entities.
16. If something seems “weird” to you but would be a consequence of simple rules that fit the evidence so far, well, there’s no term in these explicit laws of epistemology which adds an extra penalty for weirdness.
17. Your epistemology shouldn’t have extra rules in it that aren’t needed to do Solomonoff induction or something like it, including rules like “science is not allowed to examine this particular part of reality”--
Ashley: This list isn’t finite, is it.
Blaine: Well, there’s a lot of outstanding debate about epistemology where you can view that debate through the lens of Solomonoff induction and see what Solomonoff suggests.
Ashley: But if you don’t mind my stopping to look at your last item, #17 above—again, it’s attempts to add completeness clauses to Solomonoff induction that make me the most nervous. I guess you could say that a good rule of epistemology ought to be one that’s promoted by Solomonoff induction—that it should arise, in some sense, from the simple ways of reasoning that are good at predicting observations. But that doesn’t mean a good rule of epistemology ought to explicitly be in Solomonoff induction or it’s out.
Blaine: Can you think of good epistemology that doesn’t seem to be contained in Solomonoff induction? Besides the example I already gave of distinguishing logical propositions from empirical ones.
Ashley: I’ve been trying to. First, it seems to me that when I reason about laws of physics and how those laws of physics might give rise to higher levels of organization like molecules, cells, human beings, the Earth, and so on, I’m not constructing in my mind a great big chunk of code that reproduces my observations. I feel like this difference might be important and it might have something to do with ‘good epistemology’.
Blaine: I guess it could be? I think if you’re saying that there might be this unknown other thing and therefore Solomonoff induction is terrible, then that would be the nirvana fallacy. Solomonoff induction is the best formalized epistemology we have right now--
Ashley: I’m not saying that Solomonoff induction is terrible. I’m trying to look in the direction of things that might point to some future formalism that’s better than Solomonoff induction.
Ashley then goes on to raise a number of ideas associated with embedded agency and logical induction; see the original post for more.
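As a rough illustration of the items in Blaine's list about measuring simplicity and fit to the evidence in the same unit (bits), and about predicting with all explanations at once, here is a minimal sketch. It is a toy finite mixture, not Solomonoff induction itself: the two hypotheses, their description lengths (5 and 10 bits), and the function names are all made up for the example, and it assumes at least one hypothesis gives nonzero probability to the data.

```python
import math

def posterior_weights(hypotheses, observations):
    """Score each hypothesis in bits, then turn the scores into weights.

    hypotheses: list of (name, length_bits, predict), where predict(history)
    returns the probability that the next observed bit is 1.
    Returns (cost_in_bits_by_name, normalized_weight_by_name).
    """
    costs = {}
    for name, length_bits, predict in hypotheses:
        cost = float(length_bits)  # simplicity: bits needed to specify it
        for i, obs in enumerate(observations):
            p_one = predict(observations[:i])
            p_obs = p_one if obs == 1 else 1.0 - p_one
            # fit: -log2(probability assigned to what was actually seen)
            cost += math.inf if p_obs == 0.0 else -math.log2(p_obs)
        costs[name] = cost
    best = min(costs.values())
    # weight is proportional to 2^-cost; subtract the best cost for stability
    unnormalized = {n: 2.0 ** -(c - best) for n, c in costs.items()}
    total = sum(unnormalized.values())
    return costs, {n: w / total for n, w in unnormalized.items()}

def mixture_prediction(hypotheses, observations):
    """Predict the next bit using every hypothesis, weighted by its posterior."""
    _, weights = posterior_weights(hypotheses, observations)
    return sum(weights[name] * predict(observations)
               for name, _, predict in hypotheses)

# Hypothetical hypotheses: a short "fair coin" program and a longer
# "mostly ones" program. The lengths are illustrative, not measured.
HYPOTHESES = [
    ("fair", 5, lambda history: 0.5),
    ("biased", 10, lambda history: 0.9),
]

if __name__ == "__main__":
    for n in (0, 5, 20):
        costs, weights = posterior_weights(HYPOTHESES, [1] * n)
        print(n, costs, weights, mixture_prediction(HYPOTHESES, [1] * n))
```

With no data the shorter hypothesis dominates; after a run of ones, the longer hypothesis pays off its extra description length because it loses fewer bits per observation, and the mixture's prediction moves towards it without ever discarding the alternative outright.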
(IIRC, that dialogue is basically me-written.)
I’m not sure if this comment goes best here, or in the “Against Strong Bayesianism” post. But I’ll put it here, because this is fresher.
I think it’s important to be careful when you’re taking limits.
I think it’s true that “The policy that would result from a naive implementation of Solomonoff induction followed by expected utility maximization, given infinite computing power, is the ideal policy, in that there is no rational process (even using arbitrarily much computing power) that leads to a policy that beats it.”
But say somebody offered you an arbitrarily large-and-fast, but still finite, computer. That is to say, you’re allowed to ask for a googolplex operations per second and a googolplex of RAM, or even Graham’s number of each, but you have to name a number and then live with it. The above statement does NOT mean that the program you should run on that hyper-computer is a naive implementation of Solomonoff induction. You would still want to use the known tricks for improving the efficiency of Bayesian approximations; that is, things like MCMC, SMC, efficient neural proposal distributions with importance-weighted sampling, efficient pruning of simulations to just the parts that are relevant for predicting input (which, in turn, includes all kinds of causality logic), smart allocation of computational resources between different modes and fallbacks, etc. Such tricks, even just the ones we have already discovered, look a lot more like “intelligence” than naive Solomonoff induction does, even if, when appropriately combined, their limit as computation goes to infinity is the same as the limit of Solomonoff induction as computation goes to infinity.
In other words, saying “the limit as amount-of-computation X goes to infinity of program A strictly beats program B with amount Y of finite computation, for any B and Y”, or even “the limit as amount-of-computation X goes to infinity of program A is as good as or better than the limit as amount-of-computation Y goes to infinity of program B, for any B”, is true, but not very surprising or important, because it absolutely does not imply that “as computation X goes to infinity, program A with X resources beats program B with X resources, for any B”.
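To make one of those tricks concrete, here is a minimal sketch of importance-weighted sampling in its simplest (self-normalized) form. The setup is an assumption chosen for illustration: a coin with unknown bias, a uniform prior, and the prior itself used as the proposal distribution. A neural proposal would replace the uniform draw with something much better targeted; nothing here is meant as how a real system would do it.

```python
import random

def posterior_mean_by_importance_sampling(heads, flips, num_samples=20000, seed=0):
    """Estimate E[theta | data] for a coin with a uniform prior on theta.

    Samples theta from the proposal (here: the uniform prior), weights each
    sample by the likelihood of the data, and returns the weighted average.
    """
    rng = random.Random(seed)
    weighted_sum = 0.0
    weight_total = 0.0
    for _ in range(num_samples):
        theta = rng.random()  # proposal draw, uniform on [0, 1)
        # importance weight = prior * likelihood / proposal = likelihood here
        weight = theta ** heads * (1.0 - theta) ** (flips - heads)
        weighted_sum += weight * theta
        weight_total += weight
    return weighted_sum / weight_total

if __name__ == "__main__":
    estimate = posterior_mean_by_importance_sampling(heads=7, flips=10)
    exact = (7 + 1) / (10 + 2)  # mean of the Beta(8, 4) posterior
    print(estimate, exact)
```

The point of the example is only that a finite number of weighted samples stands in for an exhaustive sum over hypotheses; how well it works depends entirely on how close the proposal is to the posterior.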
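One possible way to write the three quoted statements side by side, as a sketch. The notation V(P, c), meaning how well program P performs when given c units of compute, is an assumption introduced here, and the third line is only one reading ("eventually, at equal budgets") of the informal claim.

```latex
% V(P, c): assumed notation for the performance of program P with compute c.
\begin{align*}
\text{(1)}\quad & \forall B,\ \forall Y:\quad
    \lim_{X \to \infty} V(A, X) > V(B, Y) \\
\text{(2)}\quad & \forall B:\quad
    \lim_{X \to \infty} V(A, X) \ge \lim_{Y \to \infty} V(B, Y) \\
\text{(3)}\quad & \forall B,\ \exists X_0,\ \forall X \ge X_0:\quad
    V(A, X) > V(B, X)
\end{align*}
```

Statements (1) and (2) compare limits, while (3) compares the two programs at the same finite budget, which is why neither (1) nor (2) implies it.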
The problem with this definition is that it focusses too much on the details of the computational substrate. Suppose the programming language used has a built-in function for matrix multiplication, and it is 2x as fast as any program that could be written within the language. Then any program that does its own matrix multiplication will be less intelligent than one that uses the built-in functions.
“A with X resources beats program B with X resources, for any B” could be true if A is just B with the first few steps precomputed. It focusses too much on the little hackish tricks specific to the substrate.
Maybe say that two algorithms A, B are equivalent up to polynomial factors if there exists a polynomial p(x) so that A with p(x) compute beats B with x compute for all x, and likewise B with p(x) compute beats A with x compute.
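A small sketch of how this definition treats the built-in matrix multiplication example. The two performance curves are hypothetical stand-ins (the same algorithm, one running 2x slower), and the check only covers a finite range of budgets, so it illustrates the definition and proves nothing about "for all x". "Beats" is read here as "does at least as well as".

```python
import math

def matches_with_overhead(reference, challenger, overhead, budgets):
    """True if challenger given overhead(x) compute does at least as well as
    reference given x compute, for every budget x that was checked."""
    return all(challenger(overhead(x)) >= reference(x) for x in budgets)

# Hypothetical performance curves: score as a function of compute.
def perf_builtin(compute):
    return math.log2(compute)

def perf_handrolled(compute):
    # Same algorithm, but every step costs twice as much compute.
    return math.log2(compute / 2)

if __name__ == "__main__":
    budgets = range(2, 1000)
    # At equal budgets the hand-rolled version loses...
    print(matches_with_overhead(perf_builtin, perf_handrolled, lambda x: x, budgets))
    # ...but with the polynomial overhead p(x) = 2x it keeps up, and vice versa.
    print(matches_with_overhead(perf_builtin, perf_handrolled, lambda x: 2 * x, budgets))
    print(matches_with_overhead(perf_handrolled, perf_builtin, lambda x: 2 * x, budgets))
```

Under the equal-budget comparison the two programs look different; under the polynomial-factor comparison they land in the same class, which is the behaviour the definition is aiming for.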
Yes, your restatement feels to me like a clear improvement.
In fact, considering it, I think that if algorithm A is “truly more intelligent” than algorithm B, and f(x) is the compute that it takes for B to perform as well as or better than A does with x compute, then I’d expect that f(x) could even be super-exponential in x. Exponential would be the lower bound; it’s what you’d get from a mere incremental improvement in pruning. From this perspective, anything polynomial would be “just implementation”, not “real intelligence”.
Seems just false. If you’re not worried about confronting agents of equal size (which is equally a concern for a Solomonoff inductor) then a naive bounded Solomonoff inductor running on a Grahamputer will give you essentially the same result for all practical purposes as a Solomonoff inductor. That’s far more than enough compute to contain our physical universe as a hypothesis. You don’t bother with MCMC on a Grahamputer.
If we’re positing a Grahamputer, then “yeah but it’s essentially the same if you’re not worried about agents of equal size” seems too loose.
In other words, with great compute power, comes great compute responsibility.