Contents
1. What credit is Poe due? (Without doing lots of research.)
2. “Rationality” and Neural networks
3. Poe’s revenge
4. Where does “Rationality” lie?
5. The Way
6. A basic statistics question
7. Not-learned knowns, and bodies versus heuristics
8. Is physics hard, if we’re good at moving?
9. Is Occam’s Razor a retcon? (And far too much else.)
10. A problem with Solomonoff Induction
11. Does anyone have the code for Solomonoff Induction or AIXI?
1. What credit is Poe due? (Without doing lots of research.)
On Poe’s argument (as presented here):
Poe may be right about solving chess. His opinions concerning randomness are interesting; while those issues may have since been worked out well enough to show that deterministic algorithms can do as well as algorithms that use randomness, I don’t think that was known at the time.
Information theorist Claude Shannon argued in 1951 that it is not feasible for any computer to actually solve chess, since it would either need to compare some 10^120 possible game variations, or have a “dictionary” denoting an optimal move for each of the about 10^43 possible board positions.[4]*
Arguably his failure is conflating ‘finding the/an optimal solution’ (which proceeds from the rules) with ‘being good’. (Saying ‘an automaton can never do this’ seems obviously accurate if you note that the necessary computer would be too big to be an automaton. Shannon wrote in a time with better computers. Even granting that ‘people follow deterministic rules, therefore the necessary computation can fit inside a human body’ (which Poe might not have granted**), Poe might have maintained that your ‘Frankenstein monster’ (perhaps not the phrase of the time) is clearly a different type of thing than an automaton. And today we are interested in neural networks; even the idea of simulating a person on ‘Babbage’s machine’ might not have occurred to Poe.**)
2. “Rationality” and Neural networks
Once we have the unbounded solution we understand, in some basic sense, the kind of work we are trying to perform, and then we can try to figure out how to do it efficiently.
ASHLEY: Which may well require new insights into the structure of the problem, or even a conceptual revolution in how we imagine the work we’re trying to do.
EY once argued against neural networks (possibly in the context of friendly AI?); the disagreement may have been about ‘solving problems magically’. (And as a means of beating people at chess, they did come later.) Today it would appear you might not need to possess:
Correct knowledge of how brains (and minds) of people actually work
Complete knowledge of how to play chess (find a chess engine)
in order to come up with a solution, if you have enough resources and the formalism/algorithms of neural networks. Just train something that’s good enough (a minimal sketch of that attitude follows below). Interestingly, this might mean a ‘rational’ approach, i.e. one with a good theory, might not be necessary for technically well-specified problems (like chess), though it may be important for friendly AI (which remains to be seen).
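A minimal, purely illustrative sketch of the ‘just train something good enough’ stance, in PyTorch. Everything specific here is an assumption for illustration: the data is synthetic noise standing in for labelled positions, and the 773-feature vector is just one length people sometimes use to encode a chess position; the point is only that no theory of chess, and no theory of brains, appears anywhere in the code.

```python
# Illustrative only: no theory of chess or of minds, just data and gradient
# descent. The data below is synthetic noise; a real system would use
# positions labelled by game outcomes or by a stronger engine.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical encoding: a fixed-length feature vector per position
# (773 is one size sometimes used; any fixed length works for this sketch).
X = torch.rand(1024, 773)
y = torch.rand(1024, 1)          # stand-in "evaluation" labels

model = nn.Sequential(
    nn.Linear(773, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(10):
    opt.zero_grad()
    loss = loss_fn(model(X), y)   # fit the labels, whatever they mean
    loss.backward()
    opt.step()
```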
3. Poe’s revenge
and inventing good hypotheses from scratch.
So Solomonoff Induction includes randomness?
4. Where does “Rationality” lie?
You had to notice the resemblance to the Fibonacci rule to guess the next number.
Not consciously.
5. The Way
We just have no idea how Terence Tao works, so we can’t duplicate his abilities in a formal rule, no matter how much computing power that rule gets…
Simulate Tao’s brain. (Did the OP really resist this pun, or just not see it? It doesn’t fit with ‘figure out how to solve the problem’...)
Yes, as a real-world solution there would be issues: ethics, how you would even do that, whether computers are powerful enough, etc.
6. A basic statistics question
ASHLEY: But what if you can do better by forgetting more?
So you don’t overfit?
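As a toy illustration of the overfitting worry (which bites bounded learners like us, not the idealized reasoner in the dialogue), here is a hedged sketch: fit noisy samples of a simple curve with a low-degree and a high-degree polynomial and compare error on held-out points. All the numbers are arbitrary choices for the example.

```python
# Toy example of "doing better by forgetting more" for a bounded learner:
# the high-degree fit memorises the noise and typically does worse on
# held-out data.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 12)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.shape)
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)

for degree in (3, 11):
    coeffs = np.polyfit(x_train, y_train, degree)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: held-out MSE = {test_err:.3f}")
```

An ideal reasoner with unlimited capacity has no such problem, which is the point of the quoted passage.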
for one thing, you can always just do the same policy you would have used if you hadn’t seen that evidence.
This is great. (It also seems wrong for people, to some extent.)
With unlimited computing power, nothing goes wrong as a result of trying to process 4 gigabits per second; every extra bit just produces a better expected future prediction.
A little handwaving, but at least it’s clearly handwaving. (If I told you that processing that amount of info at that speed would destroy the world (it probably wouldn’t)***, you might disagree with ‘just produces a better prediction’. This is nitpicking at the level of ‘watch what you wish for’, but unlimited computing power might be very destructive.)
7. Not-learned knowns, and bodies versus heuristics
ASHLEY: I note that there are some things I know that don’t come from my sensory inputs at all. Chimpanzees learn to be afraid of skulls and snakes much faster than they learn to be afraid of other arbitrary shapes. I was probably better at learning to walk in Earth gravity than I would have been at navigating in zero G. Those are heuristics I’m born with, based on how my brain was wired, which ultimately stems from my DNA specifying the way that proteins should fold to form neurons—not from any photons that entered my eyes later.
Swimming without having learned how (the infant swimming reflex) is another example, at least until it goes away. That we learn to navigate better on Earth than in zero G is an empirical claim, and it might have more to do with the shape of the body and the environment. That’s not ‘heuristics in thinking’; that’s body design, etc.
8. Is physics hard, if we’re good at moving?
ASHLEY: Part of my mind feels like the laws of physics are quite complicated compared to going outside and watching a sunset. Like, I realize that’s false, but I’m not sure how to say out loud exactly why it’s false...
Perhaps our brains run a useful approximation? Neural networks may be better adapted to local conditions than, well, running such general formulas.
The language of physics is differential equations, and it turns out that this is something difficult to beat into some human brains,
Then how are we ‘good’ at moving? Like, at a level that seems hard to train/program ‘robots’ to do?
If pi is normal, then somewhere in its digits is a copy of Shakespeare’s Hamlet—but the number saying which particular digit of pi to start looking at, will be just about exactly as large as Hamlet itself.
It seems like the number would be longer. Like n^2 at least. (Unless you have a way of compressing it, which seems like it’d be hard to do.)
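For what it’s worth, here is a rough empirical look at how long that index tends to be, assuming the digits behave like a random stream (which is roughly what ‘normal’ buys you): a specific n-digit pattern typically first occurs around position 10^n, so writing the index down takes about n digits.

```python
# Empirical check with a pseudo-random digit stream standing in for the
# digits of pi: how long is the index of the first occurrence of a random
# n-digit pattern?
import random

random.seed(0)
stream = ''.join(random.choice('0123456789') for _ in range(2_000_000))

for n in (3, 4, 5):
    pattern = ''.join(random.choice('0123456789') for _ in range(n))
    idx = stream.find(pattern)
    if idx >= 0:
        print(f"pattern of {n} digits: first index {idx} ({len(str(idx))} digits long)")
    else:
        print(f"pattern of {n} digits: not found in this sample")
```

So the index is roughly as long as the thing you are looking for, which seems to be the point of the quoted claim.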
Similarly, the world Earth is much more algorithmically complex than the laws of physics.
Because it also includes constants?
ASHLEY: A probability distribution over possible 66-megabit frames? Like, a table with 2^66,000,000 entries, summing to 1?
Which is implicitly a model of the entire universe. (Sort of.)
9. Is Occam’s Razor a retcon? (And far too much else.)
The “entities” of a theory are its types, not its objects.
Did Occam mean that, or is this a retcon?
And Solomonoff induction tells us that this invocation of Occam’s Razor is flatly misguided because Occam’s Razor does not work like that.
This is a circular argument. That’s like saying ‘the world-is-round hypothesis tells us the world is round’, when that is part of (in fact the whole of) the hypothesis itself.
Some people like Levin search more than Solomonoff induction because it’s more computable. I dislike Levin search because (a) it has no fundamental epistemic justification and (b) it assigns probability zero to quantum mechanics.
People want the world to be simple. (The Simple World Fallacy, or The world I can understand (easily) fallacy?)
BLAINE: For example two, that Solomonoff induction outperforms even Terence Tao,
I’m glad this was eventually addressed, although I feel like this has the interpretability problem, except worse.
ASHLEY: So your basic argument is, “Never mind Terence Tao, Solomonoff induction dominates God.”
More like “Is God.” There may be some work on the flaws of this approach (SI/AIXI) even in theory, though those flaws seem immaterial until a switch is made to an approximation.
smarter entities can extract more info than is immediately apparent on the surface of things.
smarter, better calibrated, experts in the domain… Arguably, Solomonoff Induction is rather stupid/low-information. It generates all hypotheses (which tells you nothing), then it does work on those hypotheses (most of the information/‘smarts’ in it), and it learns the rest. Shipping something with information about this universe seems more efficient. SI is supposed to get the most out of that information (that’s why it’s ‘an ideal’), but it costs an infinite amount of energy, takes forever, etc.
you could look at which agents were seeing exact data like the data you got
Takes some work to find those agents.
In fact, you’re probably pointing at some particular shortcut and claiming nobody can ever figure that out using a reasonable amount of computing power
an unreasonable amount of computing power. Infinitely unreasonable.
just so that their mental simulation of the ideal answer isn’t running up against stupidity assertions.
There’s no reason in principle that all types of minds will agree with you. What reason do you have to suppose humans will? (What reasonability guarantee is there?)
It sounds like “Jehovah placed rainbows in the sky as a sign that the Great Flood would never come again” is a ‘simple’ explanation; you can explain it to a child in nothing flat.
Because we don’t have SI/AIXI’s ‘flaw’ - it can never imagine a being such as itself.
and it sounds more alien and less intuitive than Jehovah.
Might just be a matter of the childhood. Would it be more intuitive to adults if it was explained to them as kids? (I’m going to call this ‘The Rainbow Religion’.)
but that doesn’t mean I should look at the historical role supposedly filled by Abraham Lincoln, and look for simple mechanical rules that would account for the things Lincoln is said to have done.
a) Evolution
b) No, you should look for simple mechanical rules that would generate the story. Why do you believe the person telling you the story? It’s P(observing A), not P(A). (A tiny worked example follows.)
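A tiny worked example of the P(observing A) versus P(A) point, with made-up numbers: how much the story should move you depends on how likely the storyteller is to tell it whether or not A actually happened.

```python
# Bayes' rule on testimony, with made-up numbers.
prior_A = 0.01                 # P(A): assumed prior that the event happened
p_report_given_A = 0.9         # P(report | A): reporter usually reports real events
p_report_given_not_A = 0.2     # P(report | not A): but also tells tall tales

p_report = (p_report_given_A * prior_A
            + p_report_given_not_A * (1 - prior_A))    # P(observing the story)
posterior_A = p_report_given_A * prior_A / p_report    # P(A | report)

print(f"P(A | report) = {posterior_A:.3f}")   # ~0.043: the story is weak evidence
```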
to predict the modified-human entity that is Jehovah.
The supposed infinities involved might do the job. If AIXI/SI cannot imagine itself, then that’s probably handled. (I could be wrong about this, but maybe ‘Machines don’t believe in infinities.’)
it shouldn’t cost as much to postulate a similar kind of thing elsewhere!
Because the thing hasn’t been postulated in isolation; in SI it showed up in a universe, with a cause (ultimately the beginning of the universe). Easy reuse just requires the right sort of cause: an engineer, evolution, duplication, etc.
BLAINE: Well, but even if I was wrong that Solomonoff induction should make Jehovah seem very improbable, it’s still Solomonoff induction that says that the alternative hypothesis of ‘diffraction’ shouldn’t itself be seen as burdensome—even though diffraction might require a longer time to explain to a human, it’s still at heart a simple program.
ASHLEY: Hmm.
So this is a time spent computing problem? We spend too much time thinking about humans, not enough time thinking about rainbows? (Insufficient Rainbow Contemplation.) Arguably this is rational—which is more likely to kill you, a rainbow or a human?
ASHLEY: Got a list of the good advice you think is derivable?
BLAINE: Um. Not really, but off the top of my head:
Sounds like stuff learned from experience.
People were wrong about galaxies being a priori improbable because that’s not how Occam’s Razor works.
Or they assumed the universe was small.
If something seems “weird” to you but would be a consequence of simple rules that fit the evidence so far, well, there’s nothing in these explicit laws of epistemology that adds an extra penalty term for weirdness.
I think noticing confusion is important. (Retraining your intuitions might be useful though.)
Your epistemology shouldn’t have extra rules in it that aren’t needed to do Solomonoff induction or something like it, including rules like “science is not allowed to examine this particular part of reality”—
I’ve considered ‘infinities are impossible’ myself. The only problem is ‘What happens if you find an infinity?’ (That being said, I still think it might be a useful tool: if you figure out the limit of a function as the input goes to infinity, and nothing short of infinity will reach that limit, then you’ve got a bound on the function even when you don’t know the input. For example, f(x) = 1 - 1/x approaches 1 but never reaches it, so 1 bounds f(x) for every finite x > 0.)
10. A problem with Solomonoff Induction
BLAINE: Well, it wouldn’t bite you in the form of repeatedly making wrong experimental predictions.
But it requires infinite resources to run, and can only simulate finite programs, whereas a universe where it could be run would be a universe it couldn’t simulate.
Which brings up that dangling question from before about modeling the effect that my actions and choices have on the environment, and whether, say, an agent that used Solomonoff induction would be able to correctly predict “If I drop an anvil on my head, my sequence of sensory observations will end.”
In theory that’s an empirical question, but without a hypercomputer it seems untestable.
11. Does anyone have the code for Solomonoff Induction or AIXI?
Solomonoff induction is the best formalized epistemology we have right now—
Does anyone have the code for Solomonoff Induction or AIXI? A version with resource bounds, that actually runs on computers?
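For concreteness, here is a deliberately crude, hedged sketch of what a resource-bounded Solomonoff-style predictor can look like. The overall structure is faithful (enumerate programs up to a length bound, weight each by 2^-length, keep the ones consistent with the observations, mix their predictions), but the universal machine is replaced by a toy machine that just repeats the program’s bits forever, and that swap is exactly where the real difficulty of Solomonoff induction and AIXI lives. All names here are mine, not anyone’s published code.

```python
# Toy, bounded Solomonoff-style predictor over binary sequences.
from itertools import product

def toy_machine(program, n):
    """Toy stand-in for a universal machine: interpret the program as a bit
    pattern that is repeated forever, and return its first n output bits."""
    return [program[i % len(program)] for i in range(n)]

def predict_next_bit(observed, max_len=12):
    """Mixture prediction P(next bit = 1 | observed) over all toy programs of
    up to max_len bits, each given prior weight 2^(-length)."""
    weight_total = 0.0
    weight_one = 0.0
    for length in range(1, max_len + 1):
        for program in product((0, 1), repeat=length):
            output = toy_machine(program, len(observed) + 1)
            if output[:len(observed)] != list(observed):
                continue                    # inconsistent with the data so far
            w = 2.0 ** (-length)            # shorter programs get more weight
            weight_total += w
            if output[-1] == 1:
                weight_one += w
    return weight_one / weight_total if weight_total else 0.5

# The predictor strongly expects 0 next, continuing the alternation.
print(predict_next_bit([0, 1, 0, 1, 0, 1]))
```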
Footnotes (from 1.)
*From the original paper:
“This is conservative for our calculation since the machine would calculate out to checkmate, not resignation. However, even at this figure there will be 10^120 variations to be calculated from the initial position. A machine operating at the rate of one variation per micro-second would require over 10^90 years to calculate the first move!”
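A quick check of that arithmetic (the numbers are Shannon’s, the check is mine): 10^120 variations at one per microsecond comes out to roughly 3 × 10^106 years, so ‘over 10^90 years’ is comfortably conservative.

```python
# Sanity-check Shannon's figure: 10^120 variations at one per microsecond.
variations = 10 ** 120
per_second = 10 ** 6                     # one variation per microsecond
seconds_per_year = 60 * 60 * 24 * 365
years = variations // per_second // seconds_per_year
print(f"about 10^{len(str(years)) - 1} years")   # about 10^106 years, > 10^90
```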
** Believing in souls might have tripped Poe up.
***Because eyeballs and brains seem to do fine. But how do they work?