Might be worth having those more often too; the last one was very popular, and had lots of questions that open threads don’t typically attract.
Because Turing machines are deterministic, AIXI cannot make hypotheses involving randomness. To model the fair coin, AIXI would come up with increasingly convoluted Turing machines, attempting to compress a bitstring that approaches Kolmogorov randomness as its length approaches infinity. Meanwhile, AIXI would be punished and rewarded randomly.
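A rough way to see the compression point — this is only an illustrative sketch, using Python's zlib as a crude stand-in for Kolmogorov complexity (which is uncomputable), with an arbitrary string length:

    # Illustration only: zlib is a weak stand-in for Kolmogorov complexity.
    import random
    import zlib

    n = 10_000
    patterned = b"01" * (n // 2)                                  # highly regular data
    random.seed(0)
    coin_flips = bytes(random.getrandbits(8) for _ in range(n))   # n bytes = 8n fair flips

    # The regular string compresses to a tiny fraction of its length...
    print(len(zlib.compress(patterned)), "compressed bytes for the patterned data")
    # ...while the coin-flip data is essentially incompressible, so the shortest
    # deterministic "hypothesis" for it is roughly "print this exact string".
    print(len(zlib.compress(coin_flips)), "compressed bytes for the coin-flip data")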
Just a naïve thought, but maybe it would come up with MWI fairly quickly because of this. (I can imagine this being a beisutsukai challenge – show a student radioactive decay, and see how long it takes them to come up with MWI.) A probabilistic one is probably better for the other reasons brought up, though.
Someone want to start one day after tomorrow? Run monthly or something? Let’s see what happens.
To come up with MWI, it would have to conceive of different potentialities and then a probabilistic selection. I don’t know, I’m not seeing how deterministic Turing machines could model that.
Do you see how a nondeterministic Turing machine could model that?
If so … … …
Suppose in QM you have a wavefunction which recognizably evolves into a superposition of wavefunctions. I’ll write that psi0, the initial wavefunction, becomes m.psi’ + n.psi″, where m and n are coefficients, and psi’ and psi″ are basis wavefunctions.
Something slightly analogous to the MWI interpretation of this could be seen in a Turing machine which started with one copy of a bitstring, PSI0, and which replaced it with M copies of the bitstring PSI’ and N copies of the bitstring PSI″. That would be a deterministic computation which replaces one world, the single copy of PSI0, with many worlds, the multiple copies of PSI’ and PSI″.
So it’s straightforward enough for a deterministic state machine to invent rules corresponding to a proliferation of worlds. In fact, in the abstract theory of computation, this is one of the standard ways to model nondeterministic computation—have a deterministic computation which deterministically produces all the possible paths that could be produced by the nondeterministic process.
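A minimal sketch of that last idea — a deterministic update that simply enumerates every branch. PSI0, PSI_PRIME, PSI_DPRIME and the counts M, N are made-up placeholders, not anything taken from AIXI:

    # Toy deterministic "world-splitting" step: one world is replaced by
    # M copies of one successor bitstring and N copies of another.
    PSI0, PSI_PRIME, PSI_DPRIME = "PSI0", "PSI'", "PSI''"
    M, N = 3, 1   # arbitrary branch counts, for illustration only

    def split(worlds):
        successors = []
        for w in worlds:
            if w == PSI0:
                successors += [PSI_PRIME] * M + [PSI_DPRIME] * N
            else:
                successors.append(w)   # other worlds persist unchanged in this toy model
        return successors

    print(split([PSI0]))   # ["PSI'", "PSI'", "PSI'", "PSI''"]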
However, the way that QM works, and thus the way that an MWI theory would have to work, is rather more complicated: the coefficients are complex numbers; the probabilities (which one might suppose correspond to the number of copies of each world) are the squares of the absolute values of those complex numbers; and probability waves can recombine and destructively interfere, so you would need worlds / bitstrings to be destroyed as well as created.
In particular, it seems that you couldn’t reproduce QM with a setup in which the only fact about each world / bitstring was the number of current copies—you need the “phase information” (angle in the complex plane) of the complex numbers, in order to know what the interference effects are. So your Turing machine’s representation of the state of the multiverse would be something like:
(complex coefficient associated with the PSI’ worlds) (list of M copies of the PSI’ bitstring); (complex coefficient associated with the PSI″ worlds) (list of N copies of the PSI″ bitstring) ; …
and the “lists of copies of worlds” would all be dynamically irrelevant, since the dynamics comes solely from recombining the complex numbers at the head of each list of copies. At each timestep, the complex numbers would be recomputed, and then the appropriate number of world-copies would be entered into each list.
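A toy sketch of that representation — a complex amplitude at the head of each list of copies, with the copy counts recomputed from the amplitudes at every step. The particular amplitude update (a Hadamard-style recombination) and the total number of copies are invented purely for illustration:

    # Toy multiverse state: each world-type has a complex amplitude plus a
    # (dynamically irrelevant) list of identical copies of its bitstring.
    state = {
        "PSI'":  {"amp": 0.8 + 0.0j, "copies": []},
        "PSI''": {"amp": 0.0 + 0.6j, "copies": []},
    }

    def step(state, total_copies=100):
        # 1. Recombine the amplitudes (an arbitrary unitary "rotation" here,
        #    which is where interference between the two branches happens).
        a, b = state["PSI'"]["amp"], state["PSI''"]["amp"]
        state["PSI'"]["amp"], state["PSI''"]["amp"] = (a + b) / 2**0.5, (a - b) / 2**0.5
        # 2. Recompute the copy counts from the squared magnitudes of the amplitudes.
        norm = sum(abs(w["amp"]) ** 2 for w in state.values())
        for name, w in state.items():
            w["copies"] = [name] * round(total_copies * abs(w["amp"]) ** 2 / norm)

    step(state)
    print({name: (w["amp"], len(w["copies"])) for name, w in state.items()})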
But although it’s dynamically irrelevant, that list of copies of identical worlds is still performing a function, namely, it’s there to ensure that there actually are M out of every (M+N) observers experiencing worlds of type PSI’, and N out of every (M+N) observers experiencing worlds of type PSI″. If the multiverse representation was just
(complex coefficient of PSI’ world) (one copy of PSI’ world) ; (complex coefficient of PSI″ world) (one copy of PSI″ world) ; …
then all those complex numbers could still evolve according to the Schrödinger equation, but you would only have one observer seeing a PSI’ world and one observer seeing a PSI″ world, which is inconsistent with observation, where we see that some quantum events are more probable than others.
This is the well-known problem of recovering the Born probabilities, or justifying the Born probability rule—mentioned in several places in the QM Sequence—but expressed in the unusual context of bit-strings on a Turing tape.
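(In the notation used above, the rule to be recovered is P(psi’) = |m|^2 / (|m|^2 + |n|^2), i.e. the copy counts would have to come out satisfying M / (M + N) = |m|^2 / (|m|^2 + |n|^2).)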
(Incidentally, I have skipped over the further problem that QM uses continuous rather than discrete quantities, because that’s not a problem of principle—you can just represent the complex numbers the way we do on real computers, to some finite degree of binary precision.)
… keep in mind that deterministic Turing machines can trivially simulate nondeterministic Turing machines.
The problem here seems to be one of terminology. You are using “nondeterministic Turing machine” in the formal sense of the term, whereas Mitchell seems to be using “nondeterministic” to mean something closer to “has a source of random bits.”
Trivially? I was under the impression that it involved up to an exponential slowdown, while probabilistic Turing machines can simulate deterministic Turing machines merely by putting probability 1 on a single outcome for each entry of the transition function.
Algorithmically trivially, I didn’t see anyone concerned about running times.
Well, wouldn’t that be because it’s all theorizing about computational complexity?
I see the point. Pseudorandom number generators would be what you mean by simulation of nondeterminism in a DTM? Would a deterministic UTM with an RNG be sufficient for AIXI to hypothesize randomness? I still don’t see how SI would be able to hypothesize Turing machines that produce bitstrings that are probabilistically similar to the bitstring it is “supposed” to replicate.
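For what it's worth, here is a minimal sketch of the PRNG point — the program is fully deterministic and short, and its output has coin-flip statistics, but it always emits the same fixed string for a given seed, so reproducing one particular already-observed random string would still essentially require hard-coding that string:

    # A short deterministic program whose output looks statistically like coin
    # flips (a PRNG). It is still one fixed string per seed, so it does not help
    # compress a specific Kolmogorov-random string it was not built around.
    import random

    def pseudo_flips(seed, n):
        rng = random.Random(seed)                     # deterministic given the seed
        return "".join(str(rng.getrandbits(1)) for _ in range(n))

    print(pseudo_flips(42, 32))   # the same 32 "flips" every run for seed 42
    print(pseudo_flips(43, 32))   # a different, but equally fixed, string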