The programs used in the proof basically work like this: they dovetail all programs and note when they halt/output, keeping track of how much algorithmic probability has been assigned to each string. Then a particular string is selected using a clever encoding scheme applied to the accrued algorithmic probability. So the gauge theory example would just be a special case of that, applied to the accumulated probability from programs in various gauges.
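(To make that bookkeeping concrete, here's a minimal toy sketch in Python. The names `programs` and `run_program` are placeholders I'm inventing for illustration; a real construction would enumerate all prefix-free programs and only ever approximate the accumulated probability from below, never finishing.)

```python
from collections import defaultdict
from fractions import Fraction

def dovetail(programs, run_program, max_rounds):
    """Toy dovetailer: give every program a little more runtime each round,
    and when a program halts, credit 2^-len(program) of algorithmic
    probability to whatever string it output."""
    prob = defaultdict(Fraction)   # output string -> accumulated algorithmic probability
    halted = set()
    for budget in range(1, max_rounds + 1):
        for p in programs:
            if p in halted:
                continue
            output = run_program(p, budget)   # placeholder: output if p halts within `budget` steps, else None
            if output is not None:
                halted.add(p)
                prob[output] += Fraction(1, 2 ** len(p))
    return prob
```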
Thanks! I’d need more detail than that to answer my questions.
Like, can we specialize this new program to a program that’s just ‘dovetailing’ across all possible gauge-choices and then running physics on those? When we choose different gauges, we have to choose correspondingly-different ways of initializing the rest of the fields (or whatever); presumably this program is now also ‘dovetailing’ across all the different initializations?
But now it’s looking not just at one history, but at all histories, and “keeping track of how much algorithmic probability has been assigned to each”. So it presumably has to pick one of them to “predict”, but how is it doing this?
Like, I can vaguely understand the idea of (at any given timestep) partitioning the gauge/initial-configuration pairs into clusters that haven’t been distinguished yet, and then sampling according to the size of the cluster. But presumably sampling isn’t good enough, because the random seed needed to hit a particular cluster of my choosing could be arbitrarily improbable (which doesn’t seem compatible with getting a difference-factor that’s constant in f).
So I still feel like I’m missing a key idea for constructing a hypothesis that’s allegedly significantly simpler than “you live in physics, once for every possible choice of gauge”.
(Or, well, I guess it’s simpler than “you live in physics, at thus-and-such a particular gauge”. It needn’t be simpler than the collection of hypotheses “you live in physics, once for every gauge”, and indeed, if the constant we’re talking about is (say) 10 bits, then it’s entirely possible for all but a thousandth-part of our probability mass to be concentrated in that collection. The fact that the amount of support reality provides for alt-complexity over K-complexity is bounded above by a constant that doesn’t depend on reality, doesn’t preclude different realities providing different amounts of evidence, nor does it preclude our particular reality happening to provide quite a lot of evidence.)
(But it does sound like the alternative hypotheses are going to have a character something like “you live in a machine that crawls across all gauges, feeds them into physics, and then «does something clever»”. Which is a little more resolution!)
Like, can we specialize this new program to a program that’s just ‘dovetailing’ across all possible gauge-choices and then running physics on those?
You could specialize to just dovetailing across gauges, but this might make the program longer since you’d need to specify physics.
So it presumably has to pick one of them to “predict”, but how is it doing this?
I don’t think you need to choose a particular history to predict since all observables are gauge-invariant. So say we’re trying to predict some particular sequence of observables in the universe. You run your search over all programs; the search includes various choices of gauge/initialization. Since the observables are gauge-invariant, each of those choices of gauge generates the same sequence of observables, so their algorithmic probability accumulates to a single string. Once the string has accumulated enough algorithmic probability we can refer to it with a short codeword (this is the tricky part; see below).
But presumably sampling isn’t good enough, because the random seed needed to hit a particular cluster of my choosing could be arbitrarily improbable
The idea is that, using a clever algorithm, you can arrange the clusters in a contiguous way inside [0,1]. Since they’re contiguous, if a particular cluster has volume 2^{-Θ(n)} we can refer to it with a codeword of length Θ(n).
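(Here's a minimal sketch of that arrangement, assuming we already know each cluster's volume; the cluster names and volumes below are made up. Laying the clusters end-to-end in [0,1] and naming a dyadic interval inside each cluster's slice gives a prefix-free codeword of roughly -log2(volume) bits.)

```python
from math import ceil, log2

def codewords_for_clusters(volumes):
    """Lay the clusters end-to-end in [0, 1] and give each one a binary codeword
    naming a dyadic interval contained in its slice; a cluster of volume v gets
    a codeword of roughly -log2(v) bits (here: ceil(-log2(v)) + 1)."""
    codes = {}
    left = 0.0
    for name, v in volumes.items():
        length = ceil(-log2(v)) + 1        # ensures 2^-length <= v/2
        k = ceil(left * 2 ** length)       # leftmost dyadic interval of that width inside the slice
        codes[name] = format(k, f"0{length}b")
        left += v
    return codes

print(codewords_for_clusters({"cluster_A": 0.5, "cluster_B": 0.25, "cluster_C": 0.25}))
# -> {'cluster_A': '00', 'cluster_B': '100', 'cluster_C': '110'}
```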
this might make the program longer since you’d need to specify physics.
(I doubt it matters much; the bits you use to specify physics at the start are bits you save when picking the codeword at the end.)
I don’t think you need to choose a particular history to predict since all observables are gauge-invariant.
IIUC, your choice of the wavefunction and your choice of the gauge are interlocked. The invariant is that if you change the gauge and twiddle the wavefunction in a particular way, then no observables change. If you’re just iterating over (gauge, wavefunction) pairs then (iiuc) you’re going to hit lots of pairs that aren’t making the correct predictions about the observables.
I don’t understand the mechanism that lets you find some clever way to check which (gauge, wavefunction) pairs are legal (or at least haven’t-been-ruled-out-yet), nor how to make a deterministic prediction given an imperfect cluster without doing some sort of sampling.
(Alternatively, it’s plausible to me that I’m thinking about it wrong.)
… codeword …
Yeah, I see how if you can do this dovetailing and get the clusters to be contiguous, such that every cluster with 2^k programs of length (n+k) has a codeword of length n, then you’re done.
(And it’s clear to me how to do this if you can enumerate the 2^k programs, and I know how to enumerate a superset of those programs b/c I know how to enumerate all the programs, but I don’t know how to computably distinguish members.)
I don’t understand the mechanism that lets you find some clever way to check which (gauge, wavefunction) pairs are legal
I’m not really sure how gauge-fixing works in QM, so I can’t comment here. But I don’t think it matters: there’s no need to check which pairs are “legal”, you just run all possible pairs in parallel and see what observables they output. Pairs which are in fact gauge-equivalent will produce the same observables by definition, and will accrue probability to the same strings.
Perhaps you’re worried that physically-different worlds could end up contributing identical strings of observables? Possibly, but (a) I think if all possible strings of observables are the same they should be considered equivalent, so that could be one way of distinguishing them, and (b) this issue seems orthogonal to K-complexity vs. alt-complexity.
That gives me a somewhat clearer picture. (Thanks!) It sounds like the idea is that we have one machine that dovetails through everything and separates them into bins according to their behavior (as revealed so far), and a second machine that picks a bin.
Presumably the bins are given some sort of prefix-free code, so that when a behavior-difference is revealed within a bin (e.g. after more time has passed) it can be split into two bins, with some rule for which one is “default” (e.g., the leftmost).
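(A minimal sketch of that splitting rule, with made-up program labels; I'm eliding everything about how the behavior-differences are actually detected and how probability mass gets reassigned.)

```python
class Bins:
    """Toy prefix-free codes for bins of programs; a bin splits in two whenever
    further simulation reveals a behavior-difference among its members."""
    def __init__(self, members):
        self.codes = {"": set(members)}       # codeword -> members of that bin

    def split(self, code, default_members):
        """Split bin `code`: the 'default' part (e.g. the leftmost programs)
        takes the 0-extension, everything else takes the 1-extension."""
        members = self.codes.pop(code)
        default = set(default_members)
        self.codes[code + "0"] = default
        self.codes[code + "1"] = members - default

bins = Bins(["p1", "p2", "p3"])
bins.split("", ["p1", "p2"])     # a behavior-difference separates p3 from p1, p2
bins.split("0", ["p1"])          # later, p1 and p2 diverge too
print(bins.codes)                # {'1': {'p3'}, '00': {'p1'}, '01': {'p2'}}
```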
I buy that something like this can probably be made to work.
In the case of physics, what this suggests is the hypothesis
reality is the thing that iterates over all gauges, and applies physics to cluster #657
as competing with a cluster of longer hypotheses like
reality applies physics to gauge #25098723405890723450987098710987298743509
Where that first program beats most elements of the cluster (b/c the gauge-identifier is so dang long in general), but the cluster as a whole may still beat that first program (by at most a fixed constant factor, corresponding roughly to the length of the iterating machine).
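(Rough arithmetic for that comparison, reusing the numbers from the codeword discussion above: say the cluster contains 2^k gauge-specific programs, each of length about n + k, and the iterating machine plus codeword has length about n + c for some fixed c.)

$$
\underbrace{2^k \cdot 2^{-(n+k)}}_{\text{whole cluster}} = 2^{-n}
\qquad\text{vs.}\qquad
\underbrace{2^{-(n+c)}}_{\text{iterate-and-point program}}
$$

So the single program comes within a factor of $2^{-c}$ of the entire cluster's mass, while beating any individual member whenever $k > c$.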
(And my current state is something like: it’s interesting that you can think of yourself as living in one particular program, without losing more than a factor of «fixed constant» from your overall probability on what you saw. But I still don’t see why you’d want to, as opposed to realizing that you live in lots and lots of different programs.)
Presumably the bins are given some sort of prefix-free code, so that when a behavior-difference is revealed within a bin (e.g. after more time has passed) it can be split into two bins, with some rule for which one is “default”
I only just realized that you’re mainly thinking of the complexity of semimeasures on infinite sequences, not the complexity of finite strings. I guess that should have been obvious from the OP; the results I’ve been citing are about finite strings. My bad! For semimeasures, this paper proves that there actually is a non-constant gap between the log-total-probability and description complexity. Instead, the gap is bounded by the Kolmogorov complexity of the length of the sequences. This is discussed in section 4.5.4 of Li & Vitányi.
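(Schematically, as I understand the cited results, with m the discrete universal semimeasure, M the continuous one, Km monotone complexity, and all the additive constants swept under the rug: for finite strings the coding theorem gives equality up to O(1), while for semimeasures on infinite sequences the gap is only bounded in terms of the complexity of the length. Exact statements are in the cited section of Li & Vitányi.)

$$
K(x) = -\log_2 m(x) + O(1) \qquad \text{(finite strings)}
$$

$$
0 \;\le\; Km(x_{1:n}) - \log_2 \frac{1}{M(x_{1:n})} \;\le\; K(n) + O(1) \qquad \text{(semimeasures; the gap is not } O(1)\text{)}
$$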
Cool, thanks.
I’m pretty confident that the set of compatible (gauge, wavefunction) pairs is computably enumerable, so I think that the coding theorem should apply.
There’s an insight that I’ve glimpsed—though I still haven’t checked the details—which is that we can guarantee that it’s possible to name the ‘correct’ (gauge, wavefunction) cluster without necessarily having to name any single gauge (as would be prohibitively expensive), by dovetailing all the (gauge, wavefunction) pairs (in some representation where you can computably detect compatibility) and binning the compatible ones, and then giving a code that points to the desired cluster (where the length of that code scales with the cluster’s size, and will in general be much shorter than naming a full gauge).
This does depend on your ability to distinguish when two programs belong in the same cluster together, which I’m pretty sure can be done for (gauge, wavefunction) pairs (in some suitable representation), but which ofc can’t be done in general.
Thanks for the reference!
Seems good to edit the correction in the post, so readers know that in some cases it’s not constant.
How does this correctness check work?
I usually think of gauge freedom as saying “there is a family of configurations that all produce the same observables”. I don’t think that gives a way to say some configurations are correct/incorrect. Rather some pairs of configurations are equivalent and some aren’t.
That said, I do think you can probably do something like the approach described to assign a label to each equivalence class of configurations and do your evolution in that label space, which avoids having to pick a gauge.
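(A minimal sketch of that label-space idea; `configs` and `observables` are placeholder names, and it quietly assumes the observables computed so far are enough to decide equivalence, which was exactly the contested point above.)

```python
from collections import defaultdict

def label_configurations(configs, observables):
    """Group configurations by the observables they produce: gauge-equivalent
    configurations land in the same class, so later evolution can be done on
    class labels rather than on any particular gauge choice."""
    classes = defaultdict(list)
    for c in configs:
        classes[observables(c)].append(c)   # observables(c) must be hashable, e.g. a tuple
    return {signature: label for label, signature in enumerate(classes)}
```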