From the formal description of the algorithm, it looks like you use a universal prior to pick k, and then allow the kth Turing machine to run for ℓ steps, but don’t penalize the running time of the machine that outputs k. Is that right? That didn’t match my intuitive understanding of the algorithm, and seems like it would lead to strange outcomes, so I feel like I’m misunderstanding.
Yes, this is correct. If you use the same bijection from strings to natural numbers consistently, it looks a little more intuitive than if you don’t. The universal prior picks k (the number) by outputting k as a string, and the kth Turing machine is the Turing machine described by k as a string. So you end up looking at the Kolmogorov complexity of the description of the Turing machine, which means the construction of the description of the world-model isn’t time-penalized. This doesn’t change the asymptotic result, so I went with the more familiar K(x) rather than translating this new speed prior into a measure over finite strings, which would require some more exposition. But I agree with you that there might be some strange outcomes “before the limit” as a result of this approach: namely, the code on the UTM that outputs the description of the world-model Turing machine will try to do as much of the computation as possible in advance, by computing the description of a speed-optimized Turing machine for when the actions start coming.
The other reasonable choices here instead of K(x) are S(x) (constructed to be like the new speed prior here) and ℓ(x), the length of x. But ℓ(x) basically tells you that a Turing machine with fewer states is simpler, which would lead to a measure over H∞ that is dominated by world-models that are just universal Turing machines, which defeats the purpose of doing maximum a posteriori instead of a Bayes mixture. The way this issue appears in the proof renders the Natural Prior Assumption less plausible.
This invalidates some of my other concerns, but also seems to mean things are incredibly weird at finite times. I suspect that you’ll want to change to something less extreme here.
(I might well be misunderstanding something, apologies in advance.)
Suppose the “intended” physics take at least 1E15 steps to run on the UTM (this is a conservative lower bound, since you have to simulate the human for the whole episode). And suppose β<0.999 (I think you need β much lower than this). Then the intended model gets penalized by at least exp(1E12) for its slowness.
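The size of that penalty can be checked in log space. This is only a sketch: the function name is hypothetical, and β = 0.999 is the boundary case of the bound above, so any smaller β gives a larger penalty.

```python
import math

def slowness_penalty_nats(beta: float, steps: float) -> float:
    """Return -ln(beta ** steps): the log of the factor a model loses for running `steps` steps."""
    # Working in logs avoids underflow; beta ** 1e15 itself is far too small for a float.
    return -steps * math.log(beta)

# Boundary case from the comment: beta = 0.999, at least 1E15 steps.
penalty_nats = slowness_penalty_nats(0.999, 1e15)
penalty_bits = penalty_nats / math.log(2)  # convert nats to bits
print(f"{penalty_nats:.3e} nats, {penalty_bits:.3e} bits")
```

The result is just above 1E12 nats, i.e. a factor of at least exp(1E12), or roughly 1.44E12 bits.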
For almost the same description complexity, I could write down physics + “precompute the predictions for the first N episodes, for every sequence of possible actions/observations, and store them in a lookup table.” This increases the complexity by a few bits, some constant plus K(N|physics), but avoids most of the computation. In order for the intended physics to win, i.e. in order for the “speed” part of the speed prior to do anything, we need the complexity of this precomputed model to be at least 1E12 bits higher than the complexity of the fast model.
That appears to happen only once N > BB(1E12). Does that seem right to you?
We could talk about whether malign consequentialists also take over at finite times (I think they probably do, since the “speed” part of the speed prior is not doing any work until after BB(1E12) steps, long after the agent becomes incredibly smart), but it seems better to adjust the scheme first.
Using the speed prior seems more reasonable, but I’d want to know which version of the speed prior and which parameters, since which particular problem bites you will depend on those choices. And maybe to save time, I’d want to first get your take on whether the proposed version is dominated by consequentialists at some finite time.
Yes. I recall thinking about precomputing observations for various actions in this phase, but I don’t recall noticing how bad the problem was before the limit.
your take on whether the proposed version is dominated by consequentialists at some finite time.
This goes in the category of “things I can’t rule out”. I say maybe 1⁄5 chance it’s actually dominated by consequentialists (that low because I think the Natural Prior Assumption is still fairly plausible in its original form), but for all intents and purposes, 1⁄5 is very high, and I’ll concede this point.
I’d want to know which version of the speed prior and which parameters
$2^{-K(s)(1+\varepsilon)}$ is a measure over binary strings. Instead, let’s try $\sum_{p \in \{0,1\}^*:\, U(p)=s} 2^{-\ell(p)} \beta^{c\,T(U,p)}$, where ℓ(p) is the length of p, T(U,p) is the time it takes to run p on U, and c is a constant. If there were no cleverer strategy than precomputing observations for all the actions, then c could be above $|A|^{-md}$, where d is the number of episodes we can tolerate not having a speed prior for. But if it somehow magically predicted which actions BoMAI was going to take in no time at all, then c would have to be above 1/d.
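To make the trade-off in that sum concrete, here is a minimal sketch. It assumes each program p contributes a term 2^(−ℓ(p)) · β^(c·T(U,p)); the function names, the two programs, and the values of β and c are all hypothetical and chosen only for illustration.

```python
import math

def log2_weight(length_bits: float, steps: float, beta: float, c: float) -> float:
    """log2 of one program's term: 2^(-length) * beta^(c * steps)."""
    return -length_bits + c * steps * math.log2(beta)

def log2_prior(programs, beta: float, c: float) -> float:
    """log2 of the sum of the terms of all listed programs that output the same string."""
    terms = [log2_weight(length, steps, beta, c) for length, steps in programs]
    top = max(terms)
    # Base-2 log-sum-exp: factor out the largest term so the sum does not underflow to zero.
    return top + math.log2(sum(2.0 ** (t - top) for t in terms))

# Hypothetical programs for one string s: (length in bits, running time in steps).
programs = [(100, 1e6), (120, 1e3)]
beta, c = 0.9, 0.05  # illustrative values only

short_slow = log2_weight(100, 1e6, beta, c)
long_fast = log2_weight(120, 1e3, beta, c)
total = log2_prior(programs, beta, c)
print(short_slow, long_fast, total)
```

With these made-up numbers the program that is 20 bits longer but a thousand times faster dominates the sum. A smaller c weakens the time penalty and shifts the balance back toward short descriptions.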
I say maybe 1⁄5 chance it’s actually dominated by consequentialists
Do you get down to 20% because you think this argument is wrong, or because you think it doesn’t apply?
What problem do you think bites you?
What’s β? Is it O(1) or really tiny? And which value of c do you want to consider, polynomially small or exponentially small?
But if it somehow magically predicted which actions BoMAI was going to take in no time at all, then c would have to be above 1/d.
Wouldn’t they have to also magically predict all the stochasticity in the observations, and have a running time that grows exponentially in their log loss? Predicting what BoMAI will do seems likely to be much easier than that.
Do you get down to 20% because you think this argument is wrong, or because you think it doesn’t apply?
Your argument is about a Bayes mixture, not a MAP estimate; I think the case is much stronger that consequentialists can take over a non-trivial fraction of a mixture. I think that the methods which consequentialists discover for gaining weight in the prior (before the treacherous turn) are most likely to be elegant (short description on UTM), and that is the consequentialists’ real competition; then [the probability the universe they live in produces them with their specific goals] or [the bits to directly specify a consequentialist deciding to do this] set them back (in the MAP context).
I don’t see why their methods would be elegant. In particular, I don’t see why any of {the anthropic update, importance weighting, updating from the choice of universal prior} would have a simple form (simpler than the simplest physics that gives rise to life).
I don’t see how MAP helps things either—doesn’t the same argument suggest that for most of the possible physics, the simplest model will be a consequentialist? (Even more broadly, for the universal prior in general, isn’t MAP basically equivalent to a random sample from the prior, since some random model happens to be slightly more compressible?)
Yeah I think we have different intuitions here; are we at least within a few bits of log-odds disagreement? Even if not, I am not willing to stake anything on this intuition, so I’m not sure this is a hugely important disagreement for us to resolve.
I don’t see how MAP helps things either
I didn’t realize that you think that a single consequentialist would plausibly have the largest share of the posterior. I assumed your beliefs were in the neighborhood of:
it seems plausible that the weight of the consequentialist part is in excess of 1/million or 1/billion
(from your original post on this topic). In a Bayes mixture, I bet that a team of consequentialists that collectively amount to 1⁄10 or even 1⁄50 of the posterior could take over our world. In MAP, if you’re not first, you’re last, and more importantly, you can’t team up with other consequentialist-controlled world-models in the mixture.
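The difference between the two settings can be illustrated with a toy posterior. The model names and weights below are made up for illustration and are not taken from any analysis of BoMAI.

```python
# Hypothetical posterior over world-models; names and weights are invented.
posterior = {
    "benign_a": 0.30,
    "benign_b": 0.25,
    "benign_c": 0.25,
    "consequentialist_1": 0.04,
    "consequentialist_2": 0.04,
    "consequentialist_3": 0.04,
    "consequentialist_4": 0.04,
    "consequentialist_5": 0.04,
}

def team_share(weights: dict, prefix: str) -> float:
    """Total posterior mass of every model whose name starts with `prefix`."""
    return sum(w for name, w in weights.items() if name.startswith(prefix))

def map_model(weights: dict) -> str:
    """The single model with the largest posterior weight."""
    return max(weights, key=weights.get)

mixture_share = team_share(posterior, "consequentialist")  # what a Bayes mixture would act on
chosen = map_model(posterior)                              # the only model MAP acts on
print(mixture_share, chosen)
```

In this toy case the consequentialist models jointly hold a fifth of the mixture, yet none of them is the MAP model, so under MAP their combined weight gives them no influence.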
Wouldn’t they have to also magically predict all the stochasticity in the observations, and have a running time that grows exponentially in their log loss?
Let’s say β=0.9, c=1/20.
Oh yeah—that’s good news.
Although I don’t really like to make anything that would fall apart if the world were deterministic. Relying on stochasticity feels wrong to me.