Here’s another fun way to think about this—you can basically cast what’s wrong here as an information theory exercise.
Problem:
Spot the step where the following argument goes wrong:
1. Suppose I have a dataset of finitely many points arranged in a line. Now, suppose I fit a (reasonable) universal prior to that dataset, and compare two cases: learning a line and learning to memorize each individual datapoint.
2. In the linear case, there is only one way to implement a line.
3. In the memorization case, I can implement whatever I want on the other datapoints in an arbitrary way.
4. Thus, since there are more ways to memorize than to learn a line, there should be greater total measure on memorization than on learning the line.
5. Therefore, you’ll learn to memorize each individual datapoint rather than learning to implement a line.
Solution:
By the logic of the post, step 4 is the problem, but I think step 4 is actually valid. The problem is step 2: there are actually a huge number of different ways to implement a line! Not only are there many different programs that implement the line in different ways, I can also just take the simplest program that does so and keep on adding comments or other extraneous bits. It’s totally valid to say that the algorithm with the most measure across all ways of implementing it is more likely, but you have to actually include all ways of implementing it, including all the cases where many of those bits are garbage and aren’t actually doing anything.
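To make the measure claim concrete, here is a toy numerical sketch. Everything in it is an assumption for illustration: "programs" are just bitstrings up to length 12 weighted by 2^-length (not a real prefix-free universal prior), a program "implements the line" iff it starts with a hypothetical 3-bit core, and it "memorizes" iff it starts by spelling out 8 hypothetical stored data bits; in both cases everything after the core is dead code.

```python
from itertools import product

MAX_LEN = 12            # longest "program" we enumerate
LINE_CORE = "000"       # hypothetical 3-bit core implementing the line
MEMO_CORE = "00110101"  # hypothetical 8 bits spelling out the stored datapoints

def total_measure(implements):
    """Sum 2^-len(p) over all bitstrings p up to MAX_LEN satisfying `implements`."""
    total = 0.0
    for n in range(1, MAX_LEN + 1):
        for bits in product("01", repeat=n):
            p = "".join(bits)
            if implements(p):
                total += 2.0 ** (-n)
    return total

# A program "implements" a behavior iff it starts with that behavior's core;
# everything after the core is "comments or other extraneous bits".
line_measure = total_measure(lambda p: p.startswith(LINE_CORE))
memo_measure = total_measure(lambda p: p.startswith(MEMO_CORE))

print(line_measure, memo_measure)  # 1.25 vs ~0.0195
assert line_measure > memo_measure
```

With these toy numbers the line gets total measure 1.25 versus about 0.02 for memorization: the shorter core wins precisely because every garbage suffix of it counts as another implementation, which is the "include all ways of implementing it" point above.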
Evan, I wonder how much your disagreement is engaging with the OPs’ reasons. A draft of this post attributed the misprediction of both counting arguments to counting functions instead of parameterizations of functions; one has to consider the compressivity of the parameter-function map (many different internal parameterizations map to the same external behavior). Given that the authors actually agree that step 2 is incorrect, does this change your views?
I would be much happier with that; I think that’s much more correct. Then, my objection would just be that at least the sort of counting arguments for deceptive alignment that I like are and always have been about parameterizations rather than functions. I agree that if you try to run a counting argument directly in function space it won’t work.
deceptive alignment that I like are and always have been about parameterizations rather than functions.
How can this be true, when you e.g. say there’s “only one saint”? That doesn’t make any sense with parameterizations due to internal invariances; there are uncountably many “saints” in parameter-space (insofar as I accept that frame, which I don’t really but that’s not the point here). I’d expect you to raise that as an obvious point in worlds where this really was about parameterizations.
And, as you’ve elsewhere noted, we don’t know enough about parameterizations to make counting arguments over them. So how are you doing that?
How can this be true, when you e.g. say there’s “only one saint”? That doesn’t make any sense with parameterizations due to internal invariances; there are uncountably many saints.
Because it was the transcript of a talk? I was trying to explain an argument at a very high level. And there’s certainly not uncountably many; in the infinite bitstring case there would be countably many, though usually I prefer priors that put caps on total computation such that there are only finitely many.
I’d expect you to raise that as an obvious point in worlds where this really was about parameterizations.
I don’t really appreciate the psychoanalysis here. I told you what I thought and think, and I have far more evidence about that than you do.
And, as you’ve elsewhere noted, we don’t know enough about parameterizations to make counting arguments over them. So how are you doing that?
As I’ve said, I usually try to take whatever the most realistic prior is that we can reason about at a high-level, e.g. a circuit prior or a speed prior.
FWIW I object to steps 2, 3, and 4, and maybe also 1.
Another frame that might be useful:
There’s a difference between the number of mathematical functions that implement a set of requirements and the number of programs that implement the set of requirements.
Simplicity is about the latter, not the former.
The existence of a large number of programs that produce the exact same mathematical function contributes towards simplicity.
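The programs-vs-functions distinction can be glossed with a trivial sketch (the source strings are hypothetical, not from the thread): several syntactically distinct programs all compute the one function f(x) = 2x, and on a program-counting view that whole bundle of weight accrues to that single function.

```python
# Four distinct programs (source strings), one mathematical function.
programs = [
    "lambda x: 2 * x",
    "lambda x: x + x",
    "lambda x: 2 * x  # dead comment: a longer program, same function",
    "lambda x: sum([x, x])",
]

fns = [eval(src) for src in programs]

# All four agree everywhere we check, so they implement the same function...
assert all(f(n) == 2 * n for f in fns for n in range(10))

# ...and a simplicity prior over *programs* gives that one function the
# combined weight of every such implementation, short or padded.
```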