From my perspective reading this post, it read to me like “I didn’t understand the counting argument, therefore it doesn’t make sense” which is (obviously) not very compelling to me.
I definitely appreciate how it can feel frustrating or bad when you feel that someone isn’t properly engaging with your ideas. However, I also feel frustrated by this statement. Your comment seems to have a tone of indignation that Quintin and Nora weren’t paying attention to what you wrote.
I myself expected you to respond to this post with some ML-specific reasoning about simplicity and measure of parameterizations, instead of your speculation about a relationship between the universal measure and inductive biases. I spoke with dozens of people about the ideas in OP’s post, and none of them mentioned arguments like the one you gave. I myself have spent years in the space and am also not familiar with this particular argument about bitstrings.
(EDIT: Having read Ryan’s comment, it now seems to me that you have exclusively made a simplicity argument without any counting involved, and an empirical claim about the relationship between description length of a mesa objective and the probability of SGD sampling a function which implements such an objective. Is this correct?)
If these are your real reasons for expecting deceptive alignment, that’s fine, but I think you’ve mentioned this rather infrequently. Your profile links to How likely is deceptive alignment?, which is an (introductory) presentation you gave. In that presentation, you make no mention of Turing machines, universal semimeasures, bitstrings, and so on. On a quick search, the closest you seem to come is the following:
We’re going to start with simplicity. Simplicity is about specifying the thing that you want in the space of all possible things. You can think about simplicity as “How much do you have to aim to hit the exact thing in the space of all possible models?” How many bits does it take to find the thing that you want in the model space? And so, as a first pass, we can understand simplicity by doing a counting argument, which is just asking, how many models are in each model class?[1]
But this is ambiguous (as can be expected for a presentation at this level). We could view this as “bitlength under a given decoding scheme, viewing an equivalence class over parameterizations as a set of possible messages” or “Shannon information (in bits) of a function induced by a given probability distribution over parameterizations” or something else entirely (perhaps having to do with infinite bitstrings).
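To make the second of those readings concrete, here is a minimal sketch (the tiny ReLU network, the Gaussian initialization, and the thresholding are all assumptions chosen purely for illustration, not anything taken from the presentation): it estimates, for each boolean function f on two inputs, the Shannon information −log₂ P(f) under the distribution over functions induced by randomly initializing the network.

```python
# Toy estimate of "bits to specify a function" under the second reading:
# the Shannon information -log2 P(f), where P is the distribution over
# boolean functions induced by randomly initializing a tiny ReLU MLP.
# Architecture, init scheme, and thresholding are illustrative assumptions.
import itertools
from collections import Counter

import numpy as np

rng = np.random.default_rng(0)
inputs = np.array(list(itertools.product([0.0, 1.0], repeat=2)))  # the 4 possible inputs

def induced_function(hidden=8):
    """Sample one parameterization and return the boolean function it computes."""
    W1 = rng.normal(size=(2, hidden)); b1 = rng.normal(size=hidden)
    W2 = rng.normal(size=(hidden, 1)); b2 = rng.normal(size=1)
    logits = np.maximum(inputs @ W1 + b1, 0.0) @ W2 + b2
    return tuple((logits.ravel() > 0).astype(int))  # threshold ("greedy") decoding

counts = Counter(induced_function() for _ in range(100_000))
total = sum(counts.values())
for f, c in counts.most_common():
    print(f, f"P(f) ~ {c / total:.4f}", f"bits ~ {-np.log2(c / total):.2f}")
```

On this reading, "how many bits it takes to find the thing" is entirely a property of the chosen distribution over parameterizations, which is part of why the ambiguity matters.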
My critique is not “this was ambiguous.” My critique is “how was anyone supposed to be aware of the ‘real’ argument which I (and many others) seem to now be encountering for the first time?”
My objection is that the sort of finite bitstring analysis in this post does not yield any well-defined mathematical object at all, and certainly not one that would predict generalization.
This seems false? All that needs to be done is to formally define
$$F := \{\, f : \mathbb{R}^n \to \mathbb{R}^m \mid f(x) = \mathrm{label}(x) \ \ \forall\, x \in X_{\mathrm{train}} \,\},$$
which is the set of functions which (when e.g. greedily sampled) perfectly label the (categorical) training data $X_{\mathrm{train}}$, and we can parameterize such functions using the neural network parameter space. This yields a perfectly well-defined counting argument over F. This seems to be exactly the counting argument the post is critiquing, by the way.
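To spell out what a counting argument over F looks like in a finite toy case (the parity target, the sixteen-point input space, and the four-point training set are all assumptions invented for illustration):

```python
# Finite toy version of a counting argument over F (all details are
# illustrative assumptions): inputs are the 16 binary strings of length 4,
# the target rule is parity, and 4 of the 16 points form the training set.
import itertools

inputs = list(itertools.product([0, 1], repeat=4))   # 16 possible inputs
target = [sum(x) % 2 for x in inputs]                # assumed target rule (parity)
train, test = inputs[:4], inputs[4:]                 # assumed 4/12 train-test split

# F = every labeling of all 16 inputs that agrees with the target on the 4
# training points, so its members differ only on the 12 held-out points.
members_of_F = list(itertools.product([0, 1], repeat=len(test)))
target_on_test = tuple(target[len(train):])
generalizing = sum(labels == target_on_test for labels in members_of_F)

print(f"|F| = {len(members_of_F)}")
print(f"P(generalize) under indifference over F = {generalizing / len(members_of_F):.6f}")
```

Counting members of F with equal weight says almost nothing about behavior off the training set, which is the sense in which this style of argument ends up predicting overfitting rather than generalization.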
I myself expected you to respond to this post with some ML-specific reasoning about simplicity and measure of parameterizations, instead of your speculation about a relationship between the universal measure and inductive biases. I spoke with dozens of people about the ideas in OP’s post, and none of them mentioned arguments like the one you gave. I myself have spent years in the space and am also not familiar with this particular argument about bitstrings.
That probably would have been my objection had the reasoning about priors in this post been sound, but since the reasoning was unsound, I turned to the formalism to try to show why it’s unsound.
If these are your real reasons for expecting deceptive alignment, that’s fine, but I think you’ve mentioned this rather infrequently.
I think you’re misunderstanding the nature of my objection. It’s not that Solomonoff induction is my real reason for believing in deceptive alignment or something, it’s that the reasoning in this post is mathematically unsound, and I’m using the formalism to show why. If I weren’t responding to this post specifically, I probably wouldn’t have brought up Solomonoff induction at all.
This yields a perfectly well-defined counting argument over F.
we can parameterize such functions using the neural network parameter space
I’m very happy with running counting arguments over the actual neural network parameter space; the problem there is just that I don’t think we understand it well enough to do so effectively.
You could instead try to put a measure directly over the functions in your setup, but the problem there is that function space really isn’t the right space to run a counting argument like this; you need to be in algorithm space, otherwise you’ll end up doing what happens in this post, where you predict overfitting rather than generalization (which implies that you’re using a prior that’s not suitable for running counting arguments on).
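For contrast, here is a sketch of a measure coming from the parameter space rather than from indifference over functions (again, the MLP, the Gaussian initialization, the “copy the first coordinate” target, and the particular train/test split are all assumptions for illustration): sample random parameterizations, keep the ones whose thresholded function fits the four training labels, and see how much of the surviving measure lands on the function that matches the target everywhere, to compare against the 1/4096 that indifference over the function class assigns.

```python
# Sketch of a function measure induced by the parameter space (all specifics
# are illustrative assumptions): condition randomly initialized MLPs on
# fitting 4 training labels, then estimate how much of the conditioned
# measure lands on the function that matches the target on every input.
import itertools

import numpy as np

rng = np.random.default_rng(0)
inputs = np.array(list(itertools.product([0.0, 1.0], repeat=4)))  # 16 inputs
target = inputs[:, 0].astype(int)              # assumed target: copy the first coordinate
train_idx = np.array([0, 3, 8, 12])            # assumed training points (both labels present)
test_idx = np.setdiff1d(np.arange(16), train_idx)

def sample_functions(n, hidden=32):
    """Threshold n randomly initialized ReLU MLPs into boolean functions on all 16 inputs."""
    W1 = rng.normal(size=(n, 4, hidden)); b1 = rng.normal(size=(n, 1, hidden))
    W2 = rng.normal(size=(n, hidden, 1)); b2 = rng.normal(size=(n, 1, 1))
    h = np.maximum(inputs[None] @ W1 + b1, 0.0)
    return ((h @ W2 + b2)[..., 0] > 0).astype(int)  # shape (n, 16)

fits = generalizes = 0
for _ in range(20):                            # 200k samples, in batches
    fs = sample_functions(10_000)
    ok = (fs[:, train_idx] == target[train_idx]).all(axis=1)
    fits += int(ok.sum())
    generalizes += int((fs[ok][:, test_idx] == target[test_idx]).all(axis=1).sum())

print(f"P(fit the 4 training labels) ~ {fits / 200_000:.4f}")
print(f"P(match target everywhere | fit train) ~ {generalizes / max(fits, 1):.4f}")
print(f"Indifference over the function class gives {1 / 2 ** len(test_idx):.6f}")
```

Neither measure is privileged by this sketch; the point is only that the number you get depends on whether you weight functions equally or by how much of parameter space maps onto them.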
I’m very happy with running counting arguments over the actual neural network parameter space; the problem there is just that I don’t think we understand it well enough to do so effectively.
This is basically my position as well. The cited argument is a counting argument over the space of functions which achieve zero/low training loss.
You could instead try to put a measure directly over the functions in your setup, but the problem there is that function space really isn’t the right space to run a counting argument like this; you need to be in algorithm space, otherwise you’ll end up doing what happens in this post, where you predict overfitting rather than generalization (which implies that you’re using a prior that’s not suitable for running counting arguments on).
Indeed, this is a crucial point that I think the post is trying to make. The cited counting arguments are counting functions instead of parameterizations. That’s the mistake (or, at least “a” mistake). I’m glad we agree it’s a mistake, but then I’m confused why you think that part of the post is unsound.
(Rereads)
Rereading the portion in question now, it seems that they changed it a lot since the draft. Personally, I think their argumentation is now weaker than it was before. The original argumentation clearly explained the mistake of counting functions instead of parameterizations, while the present post does not. It instead abstracts it as “an indifference principle”, where the reader has to do the work to realize that indifference over functions is inappropriate.
I’m sorry to hear that you think the argumentation is weaker now.
the reader has to do the work to realize that indifference over functions is inappropriate
I don’t think that indifference over functions in particular is inappropriate. I think indifference reasoning in general is inappropriate.
I’m very happy with running counting arguments over the actual neural network parameter space
I wouldn’t call the correct version of this a counting argument. The correct version uses the actual distribution used to initialize the parameters as a measure, and not e.g. the Lebesgue measure. This isn’t appealing to the indifference principle at all, and so in my book it’s not a counting argument. But this could be terminological.