I think you should have asked for clarification before making blistering critiques about how Nora “ended up using reasoning that doesn’t actually correspond to any well-defined mathematical object.” I think your comments paint a highly uncharitable and (more importantly) incorrect view of N/Q’s claims.
I’m happy to apologize if I misinterpreted anyone, but afaict my critique remains valid. My criticism is precisely that counting arguments over function space aren’t generally well-defined, and even if they were they wouldn’t be the right way to run a counting argument. So my criticism that the original post misunderstands how to properly run a counting argument still seems correct to me. Perhaps you could say that it’s not the authors’ fault, that they were responding to weak arguments that other people were actually making, but regardless the point remains that the authors haven’t engaged with the sort of counting arguments that I actually think are valid.
Your presentations often include a counting argument over a function space, in the form of “saints” versus “schemers” and “sycophants.” So it seems to me that you do suggest that. What am I missing?
What makes you think that’s intended to be a counting argument over function space? I usually think of this as a counting argument over infinite bitstrings, as I noted in my comment (though there are many other valid presentations). It’s possible I said something in that talk that gave a misleading impression there, but I certainly don’t believe and have never believed in any counting arguments over function space.
afaict my critique remains valid. My criticism is precisely that counting arguments over function space aren’t generally well-defined, and even if they were they wouldn’t be the right way to run a counting argument.
Going back through the post, Nora+Quintin indeed made a specific and perfectly formalizable claim here:
These results strongly suggest that SGD is not doing anything like sampling uniformly at random from the set of representable functions that do well on the training set.
They’re making a perfectly valid point. The point was in the original post AFAICT—it wasn’t something I only just now explained. I agree that they could have presented it more clearly, but that’s a very different critique than your claim that they’re “using reasoning that doesn’t actually correspond to any well-defined mathematical object.”
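The quoted claim about uniform sampling is easy to make concrete in a toy setting. As a sketch (the domain size and train-set size below are illustrative assumptions of mine, not anything from the post), here is what sampling uniformly at random from the train-fitting Boolean functions would predict:

```python
from itertools import product

# Toy setting: Boolean functions on a 4-bit input space (16 inputs total).
inputs = list(product([0, 1], repeat=4))   # |X| = 16
train = inputs[:6]                         # 6 labeled training points
held_out = inputs[len(train):]             # 10 unconstrained inputs

# A train-fitting function is any labeling of the held-out inputs,
# so there are 2**10 of them, and a uniform sample labels each
# held-out point correctly with independent probability 1/2.
n_fitting = 2 ** len(held_out)
p_generalize = 0.5 ** len(held_out)

print(n_fitting)      # 1024 functions fit the training set
print(p_generalize)   # ~0.001: uniform sampling almost never generalizes
```

On this measure almost no train-fitting function generalizes, so the observation that trained networks often do generalize is the sense in which SGD is not sampling uniformly from the representable train-fitting functions.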
regardless the point remains that the authors haven’t engaged with the sort of counting arguments that I actually think are valid.
If that’s truly your remaining objection, then I think that you should retract the unmerited criticisms about how they’re trying to prove 0.9999… != 1 or whatever. In my opinion, you have confidently misrepresented their arguments, and the discussion would benefit from your revisions.
And then it’d be nice if someone would provide links to the supposed valid counting arguments! From my perspective, it’s very frustrating to hear that there (apparently) are valid counting arguments but also they aren’t the obvious well-known ones that everyone seems to talk about. (But also the real arguments aren’t linkable.)
If that’s truly the state of the evidence, then I’m happy to just conclude that Nora+Quintin are right, and update if/when actually valid arguments come along.
If that’s truly your remaining objection, then I think that you should retract the unmerited criticisms about how they’re trying to prove 0.9999… != 1 or whatever. In my opinion, you have confidently misrepresented their arguments, and the discussion would benefit from your revisions.
This point seems right to me: if the post is specifically about representable functions, then that is a valid formalization AFAICT. (Though an extremely cursed formalization, for reasons mentioned in a variety of places. And if you dropped “representable”, then it’s extremely, extremely cursed for various analysis-related reasons, though I think there is maybe still a theoretically sound uniform measure?)
It would also be nice if the original post:
Clarified that the rebuttal is specifically about a version of the counting-argument which counts functions.
Noted that people making counting arguments weren’t intending to count functions, though this might be a common misconception about counting arguments. (Seems fine to also clarify that existing counting arguments are too hand-wavy to really engage with, if that’s the view.) (See also here.)
And then it’d be nice if someone would provide links to the supposed valid counting arguments! From my perspective, it’s very frustrating to hear that there (apparently) are valid counting arguments but also they aren’t the obvious well-known ones that everyone seems to talk about. (But also the real arguments aren’t linkable.)
Isn’t Evan giving you what he thinks is a valid counting argument i.e. a counting argument over parameterizations?
But looking at a bunch of other LW posts, like Carlsmith’s report, a dialogue between Ronny Fernandez and Nate, Mark Xu talking about the malignity of Solomonoff induction, Paul Christiano talking about NN priors, Evhub’s post on how likely deceptive alignment is, etc., I have concluded that:
A bunch of LW talk about NN scheming relies on inductive biases of neural nets, or of other learning algorithms.
The arguments individual people make for scheming, including those that may fit the name “counting arguments”, seem to differ greatly. Which is basically the norm in alignment.
Like, Joe Carlsmith lists out a bunch of arguments for scheming regarding simplicity biases, including parameter counts, and thinks that they’re weak in various ways and that his “intuitive” counting argument is stronger. Ronny and Nate discuss parameter-count mappings and seem to have pretty different views on how much scheming relies on that. Mark Xu claimed, AFAICT, that Paul Christiano’s arguments about NN biases rely on the Solomonoff prior being malign, like 3 years ago, which may support Nora’s claim. I am unsure whether Paul Christiano’s arguments for scheming routed through parameter-function mappings. I also have vague memories of John Wentworth talking about the parameter-counting argument in a YouTube video years ago in a way that suggested he supported it, but I can’t find the video.
I think alignment has historically had poor feedback loops, though IMO they’ve improved somewhat in the last few years, and this conceals people’s wildly different models and ontologies, which make it very hard to notice when people are completely misinterpreting one another. You can have people like Yudkowsky and Hanson, who have engaged for hundreds of hours or maybe more, and who still don’t seem to grok each other’s models. I’d bet that this is much more common than people think.
In fact, I think this whole discussion is an example of this.
This was quite recent, so Ronny talking about the shift in the counting argument he was using may well be due to discussions with Quintin, with whom he was engaging sometime before the dialogue.
I think this Q/A pair at the bottom provides evidence that Evan has been using the parameter-function map framing for quite a while:
Question: When you say model space, you mean the functional behavior as opposed to the literal parameter space?
So there’s not quite a one-to-one mapping, because there are multiple implementations of the exact same function in a network. But it’s pretty close. I mean, most of the time when I’m saying model space, I’m talking either about the weight space or about the function space, where I’m interpreting the function over all inputs, not just the training data.
Though it is also possible that he’s been implicitly lumping the parameter-function map stuff together with the function-space stuff that Nora and Quintin were critiquing.
Isn’t Evan giving you what he thinks is a valid counting argument i.e. a counting argument over parameterizations?
Where is the argument? If you run the counting argument in function space, it’s at least clear why you might think there are “more” schemers than saints. But if you’re going to say there are “more” params that correspond to scheming than there are saint-params, that looks like a substantive empirical claim that could easily turn out to be false.
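For what it’s worth, the parameter-space version of the claim is at least checkable in toy cases: you can sample random parameters and measure how much volume each behavior gets. A minimal sketch (the model class, sample count, and Gaussian initialization are my own illustrative choices):

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # all 2-bit inputs

# Sample random parameters for a single linear threshold unit and record
# which 2-input Boolean function each parameter setting implements.
W = rng.normal(size=(100_000, 2))
b = rng.normal(size=100_000)
tables = ((X @ W.T + b) > 0).astype(int).T      # one 4-entry truth table per sample
counts = Counter(map(tuple, tables))

# The induced measure over realizable functions is far from uniform:
# the constant-0 function soaks up much more parameter volume than AND,
# and XOR gets no volume at all (it is not linearly separable).
print(counts[(0, 0, 0, 0)], counts[(0, 0, 0, 1)], counts[(0, 1, 1, 0)])
```

Even in this tiny case, parameter volume is spread very unevenly over the realizable functions, which is why “more params correspond to X” is a substantive empirical claim rather than something you can read off by counting functions.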
From my perspective, it’s very frustrating to hear that there (apparently) are valid counting arguments but also they aren’t the obvious well-known ones that everyone seems to talk about. (But also the real arguments aren’t linkable.)
Personally, I don’t think there are “solid” counting arguments, but I think you can think through a bunch more cases and feel like the underlying intuition is at least somewhat reasonable.
Overall, I’m a simple man, I still like Joe’s report : ). Fair enough if you don’t find the arguments in it convincing. I think Joe’s report is pretty close to the SOTA, with open-mindedness and a bit of reinvention work needed to fill in various gaps.
What makes you think that’s intended to be a counting argument over function space? I usually think of this as a counting argument over infinite bitstrings
I definitely thought you were making a counting argument over function space, and AFAICT Joe also thought this in his report.
The bitstring version of the argument, to the extent I can understand it, just seems even worse to me. You’re making an argument about one type of learning procedure, Solomonoff induction, which is physically unrealizable and AFAICT has not even inspired any serious real-world approximations, and then assuming that somehow the conclusions will transfer over to a mechanistically very different learning procedure, gradient descent. The same goes for the circuit prior thing (although FWIW I think you’re very likely wrong that minimal circuits can be deceptive).
I’ve argued multiple times that Evan was not intending to make a counting argument in function space:
In discussion with Alex Turner (TurnTrout) when commenting on an earlier draft of this post.
In discussion with Quintin after sharing some comments on the draft. (Also shared with you TBC.)
In this earlier comment.
(Fair enough if you never read any of these comments.)
As I’ve noted in all of these comments, people consistently use terminology when making counting style arguments (except perhaps in Joe’s report) which rules out the person intending the argument to be about function space. (E.g., people say things like “bits” and “complexity in terms of the world model”.)
(I also think these written-up arguments (Evan’s talk in particular) are very hand-wavy and just provide a vague intuition. So regardless of what he was intending, the actual words of the argument aren’t very solid IMO. Further, using words that rule out the intention of function space doesn’t necessarily imply there is an actually good model behind those words. To actually get anywhere with this reasoning, I think you’d have to reinvent the full argument and think through it in more detail yourself. I also think Evan is substantially wrong in practice, though my current guess is that he isn’t too far off about the bottom line (maybe a factor of 3 off). I think Joe’s report is much better in that it’s very clear what level of abstraction and rigor it’s talking about. From reading this post, it doesn’t seem like you came into this project from the perspective of “is there an interesting recoverable intuition here, and can we recover or generate a good argument?”, which would have been considerably better IMO.)
AFAICT Joe also thought this in his report
I think Joe was just operating from a much vaguer counting-argument perspective, based on my conversations with him about the report and his comments here. That is, he was talking about the broadly construed counting argument, which can be applied to a wide range of possible inductive biases: for any specific formal model of the situation, a counting-style argument will be somewhat applicable. (Though in practice, we might be able to have much more specific intuitions.)
Note that Joe and Evan have a very different perspective on the case for scheming.
(From my perspective, the correct intuition underlying the counting argument is something like “you only need to compute something which nearly exactly correlates with predicted reward once, while you’ll need to compute many long-range predictions to perform well in training”. See this comment for a more detailed discussion.)
As I’ve noted in all of these comments, people consistently use terminology when making counting style arguments (except perhaps in Joe’s report) which rules out the person intending the argument to be about function space. (E.g., people say things like “bits” and “complexity in terms of the world model”.)
Aren’t these arguments about simplicity, not counting?
Fair enough if you never read any of these comments.
Yeah, I never saw any of those comments. I think it’s obvious that the most natural reading of the counting argument is that it’s an argument over function space (specifically, over equivalence classes of functions which correspond to “goals.”) And I also think counting arguments for scheming over parameter space, or over Turing machines, or circuits, or whatever, are all much weaker. So from my perspective I’m attacking a steelman rather than a strawman.
I definitely thought you were making a counting argument over function space, and AFAICT Joe also thought this in his report.
Sorry about that—I wish you had been at the talk and could have asked a question about this.
You’re making an argument about one type of learning procedure, Solomonoff induction, which is physically unrealizable and AFAICT has not even inspired any serious real-world approximations, and then assuming that somehow the conclusions will transfer over to a mechanistically very different learning procedure, gradient descent.
I agree that Solomonoff induction is obviously wrong in many ways, which is why you want to substitute it out for whatever the prior is that you think is closest to deep learning that you can still reason about theoretically. But that should never lead you to do a counting argument over function space, since that is never a sound thing to do.
But that should never lead you to do a counting argument over function space, since that is never a sound thing to do.
Do you agree that “instrumental convergence → meaningful evidence for doom” is also unsound, because it’s a counting argument that most functions of shape Y have undesirable property X?
I think instrumental convergence does provide meaningful evidence of doom, and you can make a valid counting argument for it, but as with deceptive alignment you have to run the counting argument over algorithms not over functions.
It’s not clear to me what an “algorithm” is supposed to be here, and I suspect that this might be cruxy. In particular I suspect (40-50% confidence) that:
You think there are objective and determinate facts about what “algorithm” a neural net is implementing, where
Algorithms are supposed to be something like a Boolean circuit or a Turing machine rather than a neural network, and
We can run counting arguments over these objective algorithms, which are distinct both from the neural net itself and the function it expresses.
I reject all three of these premises, but I would consider it progress if I got confirmation that you in fact believe in them.
So today we’ve learned that:
The real counting argument that Evan believes in is just a repackaging of Paul’s argument for the malignity of the Solomonoff prior, and not anything novel.
Evan admits that Solomonoff is a very poor guide to neural network inductive biases.
At this point, I’m not sure why you’re privileging the hypothesis of scheming at all.
you want to substitute it out for whatever the prior is that you think is closest to deep learning that you can still reason about theoretically.
I mean, the neural network Gaussian process is literally this, and you can make it more realistic by using the neural tangent kernel to simulate training dynamics, perhaps with some finite width corrections. There is real literature on this.
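For concreteness, here is a minimal sketch of the simplest instance of this: the exact infinite-width (“NNGP”) covariance of a one-hidden-layer ReLU network with standard Gaussian weights, which has a closed form (the arc-cosine kernel). The specific test inputs and width are my own illustrative choices:

```python
import numpy as np

def nngp_relu(x, y):
    """Analytic infinite-width covariance ("NNGP kernel") of a one-hidden-
    layer ReLU network with N(0,1) weights: E[relu(w.x) * relu(w.y)]."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    cos = np.clip(x @ y / (nx * ny), -1.0, 1.0)
    theta = np.arccos(cos)
    return nx * ny * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)

# Monte Carlo check with many random hidden units: the empirical covariance
# of the ReLU features should approach the analytic kernel as width grows.
rng = np.random.default_rng(0)
x = np.array([1.0, 0.5])
y = np.array([-0.3, 2.0])
W = rng.normal(size=(1_000_000, 2))                  # 1M random hidden units
empirical = np.mean(np.maximum(W @ x, 0) * np.maximum(W @ y, 0))
print(nngp_relu(x, y), empirical)                    # the two should agree
```

A counting-style argument under this prior would then be a statement about the measure this Gaussian process assigns to sets of functions, rather than a count over functions or parameters.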
The real counting argument that Evan believes in is just a repackaging of Paul’s argument for the malignity of the Solomonoff prior, and not anything novel.
I’m going to stop responding to you now, because it seems that you are just not reading anything that I am saying. For the last time, my criticism has absolutely nothing to do with Solomonoff induction in particular, as I have now tried to explain to you here and here and here etc.
I mean, the neural network Gaussian process is literally this, and you can make it more realistic by using the neural tangent kernel to simulate training dynamics, perhaps with some finite width corrections. There is real literature on this.
Yes—that’s exactly the sort of counting argument that I like! Though note that it can be very hard to reason properly about counting arguments once you’re using a prior like that; it gets quite tricky to connect those sorts of low-level properties to high-level properties about stuff like deception.
I’ve read every word of all of your comments.
I know that you think your criticism isn’t dependent on Solomonoff induction in particular, because you also claim that a counting argument goes through under a circuit prior. It still seems like you view the Solomonoff case as the central one, because you keep talking about “bitstrings.” And I’ve repeatedly said that I don’t think the circuit prior works either, and why I think that.
At no point in this discussion have you provided any reason for thinking that in fact, the Solomonoff prior and/or circuit prior do provide non-negligible evidence about neural network inductive biases, despite the very obvious mechanistic disanalogies.
Yes—that’s exactly the sort of counting argument that I like!
Then make an NNGP counting argument! I have not seen such an argument anywhere. You seem to be alluding to unpublished, or at least little-known, arguments that did not make their way into Joe’s scheming report.