afaict my critique remains valid. My criticism is precisely that counting arguments over function space aren’t generally well-defined, and even if they were they wouldn’t be the right way to run a counting argument.
Going back through the post, Nora+Quintin indeed made a specific and perfectly formalizable claim here:
These results strongly suggest that SGD is not doing anything like sampling uniformly at random from the set of representable functions that do well on the training set.
They’re making a perfectly valid point. The point was in the original post AFAICT; it wasn’t only now explained by me. I agree that they could have presented it more clearly, but that’s a very different critique from saying they’re “using reasoning that doesn’t actually correspond to any well-defined mathematical object.”
regardless the point remains that the authors haven’t engaged with the sort of counting arguments that I actually think are valid.
If that’s truly your remaining objection, then I think that you should retract the unmerited criticisms about how they’re trying to prove 0.9999… != 1 or whatever. In my opinion, you have confidently misrepresented their arguments, and the discussion would benefit from your revisions.
And then it’d be nice if someone would provide links to the supposed valid counting arguments! From my perspective, it’s very frustrating to hear that there (apparently) are valid counting arguments but also they aren’t the obvious well-known ones that everyone seems to talk about. (But also the real arguments aren’t linkable.)
If that’s truly the state of the evidence, then I’m happy to just conclude that Nora+Quintin are right, and update if/when actually valid arguments come along.
If that’s truly your remaining objection, then I think that you should retract the unmerited criticisms about how they’re trying to prove 0.9999… != 1 or whatever. In my opinion, you have confidently misrepresented their arguments, and the discussion would benefit from your revisions.
This point seems right to me: if the post is specifically about representable functions, then that is a valid formalization AFAICT. (Though an extremely cursed formalization for reasons mentioned in a variety of places. And if you dropped “representable”, then it’s extremely, extremely cursed for various analysis-related reasons, though I think there is still a theoretically sound uniform measure maybe???)
It would also be nice if the original post:
Clarified that the rebuttal is specifically about a version of the counting-argument which counts functions.
Noted that people making counting arguments weren’t intending to count functions, though this might be a common misconception about counting arguments. (Seems fine to also clarify that existing counting arguments are too hand-wavy to really engage with, if that’s also the view.) (See also here.)
And then it’d be nice if someone would provide links to the supposed valid counting arguments! From my perspective, it’s very frustrating to hear that there (apparently) are valid counting arguments but also they aren’t the obvious well-known ones that everyone seems to talk about. (But also the real arguments aren’t linkable.)
Isn’t Evan giving you what he thinks is a valid counting argument i.e. a counting argument over parameterizations?
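But looking at a bunch of other LW posts, like Carlsmith’s report, a dialogue between Ronny Fernandez and Nate[1], Mark Xu talking about malignity of Solomonoff induction, Paul Christiano talking about NN priors, Evhub’s post on how likely is deceptive alignment, etc.[2], I have concluded that:
A bunch of LW talk about NN scheming relies on inductive biases of neural nets, or of other learning algorithms.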
The arguments individual people make for scheming, including those that may fit the name “counting arguments”, seem to differ greatly. Which is basically the norm in alignment.
Like, Joe Carlsmith lists out a bunch of arguments for scheming regarding simplicity biases, including parameter counts, and thinks that they’re weak in various ways and that his “intuitive” counting argument is stronger. Ronny and Nate discuss parameter-count mappings and seem to have pretty different views on how much scheming relies on that. Mark Xu claimed something like 3 years ago, AFAICT, that Paul Christiano’s arguments about NN biases rely on the Solomonoff prior being malign, which may support Nora’s claim. I am unsure whether Paul Christiano’s arguments for scheming routed through parameter-function mappings. I also have vague memories of Johnswentworth talking about the parameter-counting argument in a YouTube video years ago in a way that suggested he supported it, but I can’t find the video.
I think alignment has historically had poor feedback loops, though IMO they’ve improved somewhat in the last few years, and this conceals people’s wildly different models and ontologies, which makes it very hard to notice when people are completely misinterpreting one another. You can have people like Yudkowsky and Hanson who have engaged with each other for hundreds of hours, or maybe more, and still don’t seem to grok each other’s models. I’d bet that this is much more common than people think.
In fact, I think this whole discussion is an example of this.
This was quite recent, so Ronny talking about the shift in the counting argument he was using may well be due to discussions with Quintin, whom he was engaging with sometime before the dialogue.
I think this Q/A pair at the bottom provides evidence that Evan has been using the parameter-function map framing for quite a while:
Question: When you say model space, you mean the functional behavior as opposed to the literal parameter space?
So there’s not quite a one to one mapping because there are multiple implementations of the exact same function in a network. But it’s pretty close. I mean, most of the time when I’m saying model space, I’m talking either about the weight space or about the function space where I’m interpreting the function over all inputs, not just the training data.
Though it is also possible that he’s been implicitly lumping the parameter-function map stuff together with the function-space stuff that Nora and Quintin were critiquing.
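To make the distinction between counting parameters and counting functions concrete, here is a minimal toy sketch. It is not anyone's actual argument from the thread. Everything in it is an assumption for illustration: a single threshold unit on two binary inputs, with weights and bias restricted to {-1, 0, 1}, and the hypothetical helper names `truth_table` and `count_parameterizations`. It enumerates every parameter setting and tallies how many settings implement each representable function.

```python
from collections import Counter
from itertools import product

# Toy, hypothetical "network": one threshold unit on two binary inputs.
# Each of (w1, w2, b) takes a value in {-1, 0, 1}, giving 27 parameter settings.
WEIGHT_VALUES = (-1, 0, 1)
INPUTS = tuple(product((0, 1), repeat=2))  # (0,0), (0,1), (1,0), (1,1)


def truth_table(w1, w2, b):
    """Return the function computed by the unit as a tuple of outputs over INPUTS."""
    return tuple(int(w1 * x1 + w2 * x2 + b > 0) for x1, x2 in INPUTS)


def count_parameterizations():
    """Map each representable function to the number of parameter settings implementing it."""
    return Counter(
        truth_table(*params) for params in product(WEIGHT_VALUES, repeat=3)
    )


counts = count_parameterizations()
total = sum(counts.values())
function_uniform = 1 / len(counts)  # weight under a uniform draw over functions

for table, n in counts.most_common():
    # Compare the measure induced by uniform parameters with the uniform measure on functions.
    print(table, f"param-uniform={n / total:.3f}", f"function-uniform={function_uniform:.3f}")
```

In this toy, the constant-zero function is implemented by 12 of the 27 parameter settings, whereas a uniform draw over the 13 representable functions would give it weight 1/13. So "there are more X-functions" and "there are more X-parameters" can come apart, which is why a count over parameterizations is a separate, substantive claim. Nothing here says which way the comparison goes for real networks.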
Isn’t Evan giving you what he thinks is a valid counting argument i.e. a counting argument over parameterizations?
Where is the argument? If you run the counting argument in function space, it’s at least clear why you might think there are “more” schemers than saints. But if you’re going to say there are “more” params that correspond to scheming than there are saint-params, that looks like a substantive empirical claim that could easily turn out to be false.
From my perspective, it’s very frustrating to hear that there (apparently) are valid counting arguments but also they aren’t the obvious well-known ones that everyone seems to talk about. (But also the real arguments aren’t linkable.)
Personally, I don’t think there are “solid” counting arguments, but I think you can think through a bunch more cases and feel like the underlying intuition is at least somewhat reasonable.
Overall, I’m a simple man, I still like Joe’s report : ). Fair enough if you don’t find the arguments in there convincing. I think Joe’s report is pretty close to the SOTA, given some open-mindedness and a bit of reinvention work to fill in various gaps.