And then it’d be nice if someone would provide links to the supposed valid counting arguments! From my perspective, it’s very frustrating to hear that there (apparently) are valid counting arguments but also they aren’t the obvious well-known ones that everyone seems to talk about. (But also the real arguments aren’t linkable.)
Isn’t Evan giving you what he thinks is a valid counting argument i.e. a counting argument over parameterizations?
But looking at a bunch of other LW posts, like Carlsmith’s report, a dialogue between Ronny Fernandez and Nate[1], Mark Xu talking about the malignity of Solomonoff induction, Paul Christiano talking about NN priors, Evhub’s post on how likely deceptive alignment is, etc.[2], I have concluded that:
A bunch of LW talk about NN scheming relies on inductive biases of neural nets, or of other learning algorithms.
The arguments individual people make for scheming, including those that may fit the name “counting arguments”, seem to differ greatly. Which is basically the norm in alignment.
Like, Joe Carlsmith lists out a bunch of arguments for scheming regarding simplicity biases, including parameter counts, and thinks that they’re weak in various ways and that his “intuitive” counting argument is stronger. Ronny and Nate discuss parameter-count mappings and seem to have pretty different views on how much scheming relies on that. Mark Xu claimed, AFAICT, like 3 years ago that Paul Christiano’s arguments about NN biases rely on the Solomonoff prior being malign, which may support Nora’s claim. I am unsure if Paul Christiano’s arguments for scheming routed through parameter-function mappings. I also have vague memories of Johnswentworth talking about the parameter-counting argument in a YouTube video years ago in a way that suggested he supported it, but I can’t find the video.
I think alignment has historically had poor feedback loops, though IMO they’ve improved somewhat in the last few years, and this conceals people’s wildly different models and ontologies, which makes it very hard to notice when people are completely misinterpreting one another. You can have people like Yudkowsky and Hanson who have engaged with each other for hundreds of hours, or maybe more, and still don’t seem to grok each other’s models. I’d bet that this is much more common than people think.
In fact, I think this whole discussion is an example of this.
[1] This was quite recent, so Ronny talking about the shift in the counting argument he was using may well be due to discussions with Quintin, who he was engaging with sometime before the dialogue.
[2] I think this Q/A pair at the bottom provides evidence that Evan has been using the parameter-function map framing for quite a while:
Question: When you say model space, you mean the functional behavior as opposed to the literal parameter space?
So there’s not quite a one to one mapping because there are multiple implementations of the exact same function in a network. But it’s pretty close. I mean, most of the time when I’m saying model space, I’m talking either about the weight space or about the function space where I’m interpreting the function over all inputs, not just the training data.
Though it is also possible that he’s been implicitly lumping the parameter-function map stuff together with the function-space stuff that Nora and Quintin were critiquing.
Isn’t Evan giving you what he thinks is a valid counting argument i.e. a counting argument over parameterizations?
Where is the argument? If you run the counting argument in function space, it’s at least clear why you might think there are “more” schemers than saints. But if you’re going to say there are “more” params that correspond to scheming than there are saint-params, that looks like a substantive empirical claim that could easily turn out to be false.
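To make the difference between counting in function space and counting in parameter space concrete, here is a toy illustration. It is a minimal sketch and not something anyone in the thread proposed: the single threshold unit, the weight grid `WEIGHT_VALUES`, and the names `function_of` and `param_counts` are all hypothetical choices made for the example. It only shows that the two ways of counting can give very different answers, and it says nothing about actual schemers or saints.

```python
from collections import Counter
from itertools import product

# All four inputs to a two-input boolean function.
INPUTS = list(product([0, 1], repeat=2))

# Hypothetical, deliberately tiny parameter grid: each of (w1, w2, b)
# takes one of these three values, giving 3**3 = 27 parameter settings.
WEIGHT_VALUES = [-1, 0, 1]


def function_of(params):
    """Parameter-function map: return the truth table a setting implements."""
    w1, w2, b = params
    return tuple(int(w1 * x1 + w2 * x2 + b > 0) for x1, x2 in INPUTS)


# For each distinct function, count how many parameter settings implement it.
param_counts = Counter(
    function_of(p) for p in product(WEIGHT_VALUES, repeat=3)
)

n_params = sum(param_counts.values())   # number of parameter settings
n_functions = len(param_counts)         # number of distinct functions reached

for table, count in param_counts.most_common():
    share_by_function = 1 / n_functions   # each function counted once
    share_by_params = count / n_params    # each parameter setting counted once
    print(table, count, round(share_by_function, 3), round(share_by_params, 3))
```

In this toy setup the constant-zero function is one of 13 reachable functions, but it is implemented by 12 of the 27 parameter settings, so it gets about 8% of the mass when counting functions and about 44% when counting parameters. Which functions end up with “more” params depends entirely on the architecture and the parameter grid, which is the sense in which the parameter-space version of the claim is empirical.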