I obviously don’t think the counting argument for overfitting is actually sound, that’s the whole point. But I think the counting argument for scheming is just as obviously invalid, and misuses formalisms just as egregiously, if not more so.
I deny that your Kolmogorov framework is anything like “the proper formalism” for neural networks. I also deny that the counting argument for overfitting is appropriately characterized as a “finite bitstring” argument, because that suggests I’m talking about Turing machine programs of finite length, which I’m not; I’m directly enumerating functions over a subset of the natural numbers. Are you saying the set of functions over 1...10,000 is not a well-defined mathematical object?
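[Editor's note: for concreteness, here is a minimal sketch of the kind of direct enumeration being referred to. The binary labels and the 1,000/9,000 train/test split are illustrative assumptions, not details from the exchange.]

```python
# Sketch of the counting argument for overfitting over a finite domain.
# Assumptions (illustrative only): binary labels on the domain {1, ..., 10_000},
# with the first 1_000 points used as the training set.

DOMAIN_SIZE = 10_000
TRAIN_SIZE = 1_000
UNSEEN = DOMAIN_SIZE - TRAIN_SIZE

# Every function f: {1, ..., 10_000} -> {0, 1} that agrees with the training labels
# may take either value on each of the 9_000 unseen points, so the set of
# training-consistent functions is finite and can be counted directly.
consistent_with_train = 2 ** UNSEEN

# Exactly one of those functions also matches the intended labels on every unseen point.
print(f"functions consistent with the training set: 2**{UNSEEN}")
print(f"fraction that also generalize perfectly:    2**-{UNSEEN}")
```

Under a uniform measure over this finite set, almost all training-consistent functions fail to generalize; both sides agree this does not describe trained neural networks, and the disagreement below is about why the argument fails.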
I obviously don’t think the counting argument for overfitting is actually sound, that’s the whole point.
Yes, I’m well aware. The problem is that when you make the counting argument for overfitting, you do so in a way that seriously misuses the formalism, which is why the argument fails. So you can’t draw any lessons about counting arguments for deception from the failure of your counting argument for overfitting.
But I think the counting argument for scheming is just as obviously invalid, and misuses formalisms just as egregiously, if not more so.
Then show me how! If you think there are errors in the math, please point them out.
Of course, it’s worth stating that I certainly don’t have some sort of airtight mathematical argument proving that deception is likely in neural networks—there are lots of assumptions there that could very well be wrong. But I do think that the basic style of reasoning employed by such arguments is sound.
I deny that your Kolmogorov framework is anything like “the proper formalism” for neural networks.
Err… I’m using K-complexity here because it’s a simple framework to reason about, but my criticism isn’t “you should use K-complexity to reason about neural networks.” I think K-complexity captures some important facts about neural network generalization, but is clearly egregiously wrong in other areas. But there are lots of other formalisms! My criticism isn’t that you should use K-complexity specifically; it’s that you should be using some formalism at all, rather than none.
The basic criticism is that the reasoning you use in the post doesn’t correspond to any formalism at all; it’s self-contradictory and inconsistent. So by all means replace K-complexity with something better (that’s what I usually try to do as well), but you still need to be reasoning in a way that’s mathematically consistent.
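[Editor's note: as one concrete illustration of what “reasoning within a formalism” could look like here (a sketch only; the particular simplicity prior is an assumption, not something either party commits to), one can place a prior over functions and condition on fitting the data:]

```latex
% Illustrative only: a simplicity prior over functions, conditioned on fitting the data D.
% The uniform prior used in the counting sketch above is the special case of equal weights.
P(f) \propto 2^{-K(f)}, \qquad
P(f \mid D) = \frac{P(f)\,\mathbf{1}[f \text{ is consistent with } D]}{\sum_{g \text{ consistent with } D} P(g)}
```

Counting arguments correspond to the uniform-weight special case; which prior is appropriate for neural networks is exactly what is contested below.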
I also deny that the counting argument for overfitting is appropriately characterized as a “finite bitstring” argument, because that suggests I’m talking about Turing machine programs of finite length, which I’m not; I’m directly enumerating functions over a subset of the natural numbers.
One person’s modus ponens is another’s modus tollens. If you say you have a formalism, and that formalism predicts overfitting rather than generalization, then my first objection to your formalism is that it’s clearly a bad formalism for understanding neural networks in practice. Maybe the most basic thing that any good formalism here should get right is that it should predict generalization; if your formalism doesn’t, then it’s clearly not a good formalism.
Then show me how! If you think there are errors in the math, please point them out.
I’m not aware of any actual math behind the counting argument for scheming. I’ve only ever seen handwavy informal arguments about the number of Christs vs Martin Luthers vs Blaise Pascals. There certainly was no formal argument presented in Joe’s extensive scheming report, which I assumed would be sufficient context for writing this essay.
Well, I presented a very simple formulation in my comment, so that could be a reasonable starting point.
But I agree that unfortunately there hasn’t been that much good formal analysis here that’s been written up. At least on my end, that’s for two reasons:
1. Most of the formal analysis of this form that I’ve published (e.g. this and this) has been focused on sycophancy (human imitator vs. direct translator) rather than deceptive alignment, as sycophancy is a substantially more tractable problem. Finding a prior that reasonably rules out deceptive alignment seems quite out of reach to me currently; at one point I thought a circuit prior might do it, but I now think that circuit priors don’t get rid of deceptive alignment.
2. I’m currently more optimistic about empirical evidence than theoretical evidence for resolving this question, which is why I’ve been focusing on projects such as Sleeper Agents.
Right, and I’ve explained why I don’t think any of those analyses are relevant to neural networks. Deep learning simply does not search over Turing machines or circuits of varying lengths. It searches over parameters of an arithmetic circuit of fixed structure, size, and runtime. So Solomonoff induction, speed priors, and circuit priors are all inapplicable. There has been a lot of work in the mainstream science of deep learning literature on the generalization behavior of actual neural nets, and I’m pretty baffled at why you don’t pay more attention to that stuff.
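[Editor's note: a minimal sketch of the point about the search space, with made-up layer sizes; the object being optimized is a fixed-length parameter vector for a fixed architecture, not a program whose length can vary.]

```python
# Illustrative only: the layer widths below are made up.
# A fixed architecture determines a fixed number of real-valued parameters;
# training moves through this fixed-dimensional space rather than over
# programs or circuits of varying size.
d_in, d_hidden, d_out = 784, 256, 10
n_params = (d_in * d_hidden + d_hidden) + (d_hidden * d_out + d_out)
print(f"parameter count for this fixed architecture: {n_params}")  # 203,530
```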
Right, and I’ve explained why I don’t think any of those analyses are relevant to neural networks. Deep learning simply does not search over Turing machines or circuits of varying lengths. It searches over parameters of an arithmetic circuit of fixed structure, size, and runtime. So Solomonoff induction, speed priors, and circuit priors are all inapplicable.
It is trivially easy to modify the formalism to search only over fixed-size algorithms, and in fact that’s usually what I do when I run this sort of analysis. I feel like you still aren’t understanding the key criticism here—it’s really not about Solomonoff induction—and I’m not sure how to explain that in any way other than how I’ve already done so.
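[Editor's note: one way to read “modify the formalism to search only over fixed-size algorithms” (a sketch of an assumed reading, not a formulation either party states here) is to fix a description length N and weight each function by how many length-N programs, or parameter settings, implement it:]

```latex
% Sketch under the stated assumption: N is a fixed description length,
% and p ranges over all length-N programs (or parameter settings).
P_N(f) = \frac{\bigl|\{\, p \in \{0,1\}^N : p \text{ computes } f \,\}\bigr|}{2^N}
```

Counting arguments then run over this fixed-size space, with a function’s weight set by how many of the fixed-size implementations realize it rather than by a varying program length.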
There has been a lot of work in the mainstream science of deep learning literature on the generalization behavior of actual neural nets, and I’m pretty baffled at why you don’t pay more attention to that stuff.
I’m going to assume you just aren’t very familiar with my writing, because working through empirical evidence about neural network inductive biases is something I love to do all the time.
It is trivially easy to modify the formalism to search only over fixed-size algorithms, and in fact that’s usually what I do when I run this sort of analysis.
What? Which formalism? I don’t see how this is true at all. Please elaborate or send an example of “modifying” Solomonoff so that all the programs have fixed length, or “modifying” the circuit prior so all circuits are the same size.
No, I’m pretty familiar with your writing. I still don’t think you’re focusing on mainstream ML literature enough because you’re still putting nonzero weight on these other irrelevant formalisms. Taking that literature seriously would mean ceasing to take the Solomonoff or circuit prior literature seriously.