What makes you think that’s intended to be a counting argument over function space? I usually think of this as a counting argument over infinite bitstrings.
I definitely thought you were making a counting argument over function space, and AFAICT Joe also thought this in his report.
The bitstring version of the argument, to the extent I can understand it, just seems even worse to me. You’re making an argument about one type of learning procedure, Solomonoff induction, which is physically unrealizable and AFAICT has not even inspired any serious real-world approximations, and then assuming that somehow the conclusions will transfer over to a mechanistically very different learning procedure, gradient descent. The same goes for the circuit prior thing (although FWIW I think you’re very likely wrong that minimal circuits can be deceptive).
I’ve argued multiple times that Evan was not intending to make a counting argument in function space:
In discussion with Alex Turner (TurnTrout) when commenting on an earlier draft of this post.
In discussion with Quintin after sharing some comments on the draft. (Also shared with you TBC.)
In this earlier comment.
(Fair enough if you never read any of these comments.)
As I’ve noted in all of these comments, people consistently use terminology when making counting-style arguments (except perhaps in Joe’s report) that rules out the argument being intended over function space. (E.g., people say things like “bits” and “complexity in terms of the world model”.)
(I also think these written-up arguments (Evan’s talk in particular) are very hand-wavy and just provide a vague intuition. So regardless of what he was intending, the actual words of the argument aren’t very solid IMO. Further, using words that rule out the function-space reading doesn’t necessarily imply there is an actually good model behind those words. To actually get anywhere with this reasoning, I think you’d have to reinvent the full argument and think through it in more detail yourself. I also think Evan is substantially wrong in practice, though my current guess is that he isn’t too far off about the bottom line (maybe a factor of 3 off). I think Joe’s report is much better in that it’s very clear what level of abstraction and rigor it’s talking about. From reading this post, it doesn’t seem like you came into this project from the perspective of “is there an interesting recoverable intuition here; can we recover or generate a good argument?”, which would have been considerably better IMO.)
AFAICT Joe also thought this in his report
Based on my conversations with him about the report and his comments here, I think Joe was just operating from a much vaguer counting-argument perspective. That is, he was talking about the broadly construed counting argument, which can be applied to a wide range of possible inductive biases: for any specific formal model of the situation, a counting-style argument will be somewhat applicable. (Though in practice, we might be able to have much more specific intuitions.)
Note that Joe and Evan have a very different perspective on the case for scheming.
(From my perspective, the correct intuition underlying the counting argument is something like “you only need to compute something which nearly exactly correlates with predicted reward once, while you’ll need to compute many long-range predictions to perform well in training”. See this comment for a more detailed discussion.)
As I’ve noted in all of these comments, people consistently use terminology when making counting-style arguments (except perhaps in Joe’s report) that rules out the argument being intended over function space. (E.g., people say things like “bits” and “complexity in terms of the world model”.)
Aren’t these arguments about simplicity, not counting?
Fair enough if you never read any of these comments.
Yeah, I never saw any of those comments. I think it’s obvious that the most natural reading of the counting argument is that it’s an argument over function space (specifically, over equivalence classes of functions which correspond to “goals.”) And I also think counting arguments for scheming over parameter space, or over Turing machines, or circuits, or whatever, are all much weaker. So from my perspective I’m attacking a steelman rather than a strawman.
I definitely thought you were making a counting argument over function space, and AFAICT Joe also thought this in his report.
Sorry about that—I wish you had been at the talk and could have asked a question about this.
You’re making an argument about one type of learning procedure, Solomonoff induction, which is physically unrealizable and AFAICT has not even inspired any serious real-world approximations, and then assuming that somehow the conclusions will transfer over to a mechanistically very different learning procedure, gradient descent.
I agree that Solomonoff induction is obviously wrong in many ways, which is why you want to substitute it out for whatever the prior is that you think is closest to deep learning that you can still reason about theoretically. But that should never lead you to do a counting argument over function space, since that is never a sound thing to do.
But that should never lead you to do a counting argument over function space, since that is never a sound thing to do.
Do you agree that “instrumental convergence → meaningful evidence for doom” is also unsound, because it’s a counting argument that most functions of shape Y have undesirable property X?
I think instrumental convergence does provide meaningful evidence of doom, and you can make a valid counting argument for it, but as with deceptive alignment you have to run the counting argument over algorithms not over functions.
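(For concreteness, a minimal sketch of why these two kinds of counting can come apart; the tiny architecture, input space, and threshold below are purely illustrative choices, not anything from the thread. Drawing parameters at random for a small network induces a very non-uniform distribution over the boolean functions it expresses, so a count over parameterizations and a count over function space need not agree.)

```python
# Minimal sketch: a uniform-ish measure over parameters induces a very
# non-uniform measure over functions, which is one reason "counting over
# function space" and "counting over algorithms/parameterizations" differ.
# The architecture and input space here are illustrative assumptions.
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
inputs = np.array([[int(b) for b in f"{i:03b}"] for i in range(8)])  # all of {0,1}^3

counts = Counter()
for _ in range(100_000):
    # Random 3-4-1 MLP with tanh hidden units and a sign readout.
    W1 = rng.normal(size=(3, 4))
    b1 = rng.normal(size=4)
    w2 = rng.normal(size=4)
    b2 = rng.normal()
    outputs = (np.tanh(inputs @ W1 + b1) @ w2 + b2 > 0).astype(int)
    counts[tuple(outputs)] += 1  # which of the 256 boolean functions was realized

print(f"distinct functions realized: {len(counts)} of 256 possible")
most, least = counts.most_common(1)[0], counts.most_common()[-1]
print(f"most common function appears {most[1]} times; "
      f"least common (among those seen) appears {least[1]} times")
```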
It’s not clear to me what an “algorithm” is supposed to be here, and I suspect that this might be cruxy. In particular I suspect (40-50% confidence) that:
1. You think there are objective and determinate facts about what “algorithm” a neural net is implementing, where
2. Algorithms are supposed to be something like a Boolean circuit or a Turing machine rather than a neural network, and
3. We can run counting arguments over these objective algorithms, which are distinct both from the neural net itself and the function it expresses.
I reject all three of these premises, but I would consider it progress if I got confirmation that you in fact believe in them.
So today we’ve learned that:
1. The real counting argument that Evan believes in is just a repackaging of Paul’s argument for the malignity of the Solomonoff prior, and not anything novel.
2. Evan admits that Solomonoff is a very poor guide to neural network inductive biases.
At this point, I’m not sure why you’re privileging the hypothesis of scheming at all.
you want to substitute it out for whatever the prior is that you think is closest to deep learning that you can still reason about theoretically.
I mean, the neural network Gaussian process is literally this, and you can make it more realistic by using the neural tangent kernel to simulate training dynamics, perhaps with some finite width corrections. There is real literature on this.
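(As a concrete illustration of the kind of prior being pointed at here, the sketch below samples functions from the NNGP prior of an infinite-width, one-hidden-layer ReLU network using the closed-form arc-cosine kernel, then empirically estimates how often a toy property holds under that prior. The kernel scaling and the “property” counted are illustrative assumptions; connecting such a count to a high-level property like deception is the hard part that this does not address.)

```python
# Minimal sketch: sample functions from the NNGP prior of an infinite-width,
# one-hidden-layer ReLU network (arc-cosine kernel of order 1, Cho & Saul 2009)
# and empirically estimate how often a toy property holds under that prior.
import numpy as np

def relu_nngp_kernel(X1, X2):
    """Arc-cosine kernel of order 1, i.e. the NNGP kernel for one hidden ReLU
    layer, up to an overall weight-variance constant."""
    norms1 = np.linalg.norm(X1, axis=1)
    norms2 = np.linalg.norm(X2, axis=1)
    cos_theta = np.clip((X1 @ X2.T) / np.outer(norms1, norms2), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    return (np.outer(norms1, norms2) / np.pi) * (
        np.sin(theta) + (np.pi - theta) * np.cos(theta)
    )

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))                        # 50 inputs in R^10
K = relu_nngp_kernel(X, X) + 1e-8 * np.eye(len(X))   # jitter for stability

# Draw function values at X from the GP prior and "count" how often a toy
# property holds: here, that the function is positive on a fixed subset of
# inputs. A real argument would need a property that tracks something like
# deceptive behaviour, which is the part nobody has written down.
samples = rng.multivariate_normal(np.zeros(len(X)), K, size=10_000)
property_holds = np.all(samples[:, :5] > 0, axis=1)
print(f"Prior probability of toy property: {property_holds.mean():.4f}")
```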
The real counting argument that Evan believes in is just a repackaging of Paul’s argument for the malignity of the Solomonoff prior, and not anything novel.
I’m going to stop responding to you now, because it seems that you are just not reading anything that I am saying. For the last time, my criticism has absolutely nothing to do with Solomonoff induction in particular, as I have now tried to explain to you here and here and here etc.
I mean, the neural network Gaussian process is literally this, and you can make it more realistic by using the neural tangent kernel to simulate training dynamics, perhaps with some finite width corrections. There is real literature on this.
Yes—that’s exactly the sort of counting argument that I like! Though note that it can be very hard to reason properly about counting arguments once you’re using a prior like that; it gets quite tricky to connect those sorts of low-level properties to high-level properties about stuff like deception.
I’ve read every word of all of your comments. I know that you think your criticism isn’t dependent on Solomonoff induction in particular, because you also claim that a counting argument goes through under a circuit prior. It still seems like you view the Solomonoff case as the central one, because you keep talking about “bitstrings.” And I’ve repeatedly said that I don’t think the circuit prior works either, and why I think that.
At no point in this discussion have you provided any reason for thinking that in fact, the Solomonoff prior and/or circuit prior do provide non-negligible evidence about neural network inductive biases, despite the very obvious mechanistic disanalogies.
Yes—that’s exactly the sort of counting argument that I like!
Then make an NNGP counting argument! I have not seen such an argument anywhere. You seem to be alluding to unpublished, or at least little-known, arguments that did not make their way into Joe’s scheming report.