While I agree that there are broader prior disagreements, I think that even if we isolate this to the question of whether Eliezer’s statement was correct, without baking in priors, the statement that we have no knowledge of AI because models are inscrutable piles of numbers is verifiably wrong. To put it in Eliezer’s words, it’s a locally invalid argument, and it is known to be false even without the broader prior disagreements.
One could honestly say that the progress in interpretability isn’t enough. One couldn’t honestly say that interpretability hasn’t progressed at all, or that we know nothing about AI internals, without massive ignorance.
This is bad news for his epistemics, because it is a verifiably wrong statement that Eliezer keeps making, without any caveats or qualifications.
That’s a big problem: if Eliezer is confidently making locally invalid arguments about AI, and persistently repeating them, then it calls into question how well his epistemics are working on AI, and from my perspective there are really only bad outcomes here.
It’s not just that Eliezer is wrong; it’s that he is persistently, confidently wrong about something that is actually verifiable, such that we can point out the wrongness.
interpretability didn’t progress at all, or that we know nothing about AI internals at all
No to the former, yes to the latter, which is noteworthy because Eliezer only claimed the latter. That’s not a knock on interpretability research; in fact, Eliezer has repeatedly and publicly praised e.g. the work of Chris Olah and Distill. The choice to interpret the claim that we “know nothing about AI internals” as the claim that “no interpretability work has been done”, it should be pointed out, was a reading imposed by ShardPhoenix (and subsequently by you).
And in fact, we do still have approximately zero idea how large neural nets do what they do, interpretability research notwithstanding, as evinced by the fact that not a single person on this planet could code by hand whatever internal algorithms the models have learned. (The same is true of the brain, incidentally, which is why you sometimes hear people say “we have no idea how the brain works”, despite an insistently literal interpretation of this statement being falsified by the existence of neuroscience as a field.)
But it does, in fact, matter whether research into neural net interpretability translates into us knowing, in a real sense, what kind of work is going on inside large language models! That, ultimately, is the metric by which reality will judge us, not how many publications on interpretability were made (or how cool the results of those publications were, which, for the record, I think are very cool). And in light of that, I think it’s disingenuous to interpret Eliezer’s remark the way you and ShardPhoenix seem to be insisting on interpreting it in this thread.
And in fact, we do still have approximately zero idea how large neural nets do what they do, interpretability research notwithstanding, as evinced by the fact that not a single person on this planet could code by hand whatever internal algorithms the models have learned.
I now see where the problem lies. The basic issues I see with this argument are as follows:
The implied argument is that if you can’t create something by hand yourself, you know nothing at all about the thing you are focusing on. This is straightforwardly not true for a lot of fields.
For example, I know quite a lot about Borderlands 3. Not perfectly, but I have quite a bit of knowledge, and I could even use save editors or cheatware by following video tutorials. Yet under almost no circumstances could I actually create Borderlands 3, even though the game and its code already exist, and even with a team.
This likely generalizes: while neuroscience has some knowledge of the brain, it’s nowhere near the point where it could reliably create a human brain from scratch; knowing some things about how cars work is not enough to build a working car; and so on.
In general, I think the error is that you and Eliezer have expectations that are too high for what partial knowledge will get you. It helps, but in virtually no case will that knowledge alone allow you to create the thing you are focusing on.
It’s possible that our knowledge of AI internals isn’t enough, and that progress is too slow. I might agree or disagree, but at least that would be a rational claim. Right now, I’m seeing basic locally invalid arguments here, and I notice that part of the problem is that you and Eliezer have too binary a view of knowledge, where you either have functionally perfect knowledge or no knowledge at all; usually our knowledge is neither functionally perfect nor zero.
Edit: This seems conceptually similar to P vs NP, in that verifying something and generating something are conjectured to have very different difficulties; essentially, my claim is that being able to verify or understand something is not the same as being able to generate it.
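To make that verify-versus-generate asymmetry concrete, here is a minimal sketch using the textbook subset-sum problem (the numbers, the function names, and the analogy to interpretability are purely my own illustration, not anything from the comments above): checking a proposed answer is a single cheap pass, while finding one from scratch requires, as far as anyone knows, searching an exponential space.

```python
# Illustrative sketch of the verify/generate gap via subset-sum.
# Verifying a candidate answer is cheap; generating one from scratch is
# (as far as we know) exponentially harder. Numbers and names are made up.
from itertools import combinations

def verify(nums, target, candidate):
    # Verification: one linear pass confirming the candidate subset hits the target.
    return all(x in nums for x in candidate) and sum(candidate) == target

def generate(nums, target):
    # Generation: brute-force search over all 2^n subsets until one works.
    for size in range(len(nums) + 1):
        for subset in combinations(nums, size):
            if sum(subset) == target:
                return list(subset)
    return None

nums = [3, 34, 4, 12, 5, 2]
target = 9
solution = generate(nums, target)                # the hard part: finding an answer
print(solution, verify(nums, target, solution))  # the easy part: checking it -> True
```

The analogy is loose, but it captures the point: being able to check or partially describe a system after the fact doesn’t imply being able to produce it.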
I expect that if you sat down with him and had a one-on-one conversation, you’d find that he does have nuanced views. I also expect that Eliezer realizes there have been improvements in all of the areas you described. I think the difference comes down mostly to “Has there been sufficient progress in interpretability to avert disaster?” I’m confident his answer would be “No.”
So, given that belief, and having a chance now and then to communicate with a wide audience, it is better to have a clear message, because you never know what will be a zeitgeist tipping point. It’s the fate of the world, so a little nuance is just collateral damage.
I don’t know if that matters, because whether he’s pegged to Doom epistemically or strategically, the result is the same.