It seems that all parties, including Nora, agree with “If an LLM outputs A rather than B, and you ask me why, then it might take me decades of work to give you a reasonable & intuitive answer”. The disagreements are (1) whether we should care, i.e., whether this fact is important and worrisome in the context of safe & beneficial AGI, and (2) what the terms “black box” and “white box” mean.
I think Nora’s comment here was taking an opportunity to argue her side of (1).
In Nora’s recent post, to her credit, she defined exactly what she meant by “white box” the first time she used the term, and her discussion was valid given that definition.
I think her recent post (and ditto the OP here) would have been clearer if she had (A) noted that people in the AGI safety space sometimes use “black box” to say something like the “decades of work” claim above, (B) explicitly said that the “decades of work” claim is obviously true and totally uncontroversial, (C) clarified that this popular definition of “black box / white box” is not the definition she’s using in this post.
(A similar suggestion also applies to the other side of the debate, including me: in the unlikely event that I use the term “black box” to mean the “decades of work” thing in my future writing, I plan to immediately define it and also explicitly say that I’m not using the term to discuss whether or not you can see the weights and perform SGD.)
Hmm, I guess the point of using the term “white box” is then to illustrate that it is not a literal black box, while the point of the term “black box” is that while it’s a literal transparent system, we still don’t understand it in the ways that matter. There’s something that feels really off about the dynamic of term use here, but I can’t quite articulate it.
The terms “white box” and “black box”, like pretty much all terms, are more than just their literal definitions; they are also Trojan horses full of connotations and vibes. So of course it’s natural (albeit unfortunate and annoying) for people on both sides of a debate to try to get those connotations and vibes to work in service of their side. :-P
I’ll edit the post soon to focus on the fact that the white-box definition is not a standard definition of the term, and instead refers to the computer analysis/security sense of the term.