Rob Bensinger comments on Ngo and Yudkowsky on AI capability gains

Rob Bensinger Nov 21, 2021, 12:20 AM
LW: 5 AF: 3
> You obviously can do whatever you want, but I find myself confused at this idea being discarded. Like, it sounds exactly like the antidote to so much confusion around these discussions and your position, such that if that was clarified, more people could contribute helpfully to the discussion, and either come to your side or point out non-trivial issues with your perspective. Which sounds really valuable for both you and the field!
I’ma guess that Eliezer thinks there’s a long list of sequences he could write meeting these conditions, each on a different topic.
- adamShimi Nov 21, 2021, 12:27 AM
  LW: 4 AF: 3
  Good point, I hadn’t thought about that one.
  Still, I have to admit that my first reaction is that this particular sequence seems quite uniquely in a position to increase the quality of the debate and of alignment research singlehandedly. Of course, maybe I only feel that way because it’s the only one of the long list that I know of. ^^
  (Another possibility I just thought of is that maybe this subsequence requires a lot of new preliminary subsequences, such that the work is far larger than you could expect from reading the words “a subsequence”. Still sounds like it would be really valuable though.)
  - Richard_Ngo Nov 21, 2021, 12:43 AM
    LW: 6 AF: 3
    I don’t expect such a sequence to be particularly useful, compared with focusing on more object-level arguments. Eliezer says that the largest mistake he made in writing his original sequences was that he “didn’t realize that the big problem in learning this valuable way of thinking was figuring out how to practice it, not knowing the theory”. Better, I expect, to correct the specific mistakes alignment researchers are currently making, until people have enough data points to generalise better.
    - adamShimi Nov 21, 2021, 1:07 AM
      LW: 7 AF: 5
      I’m honestly confused by this answer.
      Do you actually think that Yudkowsky having to correct everyone’s object-level mistakes all the time is strictly more productive and will lead faster to the meat of the deconfusion than trying to state the underlying form of the argument and theory, and then adapting it to the object-level arguments and comments?
      I have trouble understanding this, because for me the outcome of the first one is that no one gets it, he has to repeat himself all the time without making the debate progress, and this is one more giant hurdle for anyone trying to get into alignment and understand his position. It’s unclear whether the alternative would solve all these problems (as you quote from the preface of the Sequences, learning the theory is often easier and less useful than practicing), but it still sounds like a powerful accelerator.
      There is no dichotomy of “theory or practice”; we probably need both here. And based on my own experience reading the discussion posts and the discussions I’ve seen around these posts, the object-level refutations have not been particularly useful forms of practice, even if they’re better than nothing.
      - Richard_Ngo Nov 21, 2021, 9:48 PM
        LW: 26 AF: 11
        Your comment is phrased as if the object-level refutations have been tried, while conveying the meta-level intuitions hasn’t been tried. If anything, it’s the opposite: the sequences (and to some extent HPMOR) are practically all content about how to think, whereas Yudkowsky hasn’t written anywhere near as extensively on object-level AI safety.
        This has been valuable for community-building, but less so for making intellectual progress—because in almost all domains, the most important way to make progress is to grapple with many object-level problems, until you’ve developed very good intuitions for how those problems work. In the case of alignment, it’s hard to learn things from grappling with most of these problems, because we don’t have signals of when we’re going in the right direction. Insofar as Eliezer has correct intuitions about when and why attempted solutions are wrong, those intuitions are important training data.
        By contrast, trying to first agree on very high-level epistemological principles, and then do the object-level work, has a very poor track record. See how philosophy of science has done very little to improve how science works; and how reading the sequences doesn’t improve people’s object-level rationality very much.
        I model you as having a strong tendency to abstract towards higher-level discussion of epistemology in order to understand things. (I also have a strong tendency to do this, but I think yours is significantly stronger than mine.) I expect that there’s just a strong clash of intuitions here, which would be hard to resolve. But one prompt which might be useful: why aren’t epistemologists making breakthroughs in all sorts of other domains?
        adamShimi Nov 22, 2021, 12:53 AM
        LW: 8 AF: 8
        Thanks for giving more details about your perspective.
        > Your comment is phrased as if the object-level refutations have been tried, while conveying the meta-level intuitions hasn’t been tried. If anything, it’s the opposite: the sequences (and to some extent HPMOR) are practically all content about how to think, whereas Yudkowsky hasn’t written anywhere near as extensively on object-level AI safety.
        It’s not clear to me that the sequences and HPMOR are good pointers for this particular approach to theory building. I mean, I’m sure there are posts in the sequences that touch on that (Einstein’s Arrogance is an example I already mentioned), but I expect that they only talk about it in passing and obliquely, and that such posts are spread all over the sequences. Plus, the fact that Yudkowsky said that there was a new subsequence to write leads me to believe that he doesn’t think the information is clearly stated already.
        So I don’t think you can really take the current confusion as evidence that explaining how that kind of theory would work doesn’t help, given that this explanation isn’t readily available in a form I or anyone reading this can access, AFAIK.
        > This has been valuable for community-building, but less so for making intellectual progress—because in almost all domains, the most important way to make progress is to grapple with many object-level problems, until you’ve developed very good intuitions for how those problems work. In the case of alignment, it’s hard to learn things from grappling with most of these problems, because we don’t have signals of when we’re going in the right direction. Insofar as Eliezer has correct intuitions about when and why attempted solutions are wrong, those intuitions are important training data.
        Completely agree that these intuitions are important training data. But your whole point in other comments is that we want to understand why we should expect these intuitions to differ from apparently bad/useless analogies between AGI and other stuff. And some explanation of where these intuitions come from could help with evaluating them, all the more so because Yudkowsky has said that he could write a sequence about the process.
        > By contrast, trying to first agree on very high-level epistemological principles, and then do the object-level work, has a very poor track record. See how philosophy of science has done very little to improve how science works; and how reading the sequences doesn’t improve people’s object-level rationality very much.
        This sounds to me like a strawman of my position (which might be my fault for not explaining it well).
        First, I don’t think explaining a methodology is a “very high-level epistemological principle”, because it lets us concretely pick apart and criticize the methodology as a truth-finding method.
        Second, the object-level work has already been done by Yudkowsky! I’m not saying that some outside-of-the-field epistemologist should ponder really hard about what would make sense for alignment without ever working on it concretely and then give us their teaching. Instead I’m pushing for a researcher who has built a coherent collection of intuitions and has thought about the epistemology of this process to share the latter to help us understand the former.
        A bit similar to my last point, I think the correct comparison here is not “philosophers of science outside the field helping the field”, which happens but is rare as you say, but “scientists thinking about epistemology for very practical reasons”. And given that the latter is from my understanding what started the scientific revolution and a common activity of all scientists until the big paradigms were established (in physics and biology at least) in the early 20th century, I would say there is a good track record here.
        (Note that this is more your specialty, so I would appreciate evidence that I’m wrong in my historical interpretation here.)
        > I model you as having a strong tendency to abstract towards higher-level discussion of epistemology in order to understand things. (I also have a strong tendency to do this, but I think yours is significantly stronger than mine.)
        Hmm, I certainly like a lot of epistemic stuff, but I would say my tendencies to use epistemology are almost always grounded in concrete questions, like understanding why a given experiment tells us something relevant about what we’re studying.
        I also have to admit that I’m kind of confused, because I feel like you’re consistently using the sort of epistemic discussion that I’m advocating for when discussing predictions and what gives us confidence in a theory, and yet you don’t think it would be useful to have a similar-level model of the epistemology used by Yudkowsky to make the sort of judgment you’re investigating?
        > I expect that there’s just a strong clash of intuitions here, which would be hard to resolve. But one prompt which might be useful: why aren’t epistemologists making breakthroughs in all sorts of other domains?
        As I wrote above, I don’t think this is a good prompt, because we’re talking about scientists using epistemology to make sense of their own work there.
        Here is an analogy I just thought of: I feel that in this discussion, you and Yudkowsky are talking about objects which have different types. So when you’re asking questions about his model, there’s a type mismatch. And when he’s answering, having noticed the type mismatch, he’s trying to find what to ascribe it to (his answer has been quite consistently modest epistemology, which I think is clearly incorrect). Tracking the confusion does tell you some information about the type mismatch, and is probably part of the process of resolving it. But having his best description of his type (given that your type is quite standardized) would make this process far faster, by helping you triangulate the differences.