TurnTrout comments on MIRI 2024 Communications Strategy

TurnTrout 31 May 2024 21:26 UTC
22 points
−16
“If your model of reality has the power to make these sweeping claims with high confidence, then you should almost certainly be able to use your model of reality to make novel predictions about the state of the world prior to AI doom that would help others determine if your model is correct.”
This is partially derivable from Bayes rule. In order for you to gain confidence in a theory, you need to make observations which are more likely in worlds where the theory is correct. Since MIRI seems to have grown even more confident in their models, they must’ve observed something which is more likely to be correct under their models. Therefore, to obey Conservation of Expected Evidence, the world could have come out a different way which would have decreased their confidence. So it was falsifiable this whole time. However, in my experience, MIRI-sympathetic folk deny this for some reason.
It’s simply not possible, as a matter of Bayesian reasoning, to lawfully update (today) based on empirical evidence (like LLMs succeeding) in order to change your probability of a hypothesis that “doesn’t make” any empirical predictions (today).
The fact that MIRI has yet to produce (to my knowledge) any major empirically validated predictions or important practical insights into the nature AI, or AI progress, in the last 20 years, undermines the idea that they have the type of special insight into AI that would allow them to express high confidence in a doom model like the one outlined in (4).
In summer 2022, Quintin Pope was explaining the results of the ROME paper to Eliezer. Eliezer impatiently interrupted him and said “so they found that facts were stored in the attention layers, so what?”. Of course, this was exactly wrong—Bau et al. found the circuits in mid-network MLPs. Yet, there was no visible moment of “oops” for Eliezer.
- gwern 31 May 2024 21:35 UTC
  54 points
  40
  Parent
  
  In summer 2022, Quintin Pope was explaining the results of the ROME paper to Eliezer. Eliezer impatiently interrupted him and said “so they found that facts were stored in the attention layers, so what?”. Of course, this was exactly wrong—Bau et al. found the circuits in mid-network MLPs. Yet, there was no visible moment of “oops” for Eliezer.
  
  I think I am missing context here. Why is that distinction between facts localized in attention layers and in MLP layers so earth-shaking Eliezer should have been shocked and awed by a quick guess during conversation being wrong, and is so revealing an anecdote you feel that it is the capstone of your comment, crystallizing everything wrong about Eliezer into a story?
  - TurnTrout 2 Jun 2024 6:11 UTC
    11 points
    13
    Parent
    ^ Aggressive strawman which ignores the main point of my comment. I didn’t say “earth-shaking” or “crystallizing everything wrong about Eliezer” or that the situation merited “shock and awe.” Additionally, the anecdote was unrelated to the other section of my comment, so I didn’t “feel” it was a “capstone.”
    I would have hoped, with all of the attention on this exchange, that someone would reply “hey, TurnTrout didn’t actually say that stuff.” You know, local validity and all that. I’m really not going to miss this site.
    Anyways, gwern, it’s pretty simple. The community edifies this guy and promotes his writing as a way to get better at careful reasoning. However, my actual experience is that Eliezer goes around doing things like e.g. impatiently interrupting people and being instantly wrong about it (importantly, in the realm of AI, as was the original context). This makes me think that Eliezer isn’t deploying careful reasoning to begin with.
    - gwern 4 Jun 2024 21:00 UTC
      13 points
      9
      Parent
      
      ^ Aggressive strawman which ignores the main point of my comment. I didn’t say “earth-shaking” or “crystallizing everything wrong about Eliezer” or that the situation merited “shock and awe.”
      
      I, uh, didn’t say you “say” either of those: I was sarcastically describing your comment about an anecdote that scarcely even seemed to illustrate what it was supposed to, much less was so important as to be worth recounting years later as a high profile story (surely you can come up with something better than that after all this time?), and did not put my description in quotes meant to imply literal quotation, like you just did right there. If we’re going to talk about strawmen...
      
      someone would reply “hey, TurnTrout didn’t actually say that stuff.”
      
      No one would say that or correct me for falsifying quotes, because I didn’t say you said that stuff. They might (and some do) disagree with my sarcastic description, but they certainly weren’t going to say ‘gwern, TurnTrout never actually used the phrase “shocked and awed” or the word “crystallizing”, how could you just make stuff up like that???’ …Because I didn’t. So it seems unfair to judge LW and talk about how you are “not going to miss this site”. (See what I did there? I am quoting you, which is why the text is in quotation marks, and if you didn’t write that in the comment I am responding to, someone is probably going to ask where the quote is from. But they won’t, because you did write that quote).
      
      You know, local validity and all that. I’m really not going to miss this site.
      
      In jumping to accusations of making up quotes and attacking an entire site for not immediately criticizing me in the way you are certain I should be criticized and saying that these failures illustrate why you are quitting it, might one say that you are being… overconfident?
      
      Additionally, the anecdote was unrelated to the other section of my comment, so I didn’t “feel” it was a “capstone.”
      
      Quite aside from it being in the same comment and so you felt it was related, it was obviously related to your first half about overconfidence in providing an anecdote of what you felt was overconfidence, and was rhetorically positioned at the end as the concrete Eliezer conclusion/illustration of the first half about abstract MIRI overconfidence. And you agree that that is what you are doing in your own description, that he “isn’t deploying careful reasoning” in the large things as well as the small, and you are presenting it as a small self-contained story illustrating that general overconfidence:
      
      However, my actual experience is that Eliezer goes around doing things like e.g. impatiently interrupting people and being instantly wrong about it (importantly, in the realm of AI, as was the original context). This makes me think that Eliezer isn’t deploying careful reasoning to begin with.
    - Amalthea 2 Jun 2024 6:46 UTC
      4 points
      1
      Parent
      That said, It also appears to me that Eliezer is probably not the most careful reasoner, and appears indeed often (perhaps egregiously) overconfident. That doesn’t mean one should begrudge people finding value in the sequences although it is certainly not ideal if people take them as mantras rather than useful pointers and explainers for basic things (I didn’t read them, so might have an incorrect view here). There does appear to be some tendency to just link to some point made in the sequences as some airtight thing, although I haven’t found it too pervasive recently.
    - Amalthea 2 Jun 2024 6:34 UTC
      2 points
      −2
      Parent
      You’re describing a situational character flaw which doesn’t really have any bearing on being able to reason carefully overall.
      - Thomas Kwa 2 Jun 2024 7:15 UTC
        10 points
        14
        Parent
        Disagree. Epistemics is a group project and impatiently interrupting people can make both you and your interlocutor less likely to combine your information into correct conclusions. It is also evidence that you’re incurious internally which makes you worse at reasoning, though I don’t want to speculate on Eliezer’s internal experience in particular.
        Amalthea 2 Jun 2024 8:01 UTC
        1 point
        0
        Parent
        I agree with the first sentence. I agree with the second sentence with the caveat that it’s not strong absolute evidence, but mostly applies to the given setting (which is exactly what I’m saying).
        
        People aren’t fixed entities and the quality of their contributions can vary over time and depend on context.
- Adam Jermyn 1 Jun 2024 1:37 UTC
  25 points
  12
  Parent
  One day a mathematician doesn’t know a thing. The next day they do. In between they made no observations with their senses of the world.
  
  It’s possible to make progress through theoretical reasoning. It’s not my preferred approach to the problem (I work on a heavily empirical team at a heavily empirical lab) but it’s not an invalid approach.
  - TurnTrout 2 Jun 2024 6:14 UTC
    11 points
    9
    Parent
    I agree, and I was thinking explicitly of that when I wrote “empirical” evidence and predictions in my original comment.
- TsviBT 2 Jun 2024 10:21 UTC
  11 points
  5
  Parent
  I personally have updated a fair amount over time on
  - people (going on) expressing invalid reasoning for their beliefs about timelines and alignment;
  - people (going on) expressing beliefs about timelines and alignment that seemed relatively more explicable via explanations other than “they have some good reason to believe this that I don’t know about”;
  - other people’s alignment hopes and mental strategies have more visible flaws and visible doomednesses;
  - other people mostly don’t seem to cumulatively integrate the doomednesses of their approaches into their mental landscape as guiding elements;
  - my own attempts to do so fail in a different way, namely that I’m too dumb to move effectively in the resulting modified landscape.
  We can back out predictions of my personal models from this, such as “we will continue to not have a clear theory of alignment” or “there will continue to be consensus views that aren’t supported by reasoning that’s solid enough that it ought to produce that consensus if everyone is being reasonable”.
- Lukas_Gloor 1 Jun 2024 14:28 UTC
  11 points
  9
  Parent
  I thought the first paragraph and the boldened bit of your comment seemed insightful. I don’t see why what you’re saying is wrong – it seems right to me (but I’m not sure).
  - habryka 1 Jun 2024 16:16 UTC
    7 points
    5
    Parent
    (I didn’t get anything out of it, and it seems kind of aggressive in a way that seems non-sequitur-ish, and also I am pretty sure mischaracterizes people. I didn’t downvote it, but have disagree-voted with it)