What type of reasoning do you think would be most appropriate?
See the discussion between me and interstice upthread for a type of argument that feels more productive.
I would still argue that other research avenues are neglected in the community.
I agree (and mentioned so in my original comment). This post would have been far more productive if it had focused on exploring them.
We have to weigh the good against the bad, and I’d like to see some object-level explanations for why the bad doesn’t outweigh the good, and why the problem is sufficiently tractable.
The things you should be looking for, when it comes to fundamental breakthroughs, are deep problems demonstrating fascinating phenomena, and especially cases where you can get rapid feedback from reality. That’s what we’ve got here. If that’s not object-level enough then your criterion would have ruled out almost all great science in the past.
I think I agree, but this is only one of the many points in my post.
I wouldn’t have criticized it so strongly if you hadn’t listed it as “Perhaps the main problem I have with interp”.
This post would have been far more productive if it had focused on exploring them.
So the sections “Counteracting deception with only interp is not the only approach”, “Preventive measures against deception”, “Cognitive Emulations”, and “Technical Agendas with better ToI” don’t feel productive? They already seem to me like a good list of neglected research agendas, so I don’t understand.
if you hadn’t listed it as “Perhaps the main problem I have with interp”
In the above comment, I only agree with “we shouldn’t do useful work, because then it will encourage other people to do bad things”; I don’t agree with your critique of “Perhaps the main problem I have with interp...”, which I don’t think is sufficiently justified.
So the sections “Counteracting deception with only interp is not the only approach”, “Preventive measures against deception”, “Cognitive Emulations”, and “Technical Agendas with better ToI” don’t feel productive? They already seem to me like a good list of neglected research agendas, so I don’t understand.
You’ve listed them, but you haven’t really argued that they’re valuable; you’re mostly just asserting things like Rob Miles having a bigger impact than most interpretability researchers, or the best strategy being to copy Dan Hendrycks. But since I disagree with those assertions, these sections aren’t very useful; they don’t actually zoom in on the positive case for these research directions.
(The main positive case I’m seeing seems to be “anything which helps with coordination is really valuable”. And sure, coordination is great. But most coordination-related research is shallow: it helps us do things now, but doesn’t help us figure out how to do things better in the long term. So I think you’re overstating the case for it in general.)
I agree that I haven’t argued the positive case for more governance/coordination work (which is why I hope to write a follow-up post on that).
We do need alignment work, but I think the current allocation is too heavily focused on alignment, even though AI X-Risks could arrive in the near future. I’ll be happy to reinvest in alignment work once we’re confident we can avoid X-Risks from misuse and grossly negligent accidents.