> for that to fix these problems the reviewers would have to be more epistemically competent than the post authors
I think this is an overstatement. They’d need to notice issues the post authors missed. That doesn’t require greater epistemic competence: they need only tend to make different mistakes, not fewer mistakes.
Certainly there’s a point below which the signal-to-noise ratio is too low. I agree that high reviewer quality is important.
On the “same old cruxes and disagreements” I imagine you’re right—but to me that suggests we need a more effective mechanism to clarify/resolve them (I think you’re correct in implying that review is not that mechanism—I don’t think academic review achieves this either). It’s otherwise unsurprising that they bubble up everywhere.
I don’t have any clear sense of the degree of time and effort that has gone into clarifying/resolving such cruxes, and I’m sure it tends to be a frustrating process. However, my guess is that the answer is “nowhere close to enough”. Unless researchers have very high confidence that they’re on the right side of such disagreements, it seems appropriate to me to spend ~6 months focusing purely on this (of course this would require coordination, and presumably seems wildly impractical).
My sense is that nothing on this scale happens (right?), and that the reasons have more to do with (entirely understandable) impracticality, coordination difficulties and frustration, than with principled epistemics and EV calculations. But perhaps I’m way off? My apologies if this is one of the same old cruxes and disagreements :).
> That doesn’t require greater epistemic competence: they need only tend to make different mistakes, not fewer mistakes.
Yes, that’s true, I agree my original comment is overstated for this reason. (But it doesn’t change my actual prediction about what would happen; I still don’t expect reviewers to catch issues.)
> My sense is that nothing on this scale happens (right?)
I’d guess that I’ve spent around 6 months debating these sorts of cruxes and disagreements (though not with a single person of course). I think the main bottleneck is finding avenues that would actually make progress.
Ah, well that’s mildly discouraging (encouraging that you’ve made this scale of effort; discouraging in what it says about the difficulty of progress).
I’d still be interested to know what you’d see as a promising approach here—if such crux resolution were the only problem, and you were able to coordinate things as you wished, what would be a (relatively) promising strategy? But perhaps you’re already pursuing it? I.e. perhaps something like [everyone works on what they see as key problems, increases their own understanding and shares insights] is what seems most likely to open up paths to progress.
Assuming review wouldn’t do much to help on this, have you thought about distributed mechanisms that might? E.g. mapping out core cruxes and linking all available discussions where they seem a fundamental issue (potentially after holding/writing-up a bunch more MIRI Dialogues style interactions [which needn’t all involve MIRI]). Does this kind of thing seem likely to be of little value—e.g. because it ends up clearly highlighting where different intuitions show up, but shedding little light on their roots or potential justification?
I suppose I’d like to know what shape of evidence seems most likely to lead to progress—and whether much/any of it might be unearthed through clarification/distillation/mapping of existing ideas. (where the mapping doesn’t require connections that only people with the deepest models will find)
Personally if I were trying to do this I’d probably aim to do a combination of:
- Identify what kinds of reasoning people are employing, investigate under what conditions they tend to lead to the truth. E.g. one way that I think I differ from many others is that I am skeptical of analogies as direct evidence about the truth; I see the point of analogies as (a) tools for communicating ideas more effectively and (b) locating hypotheses that you then verify by understanding the underlying mechanism and checking that the mechanism ports (after which you don’t need the analogy any more).
- State arguments more precisely and rigorously, to narrow in on more specific claims that people disagree about (note there are a lot of skulls along this road).
FWIW I think a fairly substantial amount of effort has gone into resolving longstanding disagreements. I think that effort has resulted in a lot of good work, and in updates from many people reading about those disagreements, but it hasn’t really changed the minds of the people doing the arguing. (See: the MIRI Dialogues)
And it’s totally plausible to me the answer is “10-100x the amount of work that has gone in so far.”
I maybe agree that people haven’t literally sat and double-cruxed for six months. I don’t know that it’s fair to describe this as “impracticality, coordination difficulties and frustration” instead of “principled epistemics and EV calculations.” Like, if you’ve done a thing a bunch and it doesn’t seem to be working and you feel like you have traction on another thing, it’s not crazy to do the other thing.
(That said, I do still have the gut level feeling of ‘man it’s absolutely bonkers that in the so-called rationality community a lot of prominent thinkers still disagree about such fundamental stuff.’)
Oh sure, I certainly don’t mean to imply that there’s been little effort in absolute terms—I’m very encouraged by the MIRI dialogues, and assume there are a bunch of behind-the-scenes conversations going on. I also assume that everyone is doing what seems best in good faith, and has other potentially high-value demands on their time.
However, given the stakes, I think it’s a time for extraordinary efforts—and so I worry that [this isn’t the kind of thing that is usually done] is doing too much work.
I think the “principled epistemics and EV calculations” could perfectly well be the explanation, if it were the case that most researchers put around a 1% chance on [Eliezer/Nate/John… are largely correct on the cruxy stuff].
That’s not the sense I get—more that many put the odds somewhere around 5% to 25%, but don’t believe the arguments are sufficiently crisp to allow productive engagement.
If I’m correct on that (and I may well not be), it does not seem a principled justification for the status quo. Granted, the right course isn’t obvious—we’d need whoever’s on the other side of the double-cruxing to really know their stuff. Perhaps Paul’s/Rohin’s… time is too valuable for a 6-month cost to pay off. (The more realistic version likely involves not-quite-so-valuable people from each ‘side’ doing it.)
As for “done a thing a bunch and it doesn’t seem to be working”, what’s the prior on [two experts in a field from very different schools of thought talk for about a week and try to reach agreement]? I’m no expert, but I strongly expect that not to work in most cases.
To have a realistic expectation of its working, you’d need to be doing the kinds of things that are highly non-standard. Experts having some discussions over a week is standard. Making it your one focus for 6 months is not. (Frankly, I’d be over the moon for the one-month version [but again, for all I know this may have been tried].)
Even more importantly, Aumann’s Agreement Theorem implies that ideal Bayesian reasoners with a common prior cannot agree to disagree once their posteriors are common knowledge, so the fact that the AI Alignment field hasn’t converged is at least somewhat concerning.
Here’s the link:
https://www.lesswrong.com/tag/aumann-s-agreement-theorem
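To make concrete what that theorem actually assumes, here is a minimal, purely illustrative sketch of the Geanakoplos-Polemarchakis announce-and-update dialogue that underlies Aumann-style agreement (the states, event, and partitions below are invented for this example, and this is not anyone's actual proposal): two agents with a common prior over a toy state space take turns announcing their posterior for an event A and updating on each other's announcements, until the posteriors coincide.

```python
# Toy illustration (invented example) of the dialogue behind Aumann's agreement
# theorem: two agents with a common prior over a finite state space alternately
# announce their posteriors for an event A and update on each other's
# announcements, until the posteriors coincide.

from fractions import Fraction

# Hypothetical setup: six equally likely states, event A = {0, 2, 4},
# and one private information partition per agent.
states = list(range(6))
prior = {s: Fraction(1, 6) for s in states}
event_A = {0, 2, 4}
partition_1 = [{0, 1}, {2, 3}, {4, 5}]
partition_2 = [{0, 1, 2}, {3, 4, 5}]
true_state = 2  # the state actually realised


def cell(partition, state):
    """The partition cell containing `state` (the agent's raw observation)."""
    return next(c for c in partition if state in c)


def posterior(info_set):
    """P(A | info_set) under the common prior."""
    p_info = sum(prior[s] for s in info_set)
    p_both = sum(prior[s] for s in info_set & event_A)
    return p_both / p_info


def refine(listener_info, speaker_info):
    """After the speaker's posterior is announced publicly, the listener (at
    every state) keeps only the states in which the speaker would have
    announced that same number."""
    return {
        s: {t for t in listener_info[s]
            if posterior(speaker_info[t]) == posterior(speaker_info[s])}
        for s in states
    }


# Each agent starts out knowing only its own partition cell, at every state.
info_1 = {s: cell(partition_1, s) for s in states}
info_2 = {s: cell(partition_2, s) for s in states}

for round_ in range(10):
    p1 = posterior(info_1[true_state])   # agent 1 announces
    info_2 = refine(info_2, info_1)      # agent 2 updates on the announcement
    p2 = posterior(info_2[true_state])   # agent 2 announces
    info_1 = refine(info_1, info_2)      # agent 1 updates in turn
    print(f"round {round_}: agent 1 says {p1}, agent 2 says {p2}")
    if p1 == p2:
        break
```

In this toy run the posteriors converge within a couple of rounds, but only because the theorem's assumptions hold by construction: a genuinely common prior, ideal Bayesian updating, and announcements that become common knowledge between the agents.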