What if the proponents of the MIRI-rationalist view think that (say) a paper by DeepMind has valuable insights from the perspective of the MIRI-rationalist paradigm, and should be featured in a “best [according to MIRI-rationalists] of AI alignment work in 2018” list?
That seems mostly fine and good to me, but I predict it mostly won’t happen (which is why I said “They don’t expect the other side’s work to be useful on their own models”). I think you still have the “poisoning” problem as you call it, but I’m much less worried about it.
I’m more worried about the rankings and reviews, which have a much stronger “poisoning” problem.
Anything anyone says publicly can be read by a non-expert, and if something wrong was said and the non-expert believes it, then the non-expert ends up with wrong beliefs. This is a general problem with non-experts, and I don’t see how it is worse here.
Many more people are likely to read the results of a review, relative to arguments in the comments of a linkpost to a paper.
Calling something a “review”, with a clear process for generating a ranking, grants it much more legitimacy than one person saying something on the Internet.
So, not only is the MIRI-rationalist viewpoint wrong, it is so wrong that it irreversibly poisons the mind of anyone exposed to it?
Not irreversibly.
Isn’t it a good idea to let people evaluate ideas on their own merits?
When presented with the strongest arguments for both sides, yes. Empirically that doesn’t happen.
If someone endorses a wrong idea, shouldn’t you be able to convince em by presenting counterarguments?
I sometimes can and have. However, I don’t have infinite time. (You think I endorse wrong ideas. Why haven’t you been able to convince me by presenting counterarguments?)
Also, for non-experts this is not necessarily true (or is true only in some vacuous sense). If a non-expert sees 50 people within a community of experts arguing for A and 1 person arguing for not-A, then even if they find the arguments for not-A compelling, in most cases they should still put high credence on A.
(The vacuous sense in which it’s true is that the non-expert could spend hundreds or thousands of hours becoming an expert, in which case they could evaluate the arguments on their own merits.)
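The deference argument above can be made quantitative with a toy Bayesian model. The numbers here (expert reliability, the strength of the non-expert's own read of the arguments) are my assumptions for illustration, not anything from the discussion; the model also assumes experts reach their conclusions independently, which overstates the evidence in practice but shows why a lopsided expert split dominates one's own evaluation.

```python
# Toy model (assumed, for illustration): each expert independently ends up
# on the true side of the question with probability p > 0.5.
p = 0.7                      # assumed expert reliability
n_for_A, n_against_A = 50, 1

# Likelihood of the observed 50-vs-1 split if A is true vs. if A is false.
like_if_A_true  = p ** n_for_A * (1 - p) ** n_against_A
like_if_A_false = (1 - p) ** n_for_A * p ** n_against_A

# Start from even prior odds, and suppose the non-expert's own reading of
# the arguments favors not-A by 10:1 (also an assumed number).
prior_odds = 1.0
own_read_odds_for_A = 1 / 10

posterior_odds = prior_odds * own_read_odds_for_A * (like_if_A_true / like_if_A_false)
posterior = posterior_odds / (1 + posterior_odds)
print(f"P(A) = {posterior:.6f}")  # overwhelmingly close to 1
```

Under independence, the 50-vs-1 split contributes a likelihood ratio of roughly (0.7/0.3)^49, which swamps the non-expert's 10:1 impression from the arguments themselves; correlated expert opinion would shrink this ratio, but it takes a lot of correlation to flip the conclusion.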
If you cannot present counterarguments, how are you so sure the idea is actually wrong?
I in fact can present counterarguments, it just takes a long time.
If the person in question cannot understand the counterargument, doesn’t it make em much less valuable for your style of work anyway?
Empirically, humans seem to have very “sticky” worldviews: whichever worldview they first inhabit, it’s very unlikely that they later switch to the other one. So depending on what you mean by “understand”, I could give two responses:
1. They “could have” understood (and generated themselves) the counterargument if they had started out in the opposite worldview.
2. No one currently in the field is able to “understand” the arguments of the other side, so it’s not a sign of incompetence if a new person cannot “understand” such an argument.
Obviously ideal Bayesians wouldn’t have “sticky” worldviews; it turns out humans aren’t ideal Bayesians.
Finally, if you actually believe this, doesn’t it undermine the entire principle of AI debate?
If you mean debate as a proposal for AI alignment, you might hope that we can create AI systems that are closer to ideal Bayesian reasoners than we are, or you might hope that humans who think for a very long time are closer to ideal Bayesian reasoners. Either way, I agree this is a problem that would have to be dealt with.
If you mean debate as in “through debate, AI alignment researchers will have better beliefs”, then yes, it does undermine this principle. (You might have noticed that not many alignment researchers try to do this sort of debate.)