There’s also a much harder and less impartial option, which is to have an extremely opinionated survey that picks one lens to view the entire field and then describes every agenda with respect to that lens, in terms of which particular cruxes/assumptions each agenda runs with. This would necessarily require the authors of the survey to deeply understand all the agendas they’re covering, and inevitably some agendas would receive much more coverage than others.
This makes it much harder than just stapling together a bunch of people’s descriptions of their own research agendas, and will never be “the” alignment survey because of the opinionatedness. I still think this would have a lot of value though: it would make it much easier to translate ideas between different lenses/notice commonalities, and help with figuring out which cruxes need to be resolved for people to agree.
Relatedly, I don’t think alignment currently has a lack of different lenses (which is not to say that the different lenses are meaningfully decorrelated). I think alignment has a lack of convergence between people with different lenses. Some of this is because many cruxes are very hard to resolve experimentally today. However, I think even despite that it should be possible to do much better than we currently are—often, it’s not even clear what the cruxes are between different views, or whether two people are thinking about the same thing when they make claims in different language.
I strongly agree that this would be valuable; if not for the existence of this shallow review I’d consider doing this myself just to serve as a reference for myself.
Fwiw, I think “deep” reviews serve a very different purpose from shallow reviews, so I don’t think you should let the existence of shallow reviews prevent you from doing a deep review.
I’ve written up an opinionated take on someone else’s technical alignment agenda about three times, and each of those took me something like 100 hours. That was just to clearly state why I disagreed with it; forget about resolving our differences :)
Even that is putting it a bit too lightly.
i.e. Is there even a single, bona fide, novel proof at all?
Proven mathematically, or otherwise demonstrated with 100% certainty, across the last 10+ years.
Or is it all just ‘lenses’, subjective views, probabilistic analysis, etc.?