Are there types of published alignment research that you think were good (or at least more likely to be good) to publish? If so, I'd be curious to see a list.
Some off the top of my head:
Outer alignment research (e.g. analytic moral philosophy in an attempt to extrapolate CEV) seems to be totally useless for capabilities, so we should almost definitely publish that.
Evals for governance? Not sure about this one, since a lot of evals research also helps capabilities, but if it leads to regulation that lengthens timelines, it could be net positive.
Edit: oops, I didn't see tammy's comment.
I think research that is mostly about outer alignment (what to point the AI at) rather than inner alignment (how to point the AI at it) tends to be good: quantilizers, corrigibility, QACI, decision theory, embedded agency, indirect normativity, infra-Bayesianism, things like that. Though I could see some of those backfiring the way RLHF did; in the hands of a very irresponsible org, even research that isn't very capabilities-relevant can be used to accelerate timelines and worsen race dynamics, if the org doing it thinks it can make a quick buck out of it.
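(Tangent for anyone unfamiliar: the quantilizer idea itself is simple enough to sketch in a few lines, which is part of why it reads as capabilities-light. This is just a toy illustration over a finite action set; `actions`, `base_weights`, and `utility` are hypothetical stand-ins, not anyone's actual implementation.)

```python
import random

def quantilize(actions, base_weights, utility, q=0.1, rng=random):
    """Toy q-quantilizer: rather than picking the argmax-utility action,
    sample from the base distribution restricted to its top-q fraction
    of probability mass by utility. Smaller q means more optimization
    pressure; q=1 just reproduces the base distribution."""
    # Rank actions by estimated utility, best first.
    ranked = sorted(zip(actions, base_weights),
                    key=lambda pair: utility(pair[0]), reverse=True)
    # Keep top actions until they cover a q-fraction of base probability mass.
    total = sum(base_weights)
    kept, mass = [], 0.0
    for action, weight in ranked:
        kept.append((action, weight))
        mass += weight
        if mass >= q * total:
            break
    # Sample from the base distribution renormalized over the kept set.
    top_actions, top_weights = zip(*kept)
    return rng.choices(top_actions, weights=top_weights, k=1)[0]
```

The point of the construction is that the base distribution (e.g. something like human-demonstrated behavior) bounds how weird the chosen action can get, instead of letting a utility maximizer run unconstrained.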
You think that studying agency and infra-Bayesianism won't make small contributions to capabilities? Even just saying "agency" in the context of AI makes capabilities progress.
I could see embedded agency research being harmful to publish, though, since an actual implementation of it would be really useful for inner alignment.