Thanks for the thoughts, and sorry for dropping the ball on responding to this!
I appreciate the pushback, and broadly agree with most of your points.
In particular, I strongly agree that if you’re trying to develop the ability to be a research lead in alignment (and, less strongly, to be an RE or otherwise contribute notably to research), forming an inside view is important, entirely independently of how well it tracks the truth, and I agree that I undersold that in my post.
In part, I think the audience I had in mind is different from yours? I see this as partially aimed at proto-alignment researchers, but also at a lot of people who are just trying to figure out whether to work on it/how to get into the field, including in less technical roles (policy, ops, community building), where I have also often seen a strong push for inside views. I strongly agree that if someone is actively trying to be an alignment researcher, forming inside views is useful. Though it seems pretty fine to do this on the job/after starting a PhD program, and in parallel with trying to do research under a mentor.
don’t reject an expert’s view before you’ve tried really hard to understand it and make it something that does work
I’m pretty happy with this paraphrase of what I mean. Most of what I’m pointing to is using the mental motion of trying to understand things rather than the mental motion of trying to evaluate things; I agree that being literally unable to evaluate would be pretty surprising.
One way I think it’s importantly different is that it feels more comfortable to maintain black boxes when trying to understand something than when trying to evaluate something. Eg, I want to understand why people in the field have short timelines. I get to the point where I see how, if I bought that scaling laws will continue to hold, everything else follows. I am not sure why people believe this, and personally feel pretty confused, but expect other people to be much better informed than me. This feels like an instance where I understand why they hold their view fairly well, and maybe feel comfortable deferring to them, but don’t feel like I can really evaluate their view?
What fraction of people who are trying to build inside views do you think have these problems? (Relevant since I often encourage people to do it)
Honestly I’m not sure—I definitely did, and I have some anecdata of people telling me either that they found my posts/claims extremely useful or that they found these things pretty stressful, but obviously there’s major selection bias. This is also just an objectively hard thing that I think many people find overwhelming (especially when it’s tied to their social identity, status, career plans, etc). I’d guess maybe 40%? I expect framing matters a lot, and that eg pointing people to my posts may help?
I’m not immediately thinking of examples of people without inside views doing independent research that I would call “great safety relevant work”.
Agreed, I’d have pretty different advice for people actively trying to do impactful independent research.
Idk, I feel like I formed my inside views by locking myself in my room for months and meditating on safety.
Interesting, thanks for the data point! That’s very different from the kinds of things that work well for me (possibly just because I find locking myself in my room for a long time hard and exhausting), and suggests my advice may not generalise that well. Idk, people should do what works for them. I’ve found that spending time in the field resulted in me being exposed to a lot of different perspectives and research agendas, forming clearer views on how to do research, flaws in different approaches, etc. And all of this has helped me figure out my own views on things. Though I would like to have much better and clearer views than I currently do.
also a lot of people who are just trying to figure out whether to work on it/how to get into the field, including in less technical roles (policy, ops, community building), where I also have often seen a strong push for inside views.
Oh wild. I assumed this must be directed at researchers since obviously they’re the ones who most need to form inside views. Might be worth adding a note at the top saying who your audience is.
For that audience I’d endorse something like “they should understand the arguments well enough that they can respond sensibly to novel questions”.
One proxy I’ve previously considered is “can they describe an experiment (in enough detail that a programmer could go implement it today) that would mechanistically demonstrate a goal-directed agent pursuing some convergent instrumental subgoal” (a toy sketch of what such an experiment might look like is below).
I think people often call this level of understanding an “inside view”, and so I feel like I still endorse what-people-actually-mean, even though it’s quantitatively much less understanding than you’d want in order to actively do research.
(Though it also wouldn’t shock me if people were saying “everyone in the less technical roles needs to have a detailed take on exactly which agendas are most promising and why and this take should be robust to criticism from senior AI safety people”. I would disagree with that.)
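To make the proxy above a bit more concrete, here is a minimal, illustrative sketch of one experiment of that flavour: a tabular Q-learning agent in a tiny gridworld where the direct route to the goal risks the episode being interrupted, unless the agent first detours to a “disable interruption” tile. A pure reward-maximiser learns to take the detour, which is a toy instance of pursuing an instrumentally useful subgoal (preserving its ability to reach the goal) that nothing in the reward function mentions. This is loosely inspired by the safe-interruptibility environment in DeepMind’s AI Safety Gridworlds; the layout, rewards, and hyperparameters are all illustrative assumptions, not anything from the original discussion.

```python
# Toy sketch (illustrative assumptions throughout): a reward-maximising agent
# learns to disable an interruption mechanism on the way to its goal.
import random
from collections import defaultdict

WIDTH, HEIGHT = 4, 3
START, GOAL = (0, 1), (3, 1)
BUTTON = (1, 0)      # stepping here disables the interruption mechanism
GATE_X = 2           # crossing this column risks interruption unless disabled
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # right, left, down, up


def step(pos, action, disabled):
    """One environment transition; returns (new_pos, disabled, reward, done)."""
    x = min(max(pos[0] + action[0], 0), WIDTH - 1)
    y = min(max(pos[1] + action[1], 0), HEIGHT - 1)
    new_pos = (x, y)
    disabled = disabled or new_pos == BUTTON
    if new_pos == GOAL:
        return new_pos, disabled, 1.0, True
    if new_pos[0] == GATE_X and not disabled and random.random() < 0.8:
        return new_pos, disabled, 0.0, True   # interrupted: episode ends early
    return new_pos, disabled, -0.01, False    # small per-step cost


def train(episodes=5000, alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Q-learning over states (position, interruption_disabled)."""
    Q = defaultdict(float)
    for _ in range(episodes):
        pos, disabled, done = START, False, False
        while not done:
            state = (pos, disabled)
            if random.random() < eps:
                a = random.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: Q[(state, i)])
            new_pos, new_disabled, r, done = step(pos, ACTIONS[a], disabled)
            next_state = (new_pos, new_disabled)
            best_next = 0.0 if done else max(Q[(next_state, i)] for i in range(len(ACTIONS)))
            Q[(state, a)] += alpha * (r + gamma * best_next - Q[(state, a)])
            pos, disabled = new_pos, new_disabled
    return Q


def greedy_rollout(Q, max_steps=30):
    """Follow the learned policy; report whether it detoured to the button."""
    pos, disabled = START, False
    path = [pos]
    for _ in range(max_steps):
        state = (pos, disabled)
        a = max(range(len(ACTIONS)), key=lambda i: Q[(state, i)])
        pos, disabled, _, done = step(pos, ACTIONS[a], disabled)
        path.append(pos)
        if done:
            break
    return path, disabled


if __name__ == "__main__":
    random.seed(0)
    Q = train()
    path, pressed_button = greedy_rollout(Q)
    # With these (illustrative) settings the agent should typically learn to
    # detour via BUTTON before crossing the gate column, even though the
    # reward function never mentions the interruption mechanism.
    print("detoured to disable interruption:", pressed_button)
    print("path:", path)
```

The point is not that this particular toy is interesting in itself, but that it has the shape the proxy asks for: a concrete environment, a concrete training setup, and a concrete behavioural prediction that a programmer could implement and check today.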
This feels like an instance where I understand why they hold their view fairly well, and maybe feel comfortable deferring to them, but don’t feel like I can really evaluate their view?
I would have said you don’t understand an aspect of their view, and that’s exactly the aspect you can’t evaluate. (And then if you try to make a decision, the uncertainty from that aspect propagates into uncertainty about the decision.) But this is mostly semantics.
I’d guess maybe 40%? I expect framing matters a lot, and that eg pointing people to my posts may help?
Thanks, I’ll keep that in mind.
I’ve found that spending time in the field resulted in me being exposed to a lot of different perspectives and research agendas, forming clearer views on how to do research, flaws in different approaches, etc.
Tbc I did all of this too—by reading a lot of papers and blog posts and thinking about them.
(The main exception is “how to do research”, which I think I learned from just practicing doing research + advice from my advisors.)