I appreciate this post! It feels fairly reasonable, and much closer to my opinion than (my perception of) previous MIRI posts. Points that stand out:
Publishing capabilities work is notably worse than just doing the work.
I’d argue that hyping up the capabilities work is even worse than just quietly publishing it without fanfare.
Though, a counterpoint is that if an organisation doesn’t have great cybersecurity and is a target for hacking, capabilities can easily leak (see, e.g., the Soviets getting nuclear weapons 4 years after the US, despite it being a top-secret US program, and before the internet existed).
Capabilities work can be importantly helpful for alignment work, especially empirically focused work.
Probably my biggest crux is around the parallel vs serial thing. My read is that fairly little current alignment work really feels “serial” to me. Assuming that you’re mostly referring to conceptual alignment work, my read is that a lot of it is fairly confused, and would benefit a lot from real empirical data and real systems that can demonstrate concepts such as agency, planning, strategic awareness, etc. And just more data on what AGI cognition might look like. Without these, it seems extremely hard to distinguish true progress from compelling falsehoods.
Publishing capabilities work is notably worse than just doing the work.
I’d argue that hyping up the capabilities work is even worse than just quietly publishing it without fanfare.
What’s the mechanism you’re thinking of, through which hype does damage?
I also doubt that good capabilities work will be published “without fanfare”, given how watched this space is.
My read is that fairly little current alignment work really feels “serial” to me. Assuming that you’re mostly referring to conceptual alignment work, my read is that a lot of it is fairly confused, and would benefit a lot from real empirical data and real systems that can demonstrate concepts such as agency, planning, strategic awareness, etc.
I think this is more an indictment of existing work, and less a statement about what work needs to be done. e.g. my guess is we’ll both agree that the original inner alignment work from Evan Hubinger is pretty decent conceptual research. And I think much conceptual work seems pretty serial to me, and is hard to parallelize due to reasons like “intuitions from the lead researcher are difficult to share” and communications difficulties in general.
Of course, I also do agree that there’s a synergy between empirical data and thinking—e.g. one of the main reasons I’m excited about Redwood’s agenda is because it’s very conceptually driven, which lets it be targeted at specific problems (for example, they’re coming up with techniques that aim to solve the mechanistic anomaly detection problem, and finding current analogues and doing experiments with those).
What’s the mechanism you’re thinking of, through which hype does damage?
This ship may have sailed at this point, but to me the main mechanism is getting other actors to pay attention and focus on the most effective kinds of capabilities work, and making it more politically feasible to raise support. E.g., I expect that the media firestorm around GPT-3 made it significantly easier to raise the capital + support within Google Brain to train PaLM. Legibly making a ton of money with it falls in a similar category to me.
Gopher is a good example of not really seeing much fanfare, I think? (Though I don’t spend much time on ML Twitter, so maybe there was loads lol)
And I think much conceptual work seems pretty serial to me, and is hard to parallelize due to reasons like “intuitions from the lead researcher are difficult to share” and communications difficulties in general.
Ah, my key argument here is that most conceptual work is bad because of lacking good empirical examples, grounding and feedback loops, and that if we were closer to AGI we could have this.
I agree that Risks from Learned Optimisation is important and didn’t need this, and it plausibly feels like a good example of serial work to me.
I expect that the media firestorm around GPT-3 made it significantly easier to raise the capital + support within Google Brain to train PaLM.
Wouldn’t surprise me if this were true, but I agree with you that it’s possible the ship has already sailed on LLMs. I think this is more so the case if you have a novel insight about which paths are more promising to AGI (similar to the scaling hypothesis in 2018): getting ~everyone to adopt that insight would significantly advance timelines, though I’d argue that publishing it (such that only the labs explicitly aiming at AGI, like OpenAI and Deepmind, adopt it) is not clearly less bad than hyping it up.
Gopher is a good example of not really seeing much fanfare, I think? (Though I don’t spend much time on ML Twitter, so maybe there was loads lol)
Surely this is because it didn’t say anything except “Deepmind is also now in the LLM game”, which wasn’t surprising given Geoff Irving left OpenAI for Deepmind? There weren’t significant groundbreaking techniques used to train Gopher as far as I can remember.
Chinchilla, on the other hand, did see a ton of fanfare.
Ah, my key argument here is that most conceptual work is bad because of lacking good empirical examples, grounding and feedback loops, and that if we were closer to AGI we could have this.
Cool. I agree with you that conceptual work is bad in part because of a lack of good examples/grounding/feedback loops, though I think this can be overcome with clever toy problem design and analogies to current problems (that you can then get the examples/grounding/feedback loops from). E.g. surely we can test toy versions of shard theory claims using the small algorithmic neural networks we’re able to fully reverse engineer.
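To make “small algorithmic networks we can fully reverse engineer” concrete, here is a minimal, purely illustrative sketch (my own construction, not anything from the thread): a two-neuron ReLU network hand-built to compute |x|, where the algorithm can be read directly off the weights and then checked exhaustively rather than argued about.

```python
import numpy as np

# Illustrative toy: a network whose algorithm is legible from its weights.
# Two ReLU units compute |x| = relu(x) + relu(-x).

def relu(z):
    return np.maximum(z, 0.0)

W1 = np.array([[1.0], [-1.0]])  # first layer: produces x and -x
W2 = np.array([[1.0, 1.0]])     # second layer: sums the two rectified units

def net(x):
    # Forward pass: W2 @ relu(W1 @ x)
    return float(W2 @ relu(W1 @ np.array([x])))

# Because the network is tiny, the claimed algorithm (absolute value)
# can be verified on a dense grid instead of trusted on intuition.
for x in np.linspace(-3.0, 3.0, 61):
    assert np.isclose(net(x), abs(x))
```

The point isn’t this particular network, of course: it’s that at this scale, a claim about what a network “is doing” becomes a checkable statement, which is exactly the kind of feedback loop the conceptual work discussed above is missing.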