RicG comments on AGI-Automated Interpretability is Suicide

__RicG__ 29 Aug 2023 21:47 UTC
1 point
0
Sorry for taking long to get back to you.
So I take this to be a minor, not a major, concern for alignment, relative to others.
Oh sure, this was more of a “look at this cool thing intelligent machines could do, which should shut up people who say things like ‘foom is impossible because training runs are expensive’”.
1. learning is at least as important as runtime speed. Refining networks to algorithms helps with one but destroys the other
2. Writing poems, and most cognitive activity, will very likely not resolve to a more efficient algorithm like arithmetic does. Arithmetic is a special case; perception and planning in varied environments require broad semantic connections. Networks excel at those. Algorithms do not.
Please don’t read this as me being hostile, but… why? How sure can we be of this? How sure are you that things-better-than-neural-networks are not out there?
Do we have any (non-trivial) algorithm that works better inside an NN than as equivalent code?
Btw I am no neuroscientist, so I could be missing a lot of the intuitions you have.
At the end of the day, you seem to think that it is possible to fully interpret and reverse engineer neural networks, but you just don’t believe that Good Old Fashioned AGI can exist and/or be better than training NN weights?
- Seth Herd 29 Aug 2023 23:11 UTC
  3 points
  0
  Parent
  I haven’t justified either of those statements; I hope to make the complete arguments in upcoming posts. For now I’ll just say that human cognition is solving tough problems, and there’s no good reason to think that algorithms would be lots more efficient than networks in solving those problems.
  
  I’ll also reference Moravec’s paradox as an intuition pump. Things that are hard for humans, like chess and arithmetic, are easy for computers (algorithms); things that are easy for humans, like vision and walking, are hard for algorithms.
  
  I definitely do not think it’s pragmatically possible to fully interpret or reverse engineer neural networks. I think it’s possible to do it adequately to create aligned AGI, but that’s a much weaker criterion.
  - mcint 30 Aug 2023 0:33 UTC
    3 points
    0
    Parent
    Please fix (or remove) the link.
    - Seth Herd 30 Aug 2023 19:01 UTC
      3 points
      0
      Parent
      Done, thanks!
