“OK good point, but it’s hardly ‘suicide’ to provide just one more route to self-improvement.”
Thanks for coming back to me.
I admit the title is a bit clickbaity, but given my list of assumptions (which do include that NNs can be made more efficient by interpreting them) it does elucidate a path to foom (which does look like suicide without alignment).
“Unless there’s an equally efficient way to do that in closed form algorithms, they have a massive disadvantage in any area where more learning is likely to be useful.”
I’d like to point out that in this instance I was talking about the learned algorithm, not the learning algorithm. Learning to learn is a can of worms I’m not opening right now, even though it’s probably the area you are referring to. Still, I don’t really see a reason there couldn’t be more efficient, as-yet-undiscovered learning algorithms (and NN+GD was not itself learned; it was intelligently designed by us humans. Is NN+GD the best there is?).
Maybe I should clarify how I imagined the NN-AGI in this post: a single huge inscrutable NN like GPT. Maybe a different architecture, maybe a bunch of NNs in a trench coat, but still mostly NN. If that is true, then there are a lot of things that could be upgraded by writing them in code rather than keeping them in NNs (arithmetic is the easy example, Monte Carlo tree search is another...). Whatever MC tree search the giant inscrutable matrices have implemented, it is probably really bad compared to sturdy old-fashioned code.
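To make the arithmetic example concrete, here is a minimal toy sketch (my own illustration, not anything extracted from a real model) of the gap I have in mind: the hand-written version is exact and essentially free to run, while a model that has merely learned the same function pays a full forward pass per token and can still slip up on long operands.

```python
# Toy contrast between a capability written directly as code and the same
# capability learned inside a network. The explicit version is exact for
# arbitrarily large integers and costs next to nothing.

def add(a: int, b: int) -> int:
    # Exact, O(number of digits).
    return a + b

# A giant NN that has learned addition instead spends billions of
# multiply-adds per generated token on the same question, and may still
# drop a carry. The call below is purely hypothetical shorthand for that
# cost; `giant_model.generate` is not a real API.
#
#   answer = giant_model.generate("123456789 + 987654321 =")

print(add(123456789, 987654321))  # 1111111110, every time
```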
Even if NNs are the best way to learn algorithms, they are not the best way to design them. I am talking about the difference between evolvable and designable.
NNs allow us to evolve algorithms; code allows us to intelligently design them. If there is no easy evolvable path to an algorithm, neural networks will fail to find it.
The parallel to evolution: evolution cannot make bones out of steel (even though they would be much better) because there is no shallow gradient leading to steel: there is no way to encode the recipe for steel bones such that a slightly changed recipe still yields something steel-like and useful. Evolution needs a smooth path from not-working to working, while design doesn’t.
With intelligence, the computations don’t need to be evolved (or learned); they can be designed, shaped with intent.
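As a toy illustration of designable-but-not-easily-evolvable (my own example, offered as an intuition pump rather than a proof): the parity of many bits is one deliberate line of code, but it is notoriously awkward for gradient-based learning in high dimensions, because flipping any single input bit flips the answer, so there is no smooth path of partial solutions to climb.

```python
# Parity: trivial to *design* in code, famously hard to *evolve* with
# gradient descent in high dimensions, since no partial solution does
# better than chance. There is no smooth path from not-working to working,
# yet the designed version is a few obvious lines.

def parity(bits: list[int]) -> int:
    acc = 0
    for b in bits:
        acc ^= b  # XOR all bits together
    return acc    # 1 if an odd number of bits are set, else 0

print(parity([1, 0, 1, 1]))  # 1
print(parity([1, 1, 0, 0]))  # 0
```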
Are you really that confident that the steel equivalent of algorithms doesn’t exist, even though we humans have barely explored that space (nothing hard-coded comes close to even GPT-2)?
Do we have any (non-trivial) example of an algorithm that works better inside a NN than written as code? I guess those might be the hardest to design/interpret, so we won’t know for certain for a long time...
“Arithmetic is a closed cognitive function; we know exactly how it works and don’t need to learn more.”
If we knew exactly how to make poems about math theorems (like GPT-4 does), that would make it a “closed cognitive function” too, right? Can that learned algorithm be reverse-engineered from GPT-4? My answer is yes ⇒ foom ⇒ we’re dead.
Any type of self-improvement in an unaligned AGI = death. And if it’s already better than human level, it might not even need to do a bit of self-improvement; it just has to escape our control, and we’re dead. So I think “suicide” is quite a bit of hyperbole, or at least stated poorly relative to the rest of the conceptual landscape here.
If the AGI is aligned when it self-improves through algorithmic refinement, reflective stability should probably cause it to stay aligned afterward, and we just have a faster benevolent superintelligence.
So this concern is one more route to self-improvement. And there’s a big question of how good a route it is.
My points were:
Learning is at least as important as runtime speed. Refining networks into algorithms helps with one but destroys the other.
Writing poems, and most cognitive activity, will very likely not reduce to a more efficient algorithm the way arithmetic does. Arithmetic is a special case; perception and planning in varied environments require broad semantic connections. Networks excel at those. Algorithms do not.
So I take this to be a minor, not a major, concern for alignment, relative to others.
Sorry for taking so long to get back to you.
“So I take this to be a minor, not a major, concern for alignment, relative to others.”
Oh sure, this was more of a “look at this cool thing intelligent machines could do” post, meant to stop people from saying things like “foom is impossible because training runs are expensive”.
“Learning is at least as important as runtime speed. Refining networks into algorithms helps with one but destroys the other.”
“Writing poems, and most cognitive activity, will very likely not reduce to a more efficient algorithm the way arithmetic does. Arithmetic is a special case; perception and planning in varied environments require broad semantic connections. Networks excel at those. Algorithms do not.”
Please don’t read this as me being hostile, but… why? How sure can we be of this? How sure are you that things-better-than-neural-networks are not out there?
“Do we have any (non-trivial) example of an algorithm that works better inside a NN than written as code?”
Btw, I am no neuroscientist, so I could be missing a lot of the intuitions you have.
At the end of the day, you seem to think it may be possible to fully interpret and reverse-engineer neural networks, but you just don’t believe that Good Old-Fashioned AGI can exist and/or be better than training NN weights?
I haven’t justified either of those statements; I hope to make the complete arguments in upcoming posts. For now I’ll just say that human cognition is solving tough problems, and there’s no good reason to think that algorithms would be lots more efficient than networks in solving those problems.
I’ll also reference Moravec’s Paradox as an intuition pump. Things that are hard for humans, like chess and arithmetic, are easy for computers (algorithms); things that are easy for humans, like vision and walking, are hard for algorithms.
I definitely do not think it’s pragmatically possible to fully interpret or reverse-engineer neural networks. I think it’s possible to do it adequately to create aligned AGI, but that’s a much weaker criterion.
Please fix (or remove) the link.
Done, thanks!