Those structures would likely also be represented with neural nets, though, right? So in practice it seems like it would end up quite similar to looking for isomorphic structures between neural networks, except you specifically want to design a highly interpretable kind of neural network and then look for isomorphisms between this interpretable neural network and other neural networks.
They would not necessarily be represented with neural nets, unless you’re using “neural nets” to refer to circuits in general.
I think by “neural nets” I mean “circuits that get optimized through GD-like optimization techniques and where the vast majority of degrees of freedom for the optimization process come from big matrix multiplications”.
Yeah, I definitely don’t expect that to be the typical representation; I expect neither an optimization-based training process nor lots of big matrix multiplications.
Interesting and exciting. Can you reveal more about how you expect it to work?
Based on the forms in Maxent and Abstraction, I expect (possibly nested) sums of local functions to be the main feature-representation. Figuring out which local functions to sum might be done, in practice, by backing equivalent sums out of a trained neural net, but the net wouldn’t be part of the representation.
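A minimal sketch of what a "sum of local functions" feature-representation might look like, assuming a maxent-style form where a global quantity decomposes into terms that each depend on only a small subset of variables (the particular subsets and term functions below are invented purely for illustration):

```python
import numpy as np

def local_term(subset, fn):
    """Wrap fn so it reads only the listed variable indices:
    each term is 'local' in the sense of touching few variables."""
    return lambda x: fn(x[list(subset)])

# Each term depends on only 2 of the 5 variables (hypothetical terms).
terms = [
    local_term((0, 1), lambda v: v[0] * v[1]),         # pairwise interaction
    local_term((1, 2), lambda v: (v[0] - v[1]) ** 2),  # local smoothness penalty
    local_term((3, 4), lambda v: np.sin(v[0]) + v[1]), # another local piece
]

def feature(x):
    """The global feature is just the sum of the local terms.
    Nesting would mean some terms are themselves such sums."""
    return sum(t(x) for t in terms)

x = np.array([1.0, 2.0, 3.0, 0.0, 5.0])
print(feature(x))  # 1*2 + (2-3)**2 + (sin(0) + 5) = 8.0
```

The point of the sketch is that the representation is the list of local terms itself; a trained neural net might be used to *find* an equivalent decomposition, but (per the comment above) the net would not be part of the representation.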