They would not necessarily be represented with neural nets, unless you’re using “neural nets” to refer to circuits in general.
I think by “neural nets” I mean “circuits that get optimized through GD-like optimization techniques and where the vast majority of degrees of freedom for the optimization process come from big matrix multiplications”.
Yeah, I definitely don’t expect that to be the typical representation; I expect neither an optimization-based training process nor lots of big matrix multiplications.
Interesting and exciting. Can you say more about how you expect it to work?
Based on the forms in Maxent and Abstraction, I expect (possibly nested) sums of local functions to be the main feature-representation. Figuring out which local functions to sum might be done, in practice, by backing equivalent sums out of a trained neural net, but the net wouldn’t be part of the representation.
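(For concreteness, here is a rough sketch of the kind of form this points at; the notation is my own gloss on the maxent framing, not something spelled out in the thread. A maxent distribution over variables $X_1, \dots, X_n$ with local functions $f_i$, each depending only on a small neighborhood $S_i$ of the variables, looks like

$$P(X) \;\propto\; \exp\!\Big(\sum_i \lambda_i \, f_i(X_{S_i})\Big)$$

On this reading, a feature would be a sum $\sum_i \lambda_i f_i(X_{S_i})$ rather than a forward pass through a net, and "nested" would mean the $f_i$ are themselves such sums over still-smaller neighborhoods.)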