I think the reason we usually don’t generalize very far is not that we lack general models, but that it’s very hard to state any useful properties about very general models.
You can trivially view any model/agent as a Turing machine, without loss of generality.[1] We just usually don’t do that, because it’s very hard to state anything useful about such a general model of computation. (It seems very hard to prove or disprove P = NP; we know for a fact that halting is undecidable; etc.)
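To make the “hard to state anything useful” point concrete, here is a minimal sketch of the classic halting-problem diagonalization in Python. The function `halts` is a hypothetical oracle I’m introducing for illustration; the whole point of the argument is that no such total, correct function can exist.

```python
def halts(program, arg) -> bool:
    """Hypothetical oracle: returns True iff program(arg) halts.
    Assumed to exist for the sake of contradiction."""
    raise NotImplementedError  # no total, correct implementation can exist


def diagonal(program):
    """Do the opposite of whatever the oracle predicts about program(program)."""
    if halts(program, program):
        while True:  # oracle says it halts, so loop forever
            pass
    return  # oracle says it loops, so halt immediately


# Feeding diagonal to itself yields a contradiction either way:
# - if halts(diagonal, diagonal) is True, then diagonal(diagonal) loops forever;
# - if it is False, then diagonal(diagonal) halts.
# Either way the oracle is wrong, so `halts` cannot exist.
```

So once your model of computation is fully general, even the most basic behavioral question (“does it terminate?”) is off the table.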
I am very interested, though, in what model John will use to state useful theorems that capture both the current DL paradigm and, with high probability, the next paradigm as well. (He might have written about this somewhere already; I haven’t read all his stuff yet.)
[1] Assuming determinism, but OP’s black-box interpretability stuff already seems to assume that.
I think he addressed it in Don’t Get Distracted By The Boilerplate.