Unless the human could memorize the initialization parameters, they would be using a different neural network to classify.
Why wouldn’t the “human-trained network” be identical to the “original network”? [EDIT: sorry, missed your point. If the human knows the logic of the random number generator that was used to initialize the parameters of the original network, they can manually run the same logic themselves.]
By the same logic, any decision tree that is too large for a human to memorize does not allow theory simulatability as defined in the OP.
If the human knows the logic of the random number generator that was used to initialize the parameters of the original network, they would have no problem manually running the same logic themselves.
Presumably the random seed is going to be big and complicated.
It also wouldn’t just be the random seed you’d need to memorize. You’d have to memorize all the training data and labels too. The scenario I gave was intended to be one where you were asked to study an algorithm for a while, and then afterwards operate it on a blank sheet of paper, given testing data. Even if you had the initialization seed, you wouldn’t just be able to spin up neural networks to make predictions...
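To make that concrete, here is a minimal sklearn sketch (the dataset and hyperparameters are placeholders, not anything from this discussion): with the same seed, the same data, and the same training procedure you get the original network back, but change any one of those ingredients and you get a different network.

```python
# Minimal sketch (placeholder dataset/hyperparameters, not from this discussion).
from numpy import allclose
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

def train(seed, X, y):
    # Same seed + same data + same training procedure => the same network.
    return MLPClassifier(hidden_layer_sizes=(32,), random_state=seed,
                         max_iter=300).fit(X, y)

net_a, net_b = train(0, X, y), train(0, X, y)
net_c = train(1, X, y)  # different seed (changing the training data has the same effect)

same_ab = all(allclose(wa, wb) for wa, wb in zip(net_a.coefs_, net_b.coefs_))
same_ac = all(allclose(wa, wc) for wa, wc in zip(net_a.coefs_, net_c.coefs_))
print(same_ab, same_ac)  # expect: True False
```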
(just noting that the same goes for a decision tree that isn’t small enough for the human to memorize)
Yes, exactly. That’s why the authors of tree regularization put large penalties on large trees.
I have little to say except that I didn’t want people to take this definition too seriously and now it looks like I’ve done a bad job communicating what I meant. In my head I have a simple notion corresponding to “could I operate this algorithm on a piece of paper if I was asked to study it for a while beforehand.” This is different in my mind from asking whether it is possible to do so on a Turing machine.
But like, I could not operate a large decision tree on a piece of paper even if I had studied it for a while beforehand, because I wouldn’t remember all of the yes/no questions and their structure.
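To put a rough number on “large”: here is a quick sketch counting the yes/no questions in an unconstrained decision tree (sklearn’s small digits dataset is used purely as an example, not anything from this thread).

```python
# Quick sketch: count the yes/no questions in an unconstrained decision tree.
# Dataset is sklearn's small digits set, used purely as an example.
from sklearn.datasets import load_digits
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

print("nodes:", tree.tree_.node_count)   # typically a few hundred
print("depth:", tree.get_depth())        # typically around 15
```

Even on a small task, that is far more branching structure than a person could carry around in their head.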
I could certainly build a decision tree given data, but I could also build a neural network given data.
(Well, actually I think large decision trees and neural nets are both uninterpretable, so I mostly do agree with your definition. I object to having this definition of interpretability and believing that decision trees are interpretable.)
Yes, I wrote in the main post that the authors of tree regularization penalized large trees. Certainly small decision trees are interpretable, and large trees much less so.
Oh, sure, but if you train on a complex task like image classification, you’re only going to get large decision trees (assuming you get decent accuracy), even with regularization.
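Here is a rough sketch of that trade-off, with sklearn’s small digits dataset standing in for image classification (an assumption; a real image task would widen the gap considerably): a tree shallow enough to memorize loses a lot of accuracy, and recovering accuracy requires a much larger tree.

```python
# Sketch of the size/accuracy trade-off, with sklearn's small digits dataset
# standing in for image classification (a real image task widens the gap).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (3, 6, None):  # None = unconstrained
    t = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"max_depth={depth}: {t.tree_.node_count} nodes, "
          f"test accuracy {t.score(X_te, y_te):.2f}")
# A tree you could plausibly memorize (depth 3, at most 15 nodes) loses a lot
# of accuracy; closing the gap requires a much larger tree.
```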
(Also, why not just use the decision tree if it’s interpretable? Why bother with a neural net at all?)
The hope is that we can design a regularization scheme that gives us the performance of neural networks while also yielding an interpretation a human can understand. Tree regularization does not achieve this, but it is one approach that uses regularization for interpretability.
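As a heavily simplified sketch of what such a scheme is aiming at (this is not the authors’ algorithm, just post-hoc distillation into a small mimic tree; tree regularization proper penalizes the size of such a tree while the network is being trained): train a network, fit a small tree to its predictions, and check how faithfully the tree reproduces the network. Dataset and settings are placeholders.

```python
# Simplified sketch, NOT the tree regularization algorithm itself: train a
# network, then fit a small decision tree to mimic its predictions and check
# the tree's fidelity. Tree regularization instead penalizes the size of such
# a tree during the network's training. Dataset/settings are placeholders.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(64,), random_state=0,
                    max_iter=500).fit(X_tr, y_tr)

# Fit a small tree to the network's own labels: the tree is the "interpretation".
mimic = DecisionTreeClassifier(max_depth=4, random_state=0)
mimic.fit(X_tr, net.predict(X_tr))

print("network test accuracy:", round(net.score(X_te, y_te), 2))
print("tree fidelity to net:  ", round((mimic.predict(X_te) == net.predict(X_te)).mean(), 2))
```

If the small mimic tree can’t faithfully reproduce the network’s behavior, then the tree isn’t a trustworthy interpretation, which is exactly the gap the regularization approach is trying to close.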