Thanks for the post! This is fantastic stuff, and IMO should be required MI reading.
Does anyone who knows more about this than me think SQ dimension might be a good formal metric for grounding the concept of explicit vs tacit representations? It seems to me the only reason you can’t reduce a system further by compressing it into a ‘feature’ is that by default it relies on aggregation, requiring a ‘bird’s eye view’ of all the information in the network.
I mention this because I was revisiting some old readings today on the inductive biases of NNs, and realised that one reason low-complexity functions can be arbitrarily hard for NNs to learn could be that they have high SQ dimension (best example: binary parity).
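For concreteness, here’s a tiny sketch of why parity is the standard example (the dataset and `n` here are arbitrary choices of mine, not from the post). The function itself has a one-line description, yet the class of parity functions over n bits is pairwise uncorrelated under the uniform distribution, giving it SQ dimension 2^n:

```python
def parity(bits):
    """XOR of all bits -- describable in one line (low complexity)."""
    result = 0
    for b in bits:
        result ^= b
    return result

# Despite its trivial description, the family of parities over n bits
# has SQ dimension 2^n, so statistical-query learners (which roughly
# includes gradient-based training) need exponentially many queries
# to pin down which parity they're facing.
examples = [(0, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
print([parity(x) for x in examples])  # -> [0, 0, 0, 1]
```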
I’m not that confident about the statistical query dimension (I assume that’s what you mean by SQ dimension?), but I don’t think it’s applicable: SQ dimension measures the difficulty of a task (e.g. binary parity), whereas explicit vs tacit representations are properties of an implementation, so it’s kind of apples to oranges.
To take the chess example again: one way to rank moves is to explicitly compute some rule or heuristic from the board state, another is to do some kind of parallel search, and yet another is to use a neural network or something similar. The first is explicit, the second is (maybe?) more tacit, and the last is unclear. I think stronger versions of the LRH kind of assume that the neural network must be ‘secretly’ explicit, but I’m not sure this is necessary.
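To make the first case concrete, here’s a toy sketch of what an ‘explicit’ move ranker looks like (the move encoding and scores are entirely made up for illustration). The point is that the ranking criterion is a single inspectable rule, whereas a tacit ranker would order the same moves via something like a trained network’s output with no such rule to point at:

```python
# Toy 'explicit' move ranker: the ordering is produced by a
# human-readable rule (prefer captures, then checks).
# Move encoding and scores are invented for this example.

def explicit_rank(moves):
    def score(m):
        return 2 * m["is_capture"] + m["is_check"]
    return sorted(moves, key=score, reverse=True)

moves = [
    {"name": "Nf3",  "is_capture": 0, "is_check": 0},
    {"name": "Qxd5", "is_capture": 1, "is_check": 0},
    {"name": "Bb5+", "is_capture": 0, "is_check": 1},
]
print([m["name"] for m in explicit_rank(moves)])  # -> ['Qxd5', 'Bb5+', 'Nf3']
```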
But I don’t think any of this is really affected by the SQ dimension, because it’s the same task in all three cases (and we could probably come up with implementations with identical performance?).
But maybe I’m not quite understanding what you mean.