It definitely will vary with the environment, though the question is one of degree. I suspect most of the variation will be in how much optimization power you need, as opposed to how difficult it is to get some degree of optimization power, which motivates the model presented here—though certainly there will be some deviation in both. The footnote should probably be rephrased so as not to assert that the two are completely independent, which I agree they obviously aren't, but only that they need to be relatively independent, with the amount of optimization power dominating, for the model to make sense.
Renamed x to x∗—good catch (though editing doesn’t appear to be working for me right now—it should show up in a bit)!
Algorithmic range is very similar to model capacity, except that it's meant slightly more broadly: we're more interested in the different sorts of general procedures your model can learn to implement than in how many layers of convolutions you can do. That being said, they're basically the same thing.
I actually just updated the paper to use model capacity instead of algorithmic range, to avoid needlessly confusing machine learning researchers, though I'm keeping algorithmic range here.
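To make the distinction concrete, here's a toy sketch (my own illustration in PyTorch, not anything from the paper): two networks with roughly the same parameter count, where the second reuses one shared block for a variable number of steps and so can express more iterative, algorithm-like procedures, even though its raw capacity is comparable.

```python
# Hypothetical illustration: "model capacity" (parameter count, depth/width)
# vs. "algorithmic range" (what kinds of general procedures are learnable).
# Class names and dimensions here are made up for the example.
import torch
import torch.nn as nn

class FixedDepthNet(nn.Module):
    """A plain feedforward stack: the computation graph is fixed ahead
    of time, so capacity grows with depth, but the procedure it runs
    is always the same number of steps."""
    def __init__(self, dim=64, depth=4):
        super().__init__()
        self.layers = nn.Sequential(
            *[nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(depth)]
        )

    def forward(self, x):
        return self.layers(x)

class WeightSharedNet(nn.Module):
    """One shared block applied repeatedly: far fewer distinct parameters,
    but the number of iterations can vary at inference time, so it can
    in principle learn simple iterative procedures."""
    def __init__(self, dim=64):
        super().__init__()
        self.step = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())

    def forward(self, x, n_steps=4):
        for _ in range(n_steps):
            x = self.step(x)
        return x

x = torch.randn(8, 64)
print(FixedDepthNet()(x).shape)           # fixed 4-step computation
print(WeightSharedNet()(x, n_steps=10).shape)  # same block, 10 steps
```

The point is just that two models can match on raw capacity while differing in which general procedures they can plausibly learn to implement.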
I suspect most of the variation will be in how much optimization power you need, as opposed to how difficult it is to get some degree of optimization power, which motivates the model presented here—though certainly there will be some deviation in both.
Fwiw, I have the opposite intuition quite strongly, but I'm not sure it's worth debating that here.