I think there’s a continuum of speed vs generality. You can add more complexity to improve speed, but it comes at the cost of generality (assuming all algorithms are implemented in an equally efficient manner).
Given a finite compute budget, it’s definitely possible to choose an algorithm that’s more general than your budget can afford. In that case, you hit under fitting issues. Then, you can find a more complex algorithm that’s more specialized to the specific distribution you’re trying to model. Such an algorithm is faster to run on your training distribution but is less likely to generalize to other distributions.
The overall point of the presentation is that such a tendency isn’t just some empirical pattern that we can only count on for algorithms developed by humans. It derives from an information-theoretic perspective on the relation between algorithmic complexity, runtime and input space complexity.
Edit: to clarify an implicit assumption in my presentation, I’m assuming all algorithms are run for as long as needed to correctly decide all inputs. The analysis focuses on how the runtime changes as the simplicity increases and the input space widens.
I think there’s a continuum of speed vs generality. You can add more complexity to improve speed, but it comes at the cost of generality (assuming all algorithms are implemented in an equally efficient manner).
Given a finite compute budget, it’s definitely possible to choose an algorithm that’s more general than your budget can afford. In that case, you hit under fitting issues. Then, you can find a more complex algorithm that’s more specialized to the specific distribution you’re trying to model. Such an algorithm is faster to run on your training distribution but is less likely to generalize to other distributions.
The overall point of the presentation is that such a tendency isn’t just some empirical pattern that we can only count on for algorithms developed by humans. It derives from an information-theoretic perspective on the relation between algorithmic complexity, runtime and input space complexity.
Edit: to clarify an implicit assumption in my presentation, I’m assuming all algorithms are run for as long as needed to correctly decide all inputs. The analysis focuses on how the runtime changes as the simplicity increases and the input space widens.