I think an example of what jsteinhardt is referring to would be quicksort.
Quicksort as a piece of software does not require any particular distribution. It is perfectly happy to work with “perversely ordered inputs”.
A human running quicksort with certain expectations about its performance might require a particular distribution, but that’s not a characteristic of software.
Recall that the original claim was that expecting a particular distribution breaks modularity.
If a particular function was advertised as having O(n log(n)) running time, and it turns out that it actually runs in O(n^2) on a certain class of inputs, I would not feel it to be behaving in a very modular way. From an abstraction perspective, breaking time or resource assumptions on an input can be as bad as, or worse than, rejecting it.
Granted, it’s pretty easy to get quicksort to behave on all but the most pathological data sets.
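To make the quicksort point concrete, here is a minimal sketch (my own illustration, not code from the thread): a naive first-element-pivot quicksort happily accepts already-sorted input, but the number of comparisons it does on that input is roughly n^2/2 rather than the n log n you'd see on shuffled data.

```python
# Minimal sketch: count comparisons made by a naive first-element-pivot quicksort.
# On shuffled input the count is on the order of n log n (a few tens of thousands
# for n = 2000); on already-sorted input it is roughly n^2 / 2 (about two million).
import random

def quicksort_comparisons(data):
    """Sort a copy of `data` with first-element-pivot quicksort; return comparison count."""
    a = list(data)
    comparisons = 0
    # Explicit stack of (lo, hi) ranges, so sorted input doesn't hit the recursion limit.
    stack = [(0, len(a) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = a[lo]
        i = lo + 1
        for j in range(lo + 1, hi + 1):
            comparisons += 1
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[lo], a[i - 1] = a[i - 1], a[lo]   # place pivot in its final position
        stack.append((lo, i - 2))           # left partition
        stack.append((i, hi))               # right partition
    return comparisons

n = 2000
print("shuffled:      ", quicksort_comparisons(random.sample(range(n), n)))
print("already sorted:", quicksort_comparisons(list(range(n))))
```

The code never rejects the sorted input; it just silently costs two orders of magnitude more, which is exactly the distinction being argued about here.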
If a particular function was advertised as having O(n log(n)) running time, and it turns out that it actually runs in O(n^2) on a certain class of inputs, I would not feel it to be behaving in a very modular way.
If you have been misled with respect to the behavior of an algorithm on a specific class of inputs, what does that have to do with modularity?
Modularity refers to the independence of a piece of software from the particulars of other pieces of software. It has nothing to do with performance metrics.
When picking which algorithms to use for particular pieces of your task, performance metrics are a significant factor. If you are told that your algorithm for Section A of the task runs in O(n log n) time and you know that the Section B portion of the program, which requires a stream of inputs from Section A, will run in O(n (log n)^2), you can assume that Section B will be constantly supplied with inputs and the program can busy-wait for the brief portions where it does not have input. If it turns out that for your specific input distribution Section A is taking O(n^2), that’s a critical distinction with far-reaching effects on the design of the rest of your algorithm.
It’s somewhat harder to provide an example for non-networked, non-parallelized code, but I can do so on request.
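For what it's worth, here is a minimal sketch of the kind of coupling described above (my own illustration with made-up stage_a / stage_b functions, not the example the commenter has in mind): stage B busy-waits on stage A's output on the assumption that A keeps pace, and a silent slowdown in A turns the "brief" busy-wait into a long spin.

```python
# Minimal sketch: stage B busy-waits on stage A's output, assuming A keeps up.
# `slow=True` stands in for the input distribution on which A degrades to O(n^2).
import queue
import threading
import time

def stage_a(items, out_q, slow=False):
    """Produce results for stage B; the sleep is a stand-in for per-item work."""
    for x in items:
        time.sleep(0.01 if slow else 0.0001)
        out_q.put(x)
    out_q.put(None)  # sentinel: no more output

def stage_b(in_q):
    """Consume stage A's output, busy-waiting on the assumption input is almost always ready."""
    spins = 0
    while True:
        try:
            item = in_q.get_nowait()
        except queue.Empty:
            spins += 1  # the "brief" busy-wait the design budgeted for
            continue
        if item is None:
            break
        # ... stage B's real per-item work would go here ...
    print("stage B spun", spins, "times waiting for input")

for slow in (False, True):
    q = queue.Queue()
    producer = threading.Thread(target=stage_a, args=(range(200), q, slow))
    producer.start()
    stage_b(q)
    producer.join()
```

Nothing about stage A's interface changed between the two runs; only its cost on a particular input distribution did, yet the rest of the design (the busy-wait budget) was built on that cost.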
I understand all that, but I’m reading it as “you should not make mistakes about the expected performance of your code and your application architecture should reflect your use cases.”
There are many ways to screw up a software system.
A human running quicksort with certain expectations about its performance might require a particular distribution, but that’s not a characteristic of software.
I think this may be a distinction without a difference; modularity can also be defined in terms of human expectations about software, namely that the software will be relatively easy to hook into a larger system.
I don’t find this a useful definition, but YMMV, de gustibus, and all that...