In my experience, prerequisites in math/science often can (and should) be ignored, and learned as you actually need them. People who thoroughly follow all the prerequisites often end up bogged down in numerous fields that have only a weak connection to what they initially wanted to learn, and then get demotivated and drop out of their endeavor. This is a common failure mode.
Like, you need probability theory to do machine learning, but you are unlikely to encounter some parts of it, and there are also parts of ML which require very little of it. It totally makes sense to start with those.
I’m thinking more specifically than you are. Rather than learning probability theory to understand ML, learn only what you determine to be necessary for what ML applications you are interested in. The concept maps I use are very specific, and they avoid the weak connection problem you mention. (It’s worth noting that I develop these as an autodidact, so I don’t have to take an entire class to just get a few facts I’m interested in.)
It sounds like you and estimator are actually on the same page: estimator seems to be talking about “prerequisite” in the sense of “systematic prerequisite”, as in, people say that you should learn X before you learn Y. You seem to be talking about “prerequisite” in the sense that “skill X is a necessary component of skill Y”.
Both of you, however, seem to agree that you should ignore the stuff that is irrelevant to what you are actually trying to accomplish.
This is a good way to put it. I may not have been clear.
To use an example, I have a concept map about fluid dynamics that I used in a class I took on turbulence recently. There were a few concepts that I did not understand well at some point, and I wanted to figure out which ones. To be more specific, isotropic tensors are often used in turbulence theory and modeling, but I didn’t really understand how to construct isotropic tensors algebraically. It became pretty clear this was something I should learn given the number of links isotropic tensors had to other concepts.
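A minimal sketch of how counting links in a concept map could flag what to study first. The concept names other than isotropic tensors, the link list, and the function names are all hypothetical, made up for illustration; the actual map is not shown in this thread.

```python
from collections import defaultdict

# Hypothetical concept map: each pair is an undirected link between two concepts.
# Only "isotropic tensors" comes from the discussion; the rest are illustrative.
links = [
    ("isotropic tensors", "Reynolds stress modeling"),
    ("isotropic tensors", "two-point correlations"),
    ("isotropic tensors", "pressure-strain closure"),
    ("isotropic tensors", "index notation"),
    ("two-point correlations", "energy spectrum"),
    ("energy spectrum", "Kolmogorov scaling"),
]

# Concepts the learner has flagged as shaky (also hypothetical).
poorly_understood = {"isotropic tensors", "Kolmogorov scaling"}


def link_counts(edges):
    """Count how many links touch each concept (its degree in the graph)."""
    counts = defaultdict(int)
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return dict(counts)


def study_priorities(edges, shaky):
    """Rank shaky concepts by link count, most connected first."""
    counts = link_counts(edges)
    return sorted(shaky, key=lambda c: (-counts.get(c, 0), c))


priorities = study_priorities(links, poorly_understood)
print(priorities)
```

With this toy data, the heavily linked concept comes out ahead of the weakly linked one, which is the kind of signal described above.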
On the other hand, if you don’t have a solid grasp of linear algebra, your ability to do most types of machine learning is seriously impaired. You can learn techniques like e.g. matrix inversions as needed to implement the algorithms you’re learning, but if you don’t understand how those techniques work in their original context, they become very hard to debug or optimize. Similarly for e.g. cryptography and basic information theory.
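One illustration of why a black-box inversion can be hard to debug, as a sketch only: the matrix and right-hand sides below are made-up numbers, and `solve_2x2` is a hypothetical helper, not a library routine. The matrix is nearly singular, so a tiny change in the input moves the answer a long way, which looks like a bug unless you know about conditioning.

```python
def solve_2x2(a, b):
    """Solve the 2x2 system a @ x = b via the explicit inverse formula."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("matrix is singular")
    x1 = (a22 * b[0] - a12 * b[1]) / det
    x2 = (a11 * b[1] - a21 * b[0]) / det
    return x1, x2


# Hypothetical, nearly singular matrix: the two rows are almost identical.
a = [[1.0, 1.0], [1.0, 1.0001]]

x = solve_2x2(a, [2.0, 2.0001])            # approximately (1, 1)
x_perturbed = solve_2x2(a, [2.0, 2.0002])  # approximately (0, 2)

print(x)
print(x_perturbed)
```

The second right-hand side differs from the first by 0.0001 in one entry, yet the solution moves from roughly (1, 1) to roughly (0, 2). The arithmetic is correct in both cases; the sensitivity comes from the tiny determinant.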
That’s probably more the exception than the rule, though; I sense that the point of most prerequisites in a traditional science curriculum is less to provide skills to build on and more to build habits of rigorous thinking.
Read up on what a matrix is, how to add, multiply and invert matrices, what a determinant is and what an eigenvector is, and that’s enough to get you started. There are many algorithms in ML where vectors/matrices are used mostly as a handy notation.
Yes, you will be unable to understand some parts of ML which substantially require linear algebra; yes, understanding ML without linear algebra is harder; yes, you need linear algebra for almost any kind of serious ML research. But that doesn’t mean you have to spend a few years studying arcane math before you can open an ML textbook.
Who said anything about a few years? If you paid attention in high school, the linear algebra background you need is at most a few months’ worth of work. I was providing a single counterexample, not saying that the full prerequisite list (which, if memory serves, is most of a CS curriculum for your average ML class) is always necessary.
if you don’t have a solid grasp of linear algebra, your ability to do most types of machine learning is seriously impaired
That depends on whether you’re doing research or purely applied stuff. For applied use, domain expertise trumps knowing the internal details of the algorithms, which you usually just call as pre-built functions, as long as you understand what they do and where the limits (and the traps) are.
Not many people can invert matrices by hand any more and that’s not a problem for a higher-level understanding of linear algebra. Similarly, you don’t necessarily need to understand, say, how singular value decomposition works in order to do successful higher-level modeling of some domain.
I wasn’t pointing strictly to research, but I was pointing to low-level implementation. It now occurs to me that I might be unusual in this respect—much of my ML experience is in the context of a rather weird environment that didn’t have any existing libraries, leaving me to cut a lot of code myself.
So I might have to back off from “ability to do machine learning”. You can, in retrospect, use ML perfectly competently in a lot of settings even if the closest you’ve ever gotten to a simulated annealing algorithm is plugging the cost function into a Python library; but I have a hard time calling someone an expert if they’ve never written anything lower-level, just as I’d expect a competent software engineer to be able to write a hash table by hand even if every environment they’re likely to encounter will have built-in implementations or at least efficient libraries for it.
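For reference, a hash table written by hand can be quite small. The following is a toy sketch using separate chaining; the class name, the initial capacity of 8, and the 0.75 load-factor threshold are illustrative choices of mine, not anything from the discussion, and it is no substitute for a built-in `dict`.

```python
class ChainedHashTable:
    """Toy hash table using separate chaining; not a replacement for dict."""

    def __init__(self, capacity=8):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0

    def _bucket(self, key):
        # Map the key's hash onto one of the bucket lists.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        self._size += 1
        # Grow when the load factor exceeds 0.75 so chains stay short.
        if self._size > 0.75 * len(self._buckets):
            self._resize(2 * len(self._buckets))

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def _resize(self, new_capacity):
        # Rehash every stored pair into a larger bucket array.
        old_items = [item for bucket in self._buckets for item in bucket]
        self._buckets = [[] for _ in range(new_capacity)]
        for k, v in old_items:
            self._buckets[hash(k) % new_capacity].append((k, v))

    def __len__(self):
        return self._size


table = ChainedHashTable()
for i in range(20):
    table.put(f"key{i}", i)
table.put("key3", 300)  # overwrites the earlier value for "key3"
print(table.get("key3"), table.get("missing"), len(table))
```

The pieces that tend to matter in practice (collision handling, resizing, overwrite semantics) are all visible here, which is arguably the point of being able to write one.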
just as I’d expect a competent software engineer to be able to write a hash table by hand even if every environment they’re likely to encounter will have built-in implementations or at least efficient libraries for it.
I have a feeling that’s a bit of a relic.
A long time ago programming environments were primitive and Real Men wrote their own sorts and hash tables (there is a canonical story from even more Ancient Times). But times have changed. I can’t imagine a serious situation (as opposed to e.g. a programming contest) where I would have to write my own sort routine from scratch, just as I can’t imagine needing to construct a washing machine out of a tub, an electric motor, pulleys, and belts.
I certainly still care about performance properties of various sorts, but I don’t care about their internal details as long as the performance properties hold. I suspect that the interview questions of the “implement a bubble sort on this piece of paper” variety if anything aim more at “have you been paying attention during your CS classes” and less at “do you have a deep understanding of what your program is doing”.
The capacity of human minds is limited and I’ll accept climbing up higher in abstraction levels at the price of forgetting how the lower-level gears turn.
I can’t imagine a serious situation (as opposed to e.g. a programming contest) where I would have to write my own sort routine from scratch
You can’t? I’ve had to do that several times. The usual scenario is that there are search/sort routines, but they have inconvenient properties—either they don’t perform well in the specific problem domain I’m dealing with (happens a lot in simulation; functions for efficiently doing certain types of categorization on spatially arranged data are rare outside graphics libraries), or they don’t work on the data types I need and a reduction is impractical for one reason or another, or they exist but can’t be used for legal reasons. Unless you always situate yourself in the most popular subfields, which I frankly find boring, you can’t always count on there being a library that does exactly what you want—all the more so in a still-emerging space like ML.
(I’ve never had to build a washing machine, incidentally, but I’ve had to fix washing machines—twice this year for two different machines, in fact. I could have hired a mechanic or bought a new machine, but either one would have cost me hundreds of dollars.)
Well, I was talking about standard sort routines—the ones where you have a vector of values and a comparator function. Now, search is quite a different beast altogether.
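The “vector of values and a comparator function” case looks roughly like this in Python, where the library does all the sorting work. The comparator `by_length_then_alpha` and the sample values are hypothetical; `sorted` and `functools.cmp_to_key` are standard library.

```python
from functools import cmp_to_key


def by_length_then_alpha(a, b):
    """Comparator: negative if a sorts first, positive if b does, zero if tied."""
    if len(a) != len(b):
        return len(a) - len(b)
    return (a > b) - (a < b)


values = ["pear", "fig", "banana", "kiwi", "apple"]
ordered = sorted(values, key=cmp_to_key(by_length_then_alpha))
print(ordered)  # ['fig', 'kiwi', 'pear', 'apple', 'banana']
```

Nothing about the sort's internals is visible or needed here; only the ordering rule is supplied.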
The thing is, most sorting is brute-force where you just sort without taking into account the specific structure of your data. That approach works well with sorting, but it doesn’t work well with search. The obvious problem is that we are interested in searching very large search spaces where brute force is nowhere close to practical. The salvation comes from the particular structure of the space, which allows us to be much more efficient than brute force, but the same structure forces us into custom solutions.
Because the structures of search spaces can be very different, there are a LOT of search algorithms, and frequently enough you have to make bespoke versions of them to fit your particular problem. That’s entirely normal. Plus, of course, optimization is a subtype of search and customizing optimizers is also quite common.
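A toy instance of structure beating brute force, as a sketch: the only structure assumed here is that the data is sorted, which is far simpler than the search spaces discussed above. Both functions and the sample data are illustrative.

```python
def linear_search(items, target):
    """Brute force: inspect items one by one. Returns (index, steps)."""
    steps = 0
    for i, item in enumerate(items):
        steps += 1
        if item == target:
            return i, steps
    return -1, steps


def binary_search(sorted_items, target):
    """Exploit sorted order to halve the candidate range at each step."""
    lo, hi = 0, len(sorted_items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps


data = list(range(0, 2_000_000, 2))  # one million sorted even numbers
target = 1_999_998                   # the last element

linear_result = linear_search(data, target)
binary_result = binary_search(data, target)
print(linear_result)
print(binary_result)
```

The linear scan needs a million steps to reach the last element, while the binary search needs at most about twenty. The same trade shows up in the comment above: the speedup only exists because the code is written around a property of the data, so a different structure calls for a different routine.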
but I’ve had to fix washing machines
Sure, so have I. In fact, I probably would be able to construct a washing machine out of a tub, an electric motor, and some parts. It will take a lot of time and will look ugly, but I think it’ll work. That doesn’t mean I’ll feel a need to do this :-)
Yes, this this this this this this this. “The capacity of human minds is limited and I’ll accept climbing up higher in abstraction levels at the price of forgetting how the lower-level gears turn.” If I could upvote this multiple times, I would.
This is the crux of this entire approach. Learn the higher-level, applied abstractions. And learn the very basic fundamentals. Forget learning how the lower-level gears turn: just learn the fundamental laws of physics. If you ever need to figure out a lower-level gear, you can just derive it from your knowledge of the fundamentals, combined with your big-picture knowledge of how that gear fits into the overall system.
That only works if there are few levels of abstraction; I doubt that you can derive how programs work at the machine-code level based on your knowledge of physics and high-level programming. Sometimes gears are so small that you can’t even see them in your top-level big picture, and sometimes just climbing up one level of abstraction takes enormous effort if you don’t know in advance how to do it.
I think that you should understand, at least once, how the system works on each level and refresh/deepen that knowledge when you need it.
The definition of “fundamentals” differs though, depending on how abstract you get. The more layers of abstraction, the more abstract the fundamentals. If my goal is high-level programming, I don’t need to know how to write code on bare metal.
That’s why I advocate breaking things down until you reach the level of triviality for you personally. Most people will find “writing a for-loop” to be trivial, without having to go farther down the rabbit hole. At a certain point, breaking things down too far actually makes things less trivial.
Can I give a counterexample? I think that way of learning things might help if you only need to apply the higher-level skills as you learned them, but if you need to develop or research those fields yourself, I’ve found you really do need the background.
As in, I have been bitten on the ass by my own choice not to double-major in mathematics in undergrad, thus resulting in my having to start climbing the towers of continuous probability and statistics/ML, abstract algebra, logic, real analysis, category theory, and topology in and after my MSc.
There’s a big difference between the fundamentals, and the low-level practical applications. I think the latter is what estimator is referring to. You can’t really make a breakthrough or do real research without a firm grasp of the fundamentals. But you definitely can make a breakthrough in, say, physics, without knowing the exact tensile strength of wood vs. steel. And yet, that type of “Applied Physics” was a pre-requisite at my school for the more advanced fields of physics that I was actually interested in.
Oh. Really? Dang.
You’re right; you have to learn a solid background for research. But still, it often makes sense to learn in the reverse order.