Articles on such topics are notorious for their average bad quality. Reformulating in Bayesian terms, the prior probability of your statements being true is low, so you should provide some proof or evidence; otherwise, why should I (or anyone) believe you? Have you actually checked whether it works? Have you actually checked whether it works for somebody else?
I don’t think that personal achievements are a bulletproof argument for such advice. Still, when I read something like this, I’m pretty sure that it contains valuable information, although it is probably a mistake to follow such advice verbatim anyway. So, if you have Hamming-level credentials, it will help.
As for your article, probably the only way to fix it is to add evidence for your statements. What evidence supports them? Is there any psychological research to back up your claims? Why do you think it is an optimal (or near-optimal) way to learn skills?
This is a good self-help article. Can you see the reference list? :)
Articles on such topics are notorious for their average bad quality.
That’s interesting, I wasn’t aware of that reputation. That’s good to know and certainly justifies your skepticism.
All that said, I think one can still evaluate your point (and in my case, my Less Wrong post) based on its internal logic and how consistent it is with one’s own observations, without needing research to back it up. It would be easy enough to dismiss your own post for the very reasons you cited. Consider the following:
“In general, people new to a community are notoriously bad at gauging the pulse of said community. To reformulate in Bayesian terms, based on the length of time you’ve been posting here, the prior probability of your statement being true is low, so shouldn’t you provide some proofs or evidence—or why should I (or anyone) believe you?”
But to me, your logic checks out, and is fairly consistent with my own observations (that most self-help publications tend to be garbage), so that shifts the probabilities significantly in your favor. I’m hoping that people will evaluate my own post by similar criteria, rather than immediately dismissing it.
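To make the "shifts the probabilities" step concrete, here is a minimal sketch of a single Bayesian update. The numbers are made up purely for illustration (nobody in this thread measured them), and the function name `posterior` is hypothetical.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule for a binary hypothesis: P(H | E)."""
    numerator = prior * p_evidence_if_true
    return numerator / (numerator + (1 - prior) * p_evidence_if_false)

# Illustrative assumption: a low prior of 0.1 that a self-help claim is sound,
# and evidence ("the logic checks out and matches my observations") that is
# four times as likely if the claim is sound as if it is not.
updated = posterior(prior=0.1, p_evidence_if_true=0.8, p_evidence_if_false=0.2)
print(round(updated, 3))
```

With these assumed numbers the probability rises well above the prior but stays below one half, which is roughly the situation being argued about: internal consistency helps, but does not settle the question on its own.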
I’ve started commenting here recently, but I’m a long time lurker (>1 year). Also, I was speaking about self-help articles in general, not conditional on whether they are posted on LW—it makes sense, because pretty much anyone can post on LW.
Now I found a somewhat less extreme example of what I think is an OK post on self-help although it doesn’t have scientific references, because a) the author told us what actual results he achieved and, more importantly, b) the author explained why he thinks that the advice works in the first place.
Personally, I don’t find your post consistent with my observations, but it’s not my main objection—my main objection is that throwing an instruction without any justification is a bad practice, especially on such a controversial topic, especially in a rationalist community.
That’s totally fine, like I said, your post made sense and was consistent with what I’ve seen.
I still don’t really think that stating my qualifications would do much. In this context, it still just seems too much like bragging. “I helped build a multi-million dollar company, I compete in barbecue competitions and consistently place in the top 10% of the field and was sponsored by a major barbecue website, was ranked in the top 100 players in the world for a popular collectible card game, learned how to code with no formal education (and used that knowledge to write a somewhat well-received calibration test, and also to write a bunch of boring business platforms), wrote an article about a baseball statistic I co-developed that was published in a publication that’s important for people who care about baseball stats, learned how to be a carpenter, at one point was a licensed pharmacy technician, blah blah blah”
Even though I’m sure there’s a less crass way to phrase it, to me it still sounds exceedingly arrogant. I might be overreacting though. You tell me: if I prefaced my post with that, would you be more or less inclined to take me seriously?
I do like the idea of explaining why I think the advice works in the first place. I will start writing something up about that and append it to the original post.
I do like the idea of explaining why I think the advice works in the first place.
If I may, I suggest spending some space on explaining why you think your experience generalizes, that is, why you think that your methods will work for people who are not you.
I took your advice as well as estimator’s into account and added two paragraphs at the beginning to offer 1. some research showing that many systems follow a distribution where a small portion of work accounts for a large portion of results, and 2. an explanation as to why it’s generalizable.
Also, I’d like to compare your system against common sense reasoning baseline. What do you think are the main differences between your approach and usual approaches to skill learning? What will be the difference in actions?
I’m asking because your guide contains quite a long list of recommendations/actions, while many of them are used (probably intuitively/implicitly) by almost any sensible person. Also, some of the recommendations clearly have more impact than others. So, what happens if we apply the Pareto principle to your learning system? Which 20% are the most important? What is at the core of your approach?
As I mentioned in another comment, the difference between this and the “common sense” approach is in what this system does not do.
As for which 20% of this system gives you the most bang for your buck: that’s a good question. Right now my “safe” answer is that it’s dependent on the type of skill you’re trying to learn. The trouble is that the common thread among all the skills (“Find the 20% of the skill that yields 80% of the results”) doesn’t have a lot of practical value. It’s like telling someone that all they need to do to lose weight is eat less and exercise more.
Let me think about it some more and I’ll get back to you.
So, after some cursory thought, naturally the part of the system that gives you the most bang for your buck is the first 4 steps. The last 3 steps are designed to help you improve, which is a much slower process than just learning the basics.
So, now to figure out how to recursively apply the skill of learning a skill quickly to the skill “learning skills quickly”.
One piece of information you can use to determine what is most important is the number of other skills which require a certain skill as a prerequisite. Prerequisites should obviously be learned first, and it makes sense to learn them in order of how many doors they open. This is how I prioritize at the moment if I’m not considering subjective measures of “usefulness”.
For my learning goals, I’ve started making concept maps, partly as it helps me understand a subject by understanding how concepts are related, and partly to identify what to learn next as described above. It becomes fairly obvious that I should learn X if I want to learn Y and Z and X is a prerequisite for both.
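As a rough sketch of the prioritization described above, the following counts how many other concepts list each concept as a direct prerequisite and sorts by that count. The concept map itself (X, Y, Z, W) and the function names are hypothetical, chosen only to mirror the X/Y/Z example.

```python
from collections import defaultdict

# Hypothetical concept map: each concept maps to its direct prerequisites.
prerequisites = {
    "Y": ["X"],
    "Z": ["X", "W"],
    "X": [],
    "W": [],
}

def count_dependents(prereqs):
    """For each concept, count how many other concepts directly require it."""
    counts = defaultdict(int)
    for concept, needed in prereqs.items():
        counts[concept] += 0  # ensure every concept appears, even with no dependents
        for p in needed:
            counts[p] += 1
    return dict(counts)

def learning_order(prereqs):
    """Sort concepts by how many 'doors' they open; ties broken alphabetically."""
    counts = count_dependents(prereqs)
    return sorted(counts, key=lambda c: (-counts[c], c))

print(learning_order(prerequisites))
```

Note that this only ranks by direct dependents. It does not by itself guarantee that every prerequisite comes before the concepts that need it; for a deeper map you would likely combine it with a topological sort.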
In my experience, in math/science prerequisites often can (and should) be ignored, and learned as you actually need them. People who thoroughly follow all the prerequisites often end up bogged down in numerous science fields which have actually weak connection to what they wanted to learn initially, and then get demotivated and drop out of their endeavor. This is a common failure mode.
Like, you need probability theory to do machine learning, but you are unlikely to encounter some parts of it, and also there are parts of ML which require very little of it. It totally makes sense to start with them.
I’m thinking more specifically than you are. Rather than learning probability theory to understand ML, learn only what you determine to be necessary for what ML applications you are interested in. The concept maps I use are very specific, and they avoid the weak connection problem you mention. (It’s worth noting that I develop these as an autodidact, so I don’t have to take an entire class to just get a few facts I’m interested in.)
It sounds like you and estimator are actually on the same page: estimator seems to be talking about “prerequisite” in the sense of “systematic prerequisite”, as in, people say that you should learn X before you learn Y. You seem to be talking about “prerequisite” in the sense that “skill X is a necessary component of skill Y”.
Both of you, however, seem to agree that you should ignore the stuff that is irrelevant to what you are actually trying to accomplish.
This is a good way to put it. I may not have been clear.
To use an example, I have a concept map about fluid dynamics that I used in a class I took on turbulence recently. There were a few concepts that I did not understand well at some point, and I wanted to figure out which ones. To be more specific, isotropic tensors are often used in turbulence theory and modeling, but I didn’t really understand how to construct isotropic tensors algebraically. It became pretty clear this was something I should learn given the number of links isotropic tensors had to other concepts.
On the other hand, if you don’t have a solid grasp of linear algebra, your ability to do most types of machine learning is seriously impaired. You can learn techniques like e.g. matrix inversions as needed to implement the algorithms you’re learning, but if you don’t understand how those techniques work in their original context, they become very hard to debug or optimize. Similarly for e.g. cryptography and basic information theory.
That’s probably more the exception than the rule, though; I sense that the point of most prerequisites in a traditional science curriculum is less to provide skills to build on and more to build habits of rigorous thinking.
Read up on what a matrix is, how to add, multiply and invert matrices, what a determinant is and what an eigenvector is, and that’s enough to get you started. There are many algorithms in ML where vectors/matrices are used mostly as a handy notation.
Yes, you will be unable to understand some parts of ML which substantially require linear algebra; yes, understanding ML without linear algebra is harder; yes, you need linear algebra for almost any kind of serious ML research. But it doesn’t mean that you have to spend a few years studying arcane math before you can open an ML textbook.
Who said anything about a few years? If you paid attention in high school, the linear algebra background you need is at most a few months’ worth of work. I was providing a single counterexample, not saying that the full prerequisite list (which, if memory serves, is most of a CS curriculum for your average ML class) is always necessary.
if you don’t have a solid grasp of linear algebra, your ability to do most types of machine learning is seriously impaired
That depends on whether you’re doing research or purely applied stuff. For applied use, domain expertise trumps knowing the internal details of the algorithms, which you usually just call as pre-built functions, as long as you understand what they do and where the limits (and the traps) are.
Not many people can invert matrices by hand any more and that’s not a problem for a higher-level understanding of linear algebra. Similarly, you don’t necessarily need to understand, say, how singular value decomposition works in order to do successful higher-level modeling of some domain.
I wasn’t pointing strictly to research, but I was pointing to low-level implementation. It now occurs to me that I might be unusual in this respect—much of my ML experience is in the context of a rather weird environment that didn’t have any existing libraries, leaving me to cut a lot of code myself.
So I might have to back off from “ability to do machine learning”. You can, in retrospect, use ML perfectly competently in a lot of settings even if the closest you’ve ever gotten to a simulated annealing algorithm is plugging the cost function into a Python library; but I have a hard time calling someone an expert if they’ve never written anything lower-level, just as I’d expect a competent software engineer to be able to write a hash table by hand even if every environment they’re likely to encounter will have built-in implementations or at least efficient libraries for it.
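For readers who have never written one, here is a minimal sketch of the kind of by-hand hash table meant here. It uses separate chaining and doubles its bucket array when it gets crowded; the class name, the starting capacity, and the resize threshold are all arbitrary choices for illustration, not a recommendation over the built-in `dict`.

```python
class ChainedHashTable:
    """Toy hash table using separate chaining (a list of buckets)."""

    def __init__(self, capacity=8):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0

    def _bucket(self, key):
        # Map the key's hash onto one of the buckets.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        self._size += 1
        if self._size > 2 * len(self._buckets):
            self._resize()

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def _resize(self):
        # Double the number of buckets and re-insert every pair.
        old = [pair for bucket in self._buckets for pair in bucket]
        self._buckets = [[] for _ in range(2 * len(self._buckets))]
        for k, v in old:
            self._buckets[hash(k) % len(self._buckets)].append((k, v))

    def __len__(self):
        return self._size
```

Usage is what you would expect: `t = ChainedHashTable(); t.put("a", 1); t.get("a")` returns `1`, and `t.get("missing")` returns `None`.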
just as I’d expect a competent software engineer to be able to write a hash table by hand even if every environment they’re likely to encounter will have built-in implementations or at least efficient libraries for it.
I have a feeling that’s a bit of a relic.
A long time ago, programming environments were primitive and Real Men wrote their own sorts and hash tables (there is a canonical story from even more Ancient Times). But times have changed. I can’t imagine a serious situation (as opposed to e.g. a programming contest) where I would have to write my own sort routine from scratch, much as I can’t imagine needing to construct a washing machine out of a tub, an electric motor, pulleys, and belts.
I certainly still care about performance properties of various sorts, but I don’t care about their internal details as long as the performance properties hold. I suspect that the interview questions of the “implement a bubble sort on this piece of paper” variety if anything aim more at “have you been paying attention during your CS classes” and less at “do you have a deep understanding of what your program is doing”.
The capacity of human minds is limited and I’ll accept climbing up higher in abstraction levels at the price of forgetting how the lower-level gears turn.
I can’t imagine a serious situation (as opposed to e.g. a programming contest) where I would have to write my own sort routine from scratch
You can’t? I’ve had to do that several times. The usual scenario is that there are search/sort routines, but they have inconvenient properties—either they don’t perform well in the specific problem domain I’m dealing with (happens a lot in simulation; functions for efficiently doing certain types of categorization on spatially arranged data are rare outside graphics libraries), or they don’t work on the data types I need and a reduction is impractical for one reason or another, or they exist but can’t be used for legal reasons. Unless you always situate yourself in the most popular subfields, which I frankly find boring, you can’t always count on there being a library that does exactly what you want—all the more so in a still-emerging space like ML.
(I’ve never had to build a washing machine, incidentally, but I’ve had to fix washing machines—twice this year for two different machines, in fact. I could have hired a mechanic or bought a new machine, but either one would have cost me hundreds of dollars.)
Well, I was talking about standard sort routines—the ones where you have a vector of values and a comparator function. Now, search is quite a different beast altogether.
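To illustrate what a "standard" sort call looks like (a vector of values plus a comparator function), here is a small sketch using Python's built-in `sorted` with `functools.cmp_to_key`. The comparator `by_length_then_alpha` and the sample data are made up for illustration.

```python
from functools import cmp_to_key

def by_length_then_alpha(a, b):
    """Comparator: negative if a sorts before b, positive if after, zero if equal."""
    if len(a) != len(b):
        return len(a) - len(b)
    return (a > b) - (a < b)

words = ["pear", "fig", "banana", "kiwi"]
# The library handles the sorting algorithm; we only supply the ordering.
sorted_words = sorted(words, key=cmp_to_key(by_length_then_alpha))
print(sorted_words)
```

The point of the example is that nothing about the data's structure is exploited: the library routine only ever asks "which of these two comes first?", which is why an off-the-shelf implementation is usually good enough for sorting.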
The thing is, most sorting is brute-force, where you just sort without taking into account the specific structure of your data. That approach works well with sorting, but it doesn’t work well with search. The obvious problem is that we are interested in searching very large search spaces where brute force is nowhere close to practical. The salvation comes from the particular structure of the space, which allows us to be much more efficient than brute force, but the same structure forces us into custom solutions.
Because the structures of search spaces can be very different, there are a LOT of search algorithms, and frequently enough you have to make bespoke versions of them to fit your particular problem. That’s entirely normal. Plus, of course, optimization is a subtype of search, and customizing optimizers is also quite common.
but I’ve had to fix washing machines
Sure, so have I. In fact, I probably would be able to construct a washing machine out of a tub, an electric motor, and some parts. It will take a lot of time and will look ugly, but I think it’ll work. That doesn’t mean I’ll feel a need to do this :-)
Yes, this this this this this this this. “The capacity of human minds is limited and I’ll accept climbing up higher in abstraction levels at the price of forgetting how the lower-level gears turn.” If I could upvote this multiple times, I would.
This is the crux of this entire approach. Learn the higher level, applied abstractions. And learn the very basic fundamentals. Forget learning how the lower-level gears turn: just learn the fundamental laws of physics. If you ever need to figure out a lower-level gear, you can just derive it from your knowledge of the fundamentals, combined with your big-picture knowledge of how that gear fits into the overall system.
That only works if there are few levels of abstraction; I doubt that you can derive how programs work at the machine-code level based on your knowledge of physics and high-level programming. Sometimes, gears are so small that you can’t even see them on your top-level big picture, and sometimes just climbing up one level of abstraction takes enormous effort if you don’t know in advance how to do it.
I think that you should understand, at least once, how the system works on each level and refresh/deepen that knowledge when you need it.
The definition of “fundamentals” differs though, depending on how abstract you get. The more layers of abstraction, the more abstract the fundamentals. If my goal is high-level programming, I don’t need to know how to write code on bare metal.
That’s why I advocate breaking things down until you reach the level of triviality for you personally. Most people will find “writing a for-loop” to be trivial, without having to go farther down the rabbit hole. At a certain point, breaking things down too far actually makes things less trivial.
Can I give a counterexample? I think that way of learning things might help if you only need to apply the higher-level skills as you learned them, but if you need to develop or research those fields yourself, I’ve found you really do need the background.
As in, I have been bitten on the ass by my own choice not to double-major in mathematics in undergrad, thus resulting in my having to start climbing the towers of continuous probability and statistics/ML, abstract algebra, logic, real analysis, category theory, and topology in and after my MSc.
There’s a big difference between the fundamentals, and the low-level practical applications. I think the latter is what estimator is referring to. You can’t really make a breakthrough or do real research without a firm grasp of the fundamentals. But you definitely can make a breakthrough in, say, physics, without knowing the exact tensile strength of wood vs. steel. And yet, that type of “Applied Physics” was a pre-requisite at my school for the more advanced fields of physics that I was actually interested in.
As for the actual content, I basically fail to see its area of applicability. For sufficiently complex skills, like, say, math, languages or football, the decision-trees & how-to-guides approach will likely fail as too shallow; for isolated skills like changing a tire, complex learning approaches are overkill: just google it and follow the instructions. Can you elaborate on the languages example further? Because, you know, learning a bunch of phrases from a phrasebook to be able to say a few words in a foreign country is a non-issue. Actually learning a language is the issue. How would you apply your system to achieve intermediate-level language knowledge? Any other example of learning a non-trivial skill would also suffice. What skills have you trained by using your learning system, and how?
Also, when you say “intermediate level language knowledge”, what exactly do you mean? One of the key steps is defining exactly what you want to accomplish and why. I don’t want to create a whole write-up, only to realize that you and I have two different definitions of “intermediate level language knowledge”.
So if you’d tell me the “what” and the “why”, I’ll do the rest.
… take part in routine conversations; write & understand simple written text; make notes & understand most of the general meaning of lectures, meetings, TV programmes and extract basic information from a written document.
I’ll give a more in-depth breakdown soon, but for now, I’d probably take an approach similar to the one I took to learning to read Japanese: learn basic sentence structure, learn the top 150ish vocabulary words, avoid books written in non-romaji. Practice hearing the spoken word by listening to speeches and following their transcriptions. My exception protocol for unrecognized words was to look them up, and for irregular sentence structure, to guess based on context. It worked for watching movies and reading, mostly, but as you can tell, yoi kakikomu koto ga dekimasen*. I’d have to do some thinking on the writing part; it would most likely involve sticking to simple sentences.
*That’s terrible Japanese for “I cannot write well”. I think. I hope.
Well of course they do. Because these things are necessary to learning a language. This is the 20% that’s most efficient. By definition someone who puts in 100% of the effort will be doing what I did.
The efficiency of this approach revolves around what you don’t do. You’re excising the 80%. I didn’t spend long hours learning katakana, hiragana and kanji. I didn’t learn the more complex tenses and conjugations. I didn’t spend time on vocabulary words that are highly situational. Contrast this to a typical Japanese textbook.
There seem to be two major approaches to learning language.
One is to go to a language school / course where the teachers, in my experience, teach it like an academic discipline + the usual guess-my-password bullshit, so you get tested and graded on things like grammar, like a test where you need to fill in conjugations / declensions into holes in a text. (Obviously I am talking about languages that have those kinds of things, like Germanic or Romance ones.) Case in point: part of my B2 level German exam at the University of Vienna was exactly that kind of hole-filling, and it felt really wrong, as it has not much to do with communication; it is a more academic approach.
The other approach is to do something like this for a while, but when you get to that basic point where you can say “Jack would have ordered a beer yesterday if he had money on him”, ditch it and pretty much learn from immersion. Screw grammar, just read a lot of books, figure out words from the context, and conduct imaginary or real conversations no matter how bad the grammar is. Real people prefer to communicate with people who talk fast, not correctly. Talking with someone who speaks at a normal speed, like “me no want buy house, me want rent house now”, is far better than talking with someone who is like “I no… (long pause) do not? want … (long pause) want to? buy a house, rather… (long pause)… instead? I want to rent it… (long pause) rent one”. I used to be that second guy in 2 languages and it sucked.
(Now of course you may think “but everybody knows immersion is better, it is not even new”. Yeah, apparently that “everybody” does not include the huge European language school chains like Berlitz and their who knows how many students…)
Process How-To: Googled “how to layup”, “how to shoot a 3-pointer”, and “how to steal a ball”
3a. Process Failure Points: Missing a shot, getting the ball stolen, missing a pass.
3b. Process Difficulties: Anything involving ball handling or dribbling. Defense.
Exception Protocol: On offense: Pass the ball to a better player than myself, or set a pick. On defense: play very close to my opponent.
5a. Avoid anything involving dribbling but not scoring.
5b. Prepare and practice two-point shots.
5c. Focus on getting open for a 3-point shot. Practice consistently shooting from 3-point line.
Get better by playing.
I would say basketball is fairly complex. One thing I didn’t mention in the original post (mainly because it starts to get into the “how do individual people learn”) is that, for me, I don’t get good at a competitive skill by competing against people who also suck. Getting good enough to be able to play with people who are actually good made it easier for me to learn the advanced part of the game faster.
Also, this post has a list of (at least what I think to be) fairly non-trivial skills that I have trained using this method.
Okay, so I made a significant revision of the post. The original ideas are all there, just written in a much less obtuse manner.
A much more logical argument is presented at the beginning, along with constraints.
“Archetypes” and “Processes” have been replaced by sub-skills and trivial sub-skills.
The lengthy discourse on strategy has been replaced by simply sorting your list of trivial sub-skills, which accomplishes the same effect.
The “improvement” has been streamlined greatly.
Meta-analysis has been removed because it’s really a separate subject.
One piece of information you can use to determine what is most important is the number of other skills which require a certain skill as a prerequisite. Prerequisites should obviously be learned first, and it makes sense to learn them in order of how many doors they open. This is how I prioritize at the moment if I’m not considering subjective measures of “usefulness”.
For my learning goals, I’ve started making concept maps, partly as it helps me understand a subject by understanding how concepts are related, and partly to identify what to learn next as described above. It becomes fairly obvious that I should learn X if I want to learn Y and Z and X is a prerequisite for both.
In my experience, in math/science prerequisites often can (and should) be ignored, and learned as you actually need them. People who thoroughly follow all the prerequisites often end up bogged down in numerous fields of science that have only a weak connection to what they wanted to learn initially, and then get demotivated and drop out of their endeavor. This is a common failure mode.
Like, you need probability theory to do machine learning, but you are unlikely to encounter some parts of it, and there are also parts of ML which require very little of it. It totally makes sense to start with them.
I’m thinking more specifically than you are. Rather than learning probability theory to understand ML, learn only what you determine to be necessary for the ML applications you are interested in. The concept maps I use are very specific, and they avoid the weak connection problem you mention. (It’s worth noting that I develop these as an autodidact, so I don’t have to take an entire class just to get a few facts I’m interested in.)
It sounds like you and estimator are actually on the same page. Estimator seems to be talking about “prerequisite” in the sense of “systematic prerequisite”, as in, people say that you should learn X before you learn Y. You seem to be talking about “prerequisite” in the sense that “skill X is a necessary component of skill Y”.
Both of you, however, seem to agree that you should ignore the stuff that is irrelevant to what you are actually trying to accomplish.
This is a good way to put it. I may not have been clear.
To use an example, I have a concept map about fluid dynamics that I used in a class I took on turbulence recently. There were a few concepts that I did not understand well at some point, and I wanted to figure out which ones. To be more specific, isotropic tensors are often used in turbulence theory and modeling, but I didn’t really understand how to construct isotropic tensors algebraically. It became pretty clear this was something I should learn given the number of links isotropic tensors had to other concepts.
On the other hand, if you don’t have a solid grasp of linear algebra, your ability to do most types of machine learning is seriously impaired. You can learn techniques like e.g. matrix inversions as needed to implement the algorithms you’re learning, but if you don’t understand how those techniques work in their original context, they become very hard to debug or optimize. Similarly for e.g. cryptography and basic information theory.
That’s probably more the exception than the rule, though; I sense that the point of most prerequisites in a traditional science curriculum is less to provide skills to build on and more to build habits of rigorous thinking.
Read up on what a matrix is, how to add, multiply and invert matrices, what a determinant is and what an eigenvector is, and that’s enough to get you started. There are many algorithms in ML where vectors/matrices are used mostly as a handy notation.
Yes, you will be unable to understand some parts of ML which substantially require linear algebra; yes, understanding ML without linear algebra is harder; yes, you need linear algebra for almost any kind of serious ML research. But that doesn’t mean you have to spend a few years studying arcane math before you can open an ML textbook.
Who said anything about a few years? If you paid attention in high school, the linear algebra background you need is at most a few months’ worth of work. I was providing a single counterexample, not saying that the full prerequisite list (which, if memory serves, is most of a CS curriculum for your average ML class) is always necessary.
That depends on whether you’re doing research or purely applied stuff. For applied use, domain expertise trumps knowing the internal details of the algorithms, which you usually just call as pre-built functions, as long as you understand what they do and where the limits (and the traps) are.
Not many people can invert matrices by hand any more and that’s not a problem for a higher-level understanding of linear algebra. Similarly, you don’t necessarily need to understand, say, how singular value decomposition works in order to do successful higher-level modeling of some domain.
I wasn’t pointing strictly to research, but I was pointing to low-level implementation. It now occurs to me that I might be unusual in this respect—much of my ML experience is in the context of a rather weird environment that didn’t have any existing libraries, leaving me to cut a lot of code myself.
So I might have to back off from “ability to do machine learning”. You can, in retrospect, use ML perfectly competently in a lot of settings even if the closest you’ve ever gotten to a simulated annealing algorithm is plugging the cost function into a Python library; but I have a hard time calling someone an expert if they’ve never written anything lower-level, just as I’d expect a competent software engineer to be able to write a hash table by hand even if every environment they’re likely to encounter will have built-in implementations or at least efficient libraries for it.
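For readers wondering what "a hash table by hand" amounts to, here is a minimal sketch using separate chaining. The class name `ChainedHashTable`, the initial capacity and the 0.75 load-factor threshold are illustrative assumptions, not something taken from this discussion, and a production table would handle far more edge cases.

```python
class ChainedHashTable:
    """Toy hash table: a list of buckets, each bucket a list of (key, value) pairs."""

    def __init__(self, capacity=8):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0

    def _bucket(self, key):
        # Map the key's hash onto one of the current buckets.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))
        self._size += 1
        # Grow once the table is more than 75% full (assumed threshold).
        if self._size > 0.75 * len(self._buckets):
            self._resize()

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default

    def _resize(self):
        # Double the bucket count and re-insert every stored pair.
        items = [pair for bucket in self._buckets for pair in bucket]
        self._buckets = [[] for _ in range(2 * len(self._buckets))]
        for key, value in items:
            self._bucket(key).append((key, value))

    def __len__(self):
        return self._size


table = ChainedHashTable(capacity=2)
for i in range(20):
    table.put(f"k{i}", i)
print(len(table), table.get("k7"), table.get("missing"))
```

The point of the exercise is less the code itself than seeing where collisions and resizing cost time, which is what you need to know when a built-in implementation behaves unexpectedly.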
I have a feeling that’s a bit of a relic.
A long time ago programming environments were primitive and Real Men wrote their own sorts and hash tables (there is a canonical story from even more Ancient Times). But times have changed. I can’t imagine a serious situation (as opposed to e.g. a programming contest) where I would have to write my own sort routine from scratch, just as I can’t imagine needing to construct a washing machine out of a tub, an electric motor, pulleys, and belts.
I certainly still care about performance properties of various sorts, but I don’t care about their internal details as long as the performance properties hold. I suspect that the interview questions of the “implement a bubble sort on this piece of paper” variety if anything aim more at “have you been paying attention during your CS classes” and less at “do you have a deep understanding of what your program is doing”.
The capacity of human minds is limited and I’ll accept climbing up higher in abstraction levels at the price of forgetting how the lower-level gears turn.
You can’t? I’ve had to do that several times. The usual scenario is that there are search/sort routines, but they have inconvenient properties—either they don’t perform well in the specific problem domain I’m dealing with (happens a lot in simulation; functions for efficiently doing certain types of categorization on spatially arranged data are rare outside graphics libraries), or they don’t work on the data types I need and a reduction is impractical for one reason or another, or they exist but can’t be used for legal reasons. Unless you always situate yourself in the most popular subfields, which I frankly find boring, you can’t always count on there being a library that does exactly what you want—all the more so in a still-emerging space like ML.
(I’ve never had to build a washing machine, incidentally, but I’ve had to fix washing machines—twice this year for two different machines, in fact. I could have hired a mechanic or bought a new machine, but either one would have cost me hundreds of dollars.)
Well, I was talking about standard sort routines—the ones where you have a vector of values and a comparator function. Now, search is quite a different beast altogether.
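To make the "vector of values and a comparator function" case concrete, here is a small sketch using Python's standard library. The comparator `by_length_then_alpha` and the word list are made up for illustration; the point is only that the library routine handles the sorting and the caller supplies the ordering.

```python
from functools import cmp_to_key


def by_length_then_alpha(a, b):
    """Comparator: negative if a sorts first, positive if b does, zero if tied."""
    if len(a) != len(b):
        return len(a) - len(b)
    return (a > b) - (a < b)


words = ["pear", "fig", "banana", "kiwi", "apple"]
# sorted() does the work; cmp_to_key adapts the comparator to the key interface.
ordered = sorted(words, key=cmp_to_key(by_length_then_alpha))
print(ordered)
```

Nothing about the data's structure is used here beyond the pairwise comparison, which is why a generic routine suffices.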
The thing is, most sorting is brute-force where you just sort without taking into account the specific structure of your data. That approach works well with sorting, but it doesn’t work well with search. The obvious problem is that we are interested in searching very large search spaces where brute force is nowhere close to practical. The salvation comes from the particular structure of the space, which allows us to be much more efficient than brute force, but the same structure forces us into custom solutions.
Because the structures of search spaces can be very different, there are a LOT of search algorithms, and frequently enough you have to make bespoke versions of them to fit your particular problem. That’s entirely normal. Plus, of course, optimization is a subtype of search and customizing optimizers is also quite common.
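A toy illustration of how structure changes search: if the data is known to be sorted, binary search can discard half of the remaining candidates at each step, while a brute-force scan cannot. The function names and the step counters are assumptions added for the sketch. Real search spaces are far less tidy than a sorted list, which is exactly why bespoke algorithms come up.

```python
def linear_search(items, target):
    """Brute force: check each element in turn. Returns (index, steps)."""
    steps = 0
    for i, item in enumerate(items):
        steps += 1
        if item == target:
            return i, steps
    return -1, steps


def binary_search(sorted_items, target):
    """Uses the sorted order to halve the candidate range each step. Returns (index, steps)."""
    lo, hi, steps = 0, len(sorted_items) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps


data = list(range(0, 200_000, 2))  # 100,000 sorted even numbers
target = data[-1]                  # worst case for the linear scan
linear_index, linear_steps = linear_search(data, target)
binary_index, binary_steps = binary_search(data, target)
print(linear_steps, binary_steps)
```

On this input the linear scan takes 100,000 steps, while binary search needs at most 17, but only because the sortedness assumption holds.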
Sure, so have I. In fact, I probably would be able to construct a washing machine out of a tub, an electric motor, and some parts. It will take a lot of time and will look ugly, but I think it’ll work. That doesn’t mean I’ll feel a need to do this :-)
Yes, this this this this this this this. “The capacity of human minds is limited and I’ll accept climbing up higher in abstraction levels at the price of forgetting how the lower-level gears turn.” If I could upvote this multiple times, I would.
This is the crux of this entire approach. Learn the higher level, applied abstractions. And learn the very basic fundamentals. Forget learning how the lower-level gears turn: just learn the fundamental laws of physics. If you ever need to figure out a lower-level gear, you can just derive it from your knowledge of the fundamentals, combined with your big-picture knowledge of how that gear fits into the overall system.
That only works if there are few levels of abstraction; I doubt that you can derive how programs work at the machine code level based on your knowledge of physics and high-level programming. Sometimes, gears are so small that you can’t even see them on your top level big picture, and sometimes just climbing up one level of abstraction takes enormous effort if you don’t know in advance how to do it.
I think that you should understand, at least once, how the system works on each level and refresh/deepen that knowledge when you need it.
The definition of “fundamentals” differs though, depending on how abstract you get. The more layers of abstraction, the more abstract the fundamentals. If my goal is high-level programming, I don’t need to know how to write code on bare metal.
That’s why I advocate breaking things down until you reach the level of triviality for you personally. Most people will find “writing a for-loop” to be trivial, without having to go farther down the rabbit hole. At a certain point, breaking things down too far actually makes things less trivial.
Can I give a counterexample? I think that way of learning things might help if you only need to apply the higher-level skills as you learned them, but if you need to develop or research those fields yourself, I’ve found you really do need the background.
As in, I have been bitten on the ass by my own choice not to double-major in mathematics in undergrad, thus resulting in my having to start climbing the towers of continuous probability and statistics/ML, abstract algebra, logic, real analysis, category theory, and topology in and after my MSc.
There’s a big difference between the fundamentals, and the low-level practical applications. I think the latter is what estimator is referring to. You can’t really make a breakthrough or do real research without a firm grasp of the fundamentals. But you definitely can make a breakthrough in, say, physics, without knowing the exact tensile strength of wood vs. steel. And yet, that type of “Applied Physics” was a pre-requisite at my school for the more advanced fields of physics that I was actually interested in.
Oh. Really? Dang.
You’re right; you have to learn a solid background for research. But still, it often makes sense to learn in reverse order.
Nice, but beware reasoning after you’ve written the bottom line.
As for the actual content, I basically fail to see its area of applicability. For sufficiently complex skills, like say, math, languages or football, the decision-trees & how-to-guides approach will likely fail as too shallow; for isolated skills like changing a tire, complex learning approaches are overkill: just google it and follow the instructions. Can you elaborate on the languages example further? Because, you know, learning a bunch of phrases from a phrasebook to be able to say a few words in a foreign country is a non-issue. Actually learning a language is. How would you apply your system to achieve intermediate-level language knowledge? Any other example of learning a non-trivial skill would also suffice. What skills have you trained by using your learning system, and how?
Also, when you say “intermediate level language knowledge”, what exactly do you mean? One of the key steps is defining exactly what you want to accomplish and why. I don’t want to create a whole write-up, only to realize that you and I have two different definitions of “intermediate level language knowledge”.
So if you’d tell me the “what” and the “why”, I’ll do the rest.
I meant something like this.
I’ll give a more in-depth breakdown soon, but for now, I’d probably take an approach similar to the one I took to learning to read Japanese: learn basic sentence structure, learn the top 150ish vocabulary words, avoid books written in non-romaji. Practice hearing the spoken word by listening to speeches and following their transcriptions. My exception protocol for unrecognized words was to look them up, and for irregular sentence structure, to guess based on context. It mostly worked for watching movies and reading, but as you can tell, yoi kakikomu koto ga dekimasen*. I’d have to do some thinking on the writing part; it would most likely involve sticking to simple sentences.
*That’s terrible Japanese for “I cannot write well”. I think. I hope.
But these are the things pretty much everybody does while learning languages.
Well of course they do. Because these things are necessary to learning a language. This is the 20% that’s most efficient. By definition someone who puts in 100% of the effort will be doing what I did.
The efficiency of this approach revolves around what you don’t do. You’re excising the 80%. I didn’t spend long hours learning katakana, hiragana and kanji. I didn’t learn the more complex tenses and conjugations. I didn’t spend time on vocabulary words that are highly situational. Contrast this to a typical Japanese textbook.
There seem to be two major approaches to learning language.
One is to go to a language school / course where the teachers, in my experience, teach it like an academic discipline + the usual guess-my-password bullshit, so you get tested and graded on things like grammar, such as a test where you need to fill conjugations / declensions into holes in a text. (Obviously I am talking about languages that have those kinds of things, like Germanic or Romance ones.) Case in point: part of my B2 level German exam at the University of Vienna was exactly that kind of hole-filling, and it felt really wrong, as it has not much to do with communication; it is a more academic approach.
The other approach is to do something like this for a while, but when you get to that basic point where you can say “Jack would have ordered a beer yesterday if he had money on him”, ditch it and pretty much learn from immersion. Screw grammar, just read a lot of books, figure out words from the context, and conduct imaginary or real conversations no matter how bad the grammar is. Real people prefer to communicate with people who talk fast, not correct. Talking with someone who speaks at a normal speed, like “me no want buy house, me want rent house now”, is far better than talking with someone who is like “I no… (long pause) do not? want … (long pause) want to? buy a house, rather… (long pause)… instead? I want to rent it… (long pause) rent one”. I used to be that second guy in 2 languages and it sucked.
(Now of course you may think “but everybody knows immersion is better, it is not even new”. Yeah, apparently that “everybody” does not include the huge European language school chains like Berlitz and their who knows how many students…)
Basketball is an example. I’m about to head home so I’ll do the ultra-abbreviated TL;DR version:
Goals: Score points, prevent opponent from scoring points.
Archetypes: Offense (2-point), Offense (3-point), Defense
Process How-To: Googled “how to layup”, “how to shoot a 3-pointer”, and “how to steal a ball”
3a. Process Failure Points: Missing a shot, getting the ball stolen, missing a pass.
3b. Process Difficulties: Anything involving ball handling or dribbling. Defense.
Exception Protocol: On offense: Pass the ball to a better player than myself, or set a pick. On defense: play very close to my opponent.
5a. Avoid anything involving dribbling but not scoring.
5b. Prepare and practice two-point shots.
5c. Focus on getting open for a 3-point shot. Practice consistently shooting from 3-point line.
Get better by playing.
I would say basketball is fairly complex. One thing I didn’t mention in the original post (mainly because it starts to get into “how do individual people learn”) is that, for me, I don’t get good at a competitive skill by competing against people who also suck. Getting good enough to play with people who are actually good made it easier for me to learn the advanced part of the game faster.
Also, this post has a list of (at least what I think to be) fairly non-trivial skills that I have trained using this method.