Quick comment: I noticed that in all of your examples above, I chunk substantially bigger and fewer pieces. For example, in the “15 different bold bits” clip, I chunk it into about 8 pieces instead.
This is likely experience/background dependent; I happen to have a relatively strong background in ML and have read a stack of research papers recently, so I probably have both stronger noise filters and more complicated primitives available.
One possibly interesting side note: I never once, in any of your examples, considered metadata about the topic relevant. This includes things like the author names, “tested”, “study proposed”, etc. I suspect I’ve learned that 1) author names are almost never important, 2) test procedures are only worth thinking about if they’re very explicitly detailed (which was not the case above), and 3) even if the test procedures are ok, they’re typically only relevant as a cleanup/sanitization pass once the main concept is understood.
These are great points! The chunks I included are not personalized, so as you point out, they include information a skilled reader would filter out and use multiple chunks where a skilled reader might just see one.
Quick comment: I noticed that in all of your examples above, I chunk substantially bigger and fewer pieces. For example, in the “15 different bold bits” clip, I chunk it into about 8 pieces instead.
This is likely experience/background dependent; I happen to have a relatively strong background in ML and have read a stack of research papers recently, so I probably have both stronger noise filters and more complicated primitives available.
One possibly interesting side note: I never once, in any of your examples, considered metadata about the topic relevant. This includes things like the author names, “tested”, “study proposed”, etc. I suspect I’ve learned that 1) author names are almost never important, 2) test procedures are only worth thinking about if they’re very explicitly detailed (which was not the case above), and 3) even if the test procedures are ok, they’re typically only relevant as a cleanup/sanitization pass once the main concept is understood.
These are great points! The chunks I included are not personalized, so as you point out, they include information a skilled reader would filter out and use multiple chunks where a skilled reader might just see one.