NaiveTortoise’s Short Form Feed
Inspired by Hazard’s Shortform Feed (which I really enjoy), itself based on Raemon’s Shortform feed, I’m making my own. There be thoughts here. Hopefully, this will also get me posting more.
Cruxes I Have With Many LW Readers
There’s a crux I seem to have with a lot of LWers that I’ve struggled to put my finger on for a long time, but that I think reduces to some combination of:
faith in elegance vs. expectation of messiness;
preference for axioms vs. examples;
identification as primarily a scientist/truth-seeker vs. as an engineer/builder.
I tend to be more inclined towards the latter in each case, whereas I think a lot of LWers are inclined towards the former, with the potential exception of the author of realism about rationality, who seems to have opinions that overlap with many of my own. While I still feel uncomfortable with the above binaries, I’ve now gathered enough examples to at least list them as evidence for what I’m talking about.
Example 1: Linear Algebra Textbooks
A few LWers have positively reviewed Linear Algebra Done Right (LADR), in particular complimenting it for revealing the inner workings of Linear Algebra. I too recently read most of this book and did a lot of the exercises. And… I liked it but seemingly less than the other reviewers. In particular, I enjoyed getting a lot of practice reading definition-theorem-proof style math and doing lots of proofs myself, but found myself wishing for more examples and discussion of how to compute things like eigenvalues in practice. While I know that’s not what the book’s about, the difference I’m pointing to is more that I found the omission of these things bothersome, whereas I suspect the other reviewers were happy with the focus on constructing the different objects mathematically (I’m also obviously making some assumptions here).
On the other hand, I’ve recently been reading sections of Shilov’s Linear Algebra, which is more concrete but does more ugly stuff like present the determinant very early on, and I feel like I’m learning better from it.
I think one contributing factor to this preference difference is that I tend to be more OK with unmotivated messiness if the messy thing is clearly useful for something, but less OK with slogging through a bunch of elegant but not-clear-what-it’s-used-for build-up. Another way to put this would be that I tend to like to get a top-down view of a subject and then go depth-first afterwards, whereas others seem happy to learn bottom-up. I used to think this was because of my experience with programming, where algorithms are pretty much always presented in terms of their purpose and tend to become messier as they get optimized for performance. I still like knowing the motivation for things, but I also accept that stuff that works for real applications often has a bunch of messiness. On the other hand, a lot of LWers are also programmers who are only now going deep on math, and they seem to still be happy with the axiomatic math way of doing things. So having a programming background doesn’t seem to correlate with my preferences that strongly...
It would be great if someone chimed in with better hypotheses/explanations than the ones I’ve given.
Example 2: Scientists vs. Engineers as Role Models
Much of early LW content, the Sequences in particular, used scientists like Einstein and Feynman as role models in discussions (and also, in fairness, as targets of criticism). While I love Feynman and Einstein too, I tend to also revere builders/engineers, such as John Carmack, Jeff Dean, and Konrad Zuse, but these types of people don’t seem to get nearly as much praise on LW.
One explanation for this is that great but not necessarily thoughtful engineers can drive X-risk through their work. For example, here’s a discussion where a few folks argue that AGI requires insight more than programming ability and explicitly mention needing Judea Pearl more than John Carmack. While this is a fair argument, I’m skeptical that it’s the true rejection. Security mindset seems to be as common among engineers as it is among scientists given that most of the folks who participate in things like DefCon and work in computer security tend to be hardcore engineer types like Trammell Hudson. (In his original essay, Eliezer cites Bruce Schneier, definitely an engineer, as someone he trusts to have security mindset.)
Another potential explanation for this is that LW readers tend to like doing and learning about science (pure math included) more than doing engineering. It’s plausible that people who were attracted to early LW/OB content and were compelled by arguments for X-risk tend to also prefer science to engineering.
Conclusion
Unfortunately, I don’t have some sort of nice insight to conclude this with. I don’t think the differences between my and other LWers’ preferences are bad so much as an implicit thing that doesn’t get discussed.
I’m curious: do my dichotomies seem reasonably accurate to anyone reading this? And if so, do my hypotheses for them seem reasonable?
I have similar differences with many people on LW and agree there is something of an unacknowledged aesthetic here.
I think the engineer mindset is more strongly represented here than you think, but that the nature of nonspecialist online discussion warps things away from the engineer mindset and towards the scientist mindset. Both types of people are present, but the engineer-mindset people tend not to put that part of themselves forward here.
The problem with getting down into the details is that there are many areas with messy details to get into, and it’s hard to appreciate the messy details of an area you haven’t spent enough time in. So deep dives in narrow topics wind up looking more like engineer-mindset, while shallow passes over wide areas wind up looking more like scientist-mindset. LessWrong posts can’t assume much background, which limits their depth.
I would be happy to see more deep-dives; a lightly edited transcript of John Carmack wouldn’t be a prototypical LessWrong post, but it would be a good one. But such posts are necessarily going to exclude a lot of readers, and LessWrong isn’t necessarily going to be competitive with posting in more topic-specialized places.
These are all good points.
After I saw that Benito did a transcript post, I considered doing one for one of Carmack’s talks or a recent interview of Yann LeCun I found pretty interesting (based on the talks of his I’ve listened to, LeCun has a pretty engineering-y mindset even though he’s nominally a scientist). Not going to happen immediately though since it requires a pretty big time investment.
Alternatively, maybe I’ll review Masters of Doom, which is where I learned most of what I know about Carmack.
As the dichotomy isn’t jumping out at me, I guess I should read both of those books* sometime and see which I like more.
*Linear Algebra Done Right (LADR)
Shilov’s Linear Algebra
This is really interesting, I’m glad you wrote this up. I think there’s something to it.
Some quick comments:
I generally expect there to exist simple underlying principles in most domains which give rise to messiness (and often the messiness seems a bit less messy once you understand them). Perceiving “messiness” does also often feel to me like lack of understanding whereas seeing the underlying unity makes me feel like I get whatever the subject matter is.
I think I would like it if LessWrong had more engineers/inventors as role models and that it’s something of an oversight that we don’t. Yet I also feel like John Carmack probably isn’t remotely near the level of Pearl (I’m not that familiar with Carmack’s work): pushing forward video game development doesn’t compare to neatly figuring out what exactly causality itself is.
There might be something to the idea that all truly monumental engineering breakthroughs depended on something like a “scientific” breakthrough. Something like Faraday and Maxwell figuring out theories of electromagnetism is actually a bigger deal than Edison(/others) figuring out the lightbulb, the radio, etc. There are cases of lauded people who are a little more ambiguous on the science/engineer dichotomy. Turing? Shannon? Tesla? Shockley et al with the transistor seems kind of like an engineering breakthrough, and it seems there could be love for that. I wonder if Feynman gets more recognition because, as an educator, we got a lot more of the philosophy underlying his work. Just rambling here.
A little on my background: I did an EE degree which had a very practical focus. My experience is that I was taught how to apply a lot of equations and make things in the lab, but most courses skimped on providing the real understanding, which left me overall worse as an engineer. The math majors actually understood Linear Algebra, the physicists actually understood electromagnetism, and I knew enough to make some neat things in the lab and pass tests, but I was worse off for not having a deeper “theoretical” understanding. So I feel like I developed more of an identity as an engineer, but came to feel that to be a really great engineer I needed to get the core science better*.
*I have some recollection that Tesla could develop a superior AC electric system because he understood the underlying math better than Edison, but this is a vague recollection.
You’re looking at the wrong thing. Don’t look at the topic of their work; look at their cognitive style and overall generativity. Carmack is many levels above Pearl. Just as importantly, there’s enough recorded video of him speaking unscripted that it’s feasible to absorb some of his style.
By generativity do you mean “within-domain” generativity?
To unpack which “levels” I was grading on, it’s something like a blend of “importance and significance of their work” / “difficulty of the problems they were solving”; admittedly, that’s still pretty vague. On those dimensions, it seems entirely fair to compare across topics and assert that Pearl was solving more significant and more difficult problem(s) than Carmack. And for that “style” isn’t especially relevant. (This can also be true even if Carmack solved many more problems.)
But I’m curious about your angle—when you say that Carmack is many levels above Pearl, which specific dimensions is that on (generativity and style?) and do you have any examples/links for those?
Not exactly, because Carmack has worked in more than one domain (albeit not as successfully; Armadillo Aerospace never made orbit.)
Agree on significance, disagree on difficulty.
In an interesting turn of events, John Carmack announced today that he’ll be pivoting to work on AGI.
TRIZ is an engineering discipline that has something called the five levels of innovation, which talks about this:
1. You solve a problem by using a common solution in your own speciality.
2. You solve a problem using a common solution in your own industry.
3. You solve a problem using a common solution found in other industries.
4. You solve a problem using a solution built on first principles (e.g. little known scientific principles.)
5. You solve a problem by discovering a new principle/scientific rule.
Seems you’re referring to this https://en.wikipedia.org/wiki/TRIZ?
Yes.
Thanks for your reply! I agree with a lot of what you said.
First off, thanks for bringing up the point about underlying principles. I agree that there are often underlying principles in many domains and that I also really like to find unity in seeming messiness. I used to be of the more extreme view that principles were in some sense more important than the details, but I’ve become more skeptical over time for two reasons.
From a pedagogy perspective, I’ve personally never had much luck learning principles without having a strong base of practice & knowledge. That said, when I have that base, learning principles helps me improve further and is satisfying.
I’ve realized over time how much of action (where action can include thinking) is based upon a set of non-verbal strategies that one learns through practice and experimentation even in seemingly theoretical domains. These strategies seem to be the secret sauce that allow one to act fluently but seem meaningfully different from the types of principles people often discuss.
Another way to phrase my argument is that principles are important but very hard to transfer between minds. It’s possible you agree and I’m just belaboring the point but I wanted to make it explicit.
One concrete example of the distinction I’m drawing is something called the “What Are Monads Fallacy” in the Haskell community, where people try to explain monads by conveying their understanding of what monads really are, even though they learned about monads by just using them a bunch, which led to them later developing a higher-level understanding of them. This reflects a more general problem where experts often struggle to teach novices because they don’t realize that their broad understanding is actually founded upon lower-level understanding of a lot of details.
I tentatively agree, but it’s pretty hard to draw comparisons. From an insight perspective, I agree that Pearl’s work on Bayes Nets and Causality was probably more profound than anything Carmack came up with. From an economic perspective, though, Carmack had a massive, albeit indirect, impact on the trajectory of the computing world. By coming up with new algorithms and techniques for 3D game rendering at a time when people had basically no idea how to render 3D games in realtime, Carmack drove the gaming industry forward, which certainly contributed to the development of better GPUs and processors as well. Carmack was also the person at Id who insisted on making their games moddable and releasing their game engines, which eventually led to the development of games like Half-Life.
That said, a better point of comparison to Pearl is probably Jeff Dean, who, in close collaboration with Sanjay Ghemawat, first wrote much of Google’s search stack from scratch after it started failing to scale and subsequently invented BigTable, MapReduce, Spanner, and TensorFlow!
Agree that science tends to be upstream of later technology developments, but I would emphasize that there are probably cases where without great engineers, the actual applications never get built. For example, there was a large gap between us understanding genes fairly well and being able to sequence and, more recently, synthesize them.
I agree with this and would add Lynn Conway, who invented VLSI, one of the key enablers of the modern processor industry and Moore’s Law.
To be clear, I shared this frustration with the engineering curriculum. I started as a Computer Engineering major and switched to CS because I felt like engineering was just a bag of unmotivated tricks whereas in CS you could understand why things were the way they were. However, part of the reason I liked CS’s theory was because it was presented in the context of understanding algorithms.
As a final point, I don’t think I did a good job in my original post of emphasizing that I’m pro-understanding and pro-theory! I mostly endorse the saying, “nothing is so practical as a good theory.” My perceived disagreement is more around how much I trust/enjoy theory for its own sake vs. with an eye towards practice.
Sorry for the delayed reply on this one.
I do think we agree on rather a lot here. A few thoughts:
1. It seems there are separate questions of “who you model yourself on / role models and heroes / personal identity” and separate questions of pedagogy.
You might strongly seek unifying principles and elegant theories but believe the correct way to arrive at these and understand these is through lots of real-world messy interactions and examples. That seems pretty right to me.
2. Your examples in this comment do make me update on the importance of engineering types and engineering feats. It makes me think that LessWrong indeed focuses too much on heroes of “understanding” when there are heroes “of making things happen”, which is rather a key part of rationality too.
A guess might be that this is downstream of what was focused on in the Sequences and the culture that they set. If I’m interpreting Craft and the Community correctly, Eliezer never saw the Sequences as covering all of rationality or all of what was important, just his own particular sub-art that he created in the course of trying to do one particular thing.
Seemingly, answering confused questions is more science-y than engineering-y and would place focus on great scientists like Feynman. Unfortunately, the community has not yet supplemented the Sequences with the rest of the art of human rationality, and so most of the LW culture is still (mostly) downstream of the Sequences alone. Given that, we can expect the culture is missing major key pieces of what would be the full art, e.g. whatever skills are involved in being Jeff Dean and John Carmack.
About that you might be correct. Personally, I do think I enjoy theory even without application. I’m not sure if my mind secretly thinks all topics will find their application, but having applications (beyond what is needed to understand) doesn’t feel key to my interest, so something.
At this point, I basically agree that we agree and that the most useful follow up action is for someone (read: me) to actually be the change they want to see and write some (object-level), and ideally good, content from a more engineering-y bent.
As I mentioned in my reply to jimrandomh, a book review seems like a good place for me to start.
Cool. Looking forward to it!
Anki’s Not About Looking Stuff Up
Attention conservation notice: if you’ve read Michael Nielsen’s stuff about Anki, this probably won’t be new for you. Also, this is all very personal and YMMV.
In a number of discussions of Anki here and elsewhere, I’ve seen Anki’s value measured in terms of time saved by not having to look stuff up. For example, Gwern’s spaced repetition post includes a calculation of the threshold at which it’s worth it to Anki-ize something, although I would be surprised if Gwern hasn’t already thought about the claim I’m going to make.
While I occasionally use Anki to remember things that I would otherwise have to Google, e.g. statistics, I almost never Anki-ize things so that I can avoid Googling them in the future. And I don’t think in terms of time saved when deciding what to Anki-ize.
Instead, (as Michael Nielsen discusses in his posts) I almost always Anki-ize with the goal of building a connected graph of knowledge atoms about an area in which I’m interested. As a result, I tend to evaluate what to Anki-ize based on two criteria:
Will this help me think better about this domain without paper or a computer?
In the Platonic graph of this domain’s knowledge ontology, how central is this node? (Pedantic note: it’s easier to visualize distance to the root of the tree, but this requires removing cycles from the graph.)
To make this more concrete, let’s look at an example of a topic I’ve been Anki-izing recently, causal inference. I just started Anki-izing this topic a week ago, so it’ll be easier for me to avoid idealizing the process. Looking at my cards so far, I have questions about and definitions of things like “d-separation”, “sufficient/admissible sets”, and “backdoor paths”. Notably, for each of these, I don’t just have a cloze card to recall the definition, I also have image cards that quiz me on examples and conceptual questions that clarify things I found confusing upon first encountering these concepts. I’ve found that making these cards has the effect of both forcing me to ensure I understand concepts (because writing cards requires breaking them down) and makes it easier to bootstrap my understanding over the course of multiple days. Furthermore, knowing that I’ll remember at least the stuff I’ve Anki-ized has a surprisingly strong motivational impact on me on a gut level.
All that said, I suspect there are some people for whom Anki-izing wouldn’t be helpful.
The first is people who have the time and a career in which they focus on a narrow enough set of topics such that they repeatedly see the same concepts and rarely go for long periods without revisiting them. I’ve experienced this myself for Python—I learned it well before starting to use Anki and used it every day for many years. So even if I forget some stuff, it’s very easy for me to use the language fluently after time away from it.
The second is, for lack of a better term, actual geniuses. Like, if you’re John Von Neumann and you legitimately have an approximation of a photographic memory (I’m really skeptical that he actually had an eidetic memory, but regardless...) and can understand any concept incredibly quickly, you probably don’t need Anki. Also, if you’re the second coming of John Von Neumann and you’re reading this, cool!
To give another example, Terry Tao is a genius who also has spent his entire life doing math. Probably doesn’t need Anki (or advice from me in general in case it wasn’t obvious).
Finally, I do think how to use Anki well is an under-explored topic given that there’s on the order of 10 actual blog posts about it. Given this, I’m still figuring things out myself, in particular around how to Anki-ize stuff that’s more procedural, e.g. “when you see a problem like this, consider these three strategies” or something. If you’re also experimenting with Anki, I’d love to hear from you!
I briefly looked at gwern’s public database several months ago, and got the impression that he isn’t using Anki in the incremental reading/learning way that you (and Michael Nielsen) describe. Instead, he seems to just add a bunch of random facts. This isn’t to say gwern hasn’t thought about this, but just that if he has, he doesn’t seem to be making use of this insight.
I feel like the center often shifts as I learn more about a topic (because I develop new interests within it). The questions I ask myself are more like “How embarrassed would I be if someone asked me this and I didn’t know the answer?” and “How much does knowing this help me learn more about the topic or related topics?” (These aren’t ideal phrasings of the questions my gut is asking.)
In my experience, I often still forget things I’ve entered into Anki either because the card was poorly made or because I didn’t add enough “surrounding cards” to cement the knowledge. So I’ve shifted away from this to thinking something more like “at least Anki will make it very obvious if I didn’t internalize something well, and will give me an opportunity in the future to come back to this topic to understand it better instead of just having it fade without detection”.
I’m confused about what you mean by this. (One guess I have is big-O notation, but big-O notation is not sensitive to constants, so I’m not sure what the 5 is doing, and big-O notation is also about asymptotic behavior of a function and I’m not sure what input you’re considering.)
I think there are few well-researched and comprehensive blog posts, but I’ve found that there is a lot of additional wisdom the spaced repetition community has accumulated, which is mostly written down in random Reddit comments and smaller blog posts. I feel like I’ve benefited somewhat from reading this wisdom (but have benefited more from just trying a bunch of things myself). For myself, I’ve considered writing up what I’ve learned about using Anki, but it hasn’t been a priority because (1) other topics seem more important to work on and write about; (2) most newcomers cannot distinguish between good and bad advice, so I anticipate having low impact by writing about Anki; (3) I’ve only been experimenting informally and personally, and it’s difficult to tell how well my lessons generalize to others.
Those seem like good questions to ask as well. In particular, the second one is something I ask myself although, similar to you, in my gut more than verbally. I also deal with the “center shifting” by revising cards aggressively if they no longer match my understanding. I even revise simple phrasing differences when I notice them. That is, if I repeatedly phrase the answer to a card one way in my head and have it phrased differently on the actual card, I’ll change the card.
I think both this and the original motivational factor I described apply for me.
You’re right. Sorry about that… I just heinously abuse big-O notation and sometimes forget to not do it when talking with others/writing. Edited the original post to be clearer (“on the order of 10”).
Interesting, I’ve perused the Anki sub-reddit a fair amount, but haven’t found many posts that do what I’m looking for, which is both give good guidelines and back them up with specific examples. This is probably the closest thing I’ve read to what I’m looking for, but even this post mostly focuses on high level recommendations and doesn’t talk about the nitty-gritty such as different types of cards for different types of skills. If you’ve saved some of your favorite links, please share!
I agree that trying stuff myself has worked better than reading.
Regarding other topics being more important, I admit I mostly wrote up the above because I couldn’t stop thinking about it rather than based on some sort of principled evaluation of how important it would be. That said, I personally would get a lot of value out of having more people write up detailed case reports of how they’ve been using Anki and what does/doesn’t work well for them that give lots of examples. I think you’re right that this won’t necessarily be helpful for newcomers, but I do think it will be helpful for people trying to refine their practice over long periods of time. Given that most advice is targeted at newcomers, while the overall impact may be lower, I’d argue “advice for experts” is more neglected and more impactful on the margin.
Regarding takeaways not generalizing, this is why I think giving lots of concrete examples is good because it basically makes your claims reproducible. That is, someone can go out and try what you described fairly easily and see if it works for them.
I like CheCheDaWaff’s comments on r/Anki; see here for a decent place to start. In particular, for proofs, I’ve shifted toward adding “prove this theorem” cards rather than trying to break the proof into many small pieces. (The latter adheres more to the spaced repetition philosophy, but I found it just doesn’t really work.)
Richard Reitz has a Google doc with a bunch of stuff.
I like this forum comment (as a data point, and as motivation to try to avoid similar failures).
I like https://eshapard.github.io
Master How To Learn also has some insights but most posts are low-quality.
One thing I should mention is that a lot of the above links aren’t written well. See this Quora answer for a view I basically agree with.
I agree that thinking about this is pretty addicting. :) I think this kind of motivation helps me to find and read a bunch online and to make occasional comments (such as the grandparent) and brain dumps, but I find it’s not quite enough to get me to invest the time to write a comprehensive post about everything I’ve learned.
So… I just re-read your brain dump post and realized that you described an issue I’ve not only encountered myself, but encountered with the exact example you used!
I indeed have a card for Newton’s approximation but didn’t remember this fact! That said, I don’t know whether I would have noticed the connection had I tried to re-prove the chain rule, but I suspect not. The one other caveat is that I created cards very sparsely when I reviewed calculus so I’d like to think I might have avoided this with a bit more card-making.
I want to highlight a potential ambiguity, which is that “Newton’s approximation” is sometimes used to mean Newton’s method for finding roots, but the “Newton’s approximation” I had in mind is the one given in Tao’s Analysis I, Proposition 10.1.7, which is a way of restating the definition of the derivative. (Here is the statement in Tao’s notes in case you don’t have access to the book.)
Ah that makes sense, thanks. I was in fact thinking of Newton’s method (which is why I didn’t see the connection).
Although I haven’t used Anki for math, it seems to me like I want to build up concepts and competencies, not remember definitions. Like, I couldn’t write down the definition of absolute continuity, but if I got back in the zone and refreshed myself, I’d have all of my analysis skills intact.
I suppose definitions might be a useful scaffolding?
You’re right on both counts. Maybe I should’ve discussed this in my original post… At least for me, Anki serves different purposes at different stages of learning.
Key definitions tend to be useful in the early stages, especially if I’m learning something on and off, as a way to prevent myself from having to constantly refer back and make it easier to think about what they actually mean when I’m away from the source. E.g., I’ve been exploring alternate interpretations of d-separation in my head during my commute and it helps that I remember the precise conditions in addition to having a visual picture.
Once I’ve mastered something, I agree that the “concepts and competencies” (“mental moves” is my preferred term) become more important to retain. E.g., I remember the spectral theorem but wish I remembered the sketch of what it looks like to develop the spectral theorem from scratch. Unfortunately, I’m less clear/experienced on using Anki to do this effectively. I think Michael Nielsen’s blog post on seeing through a piece of mathematics is a good first step. Deeply internalizing core proofs from an area presumably should help for retaining the core mental moves involved in being effective in that area. But, this is quite time intensive and also prioritizes breadth over depth.
I actually did mention two things that I think may help with retaining concepts and competencies—Anki-izing the same concepts in different ways (often visually) and Anki-izing examples of concepts. I haven’t experienced this yet, but I’m hopeful that remembering alternative visual versions of definitions, analogies to them, and examples of them may help with the types of problems where you can see the solution at a glance if you have the right mental model (more common in some areas than others). For example, I remember feeling (usually after agonizing over a problem for a while) like Linear Algebra Done Right had a lot of exercises where the right geometric intuition or representative example would allow you to see the solution relatively quickly and then just have to convert it to words.
Another idea for how to Anki-ize concepts and competencies better, which I haven’t tried (yet) but will share anyway, is succinctly capturing strategies that pop up again and again in similar forms. To use another Linear Algebra Done Right example, there are a lot of exercises with solutions of the form “construct some arbitrary linear map that does what we want” and then show it… does what we want. I remember this technique but worry that my pattern-matching machinery for the types of problems to which it tends to apply has decayed. On the other hand, if I had an Anki card that just listed short descriptions of a few exercises and asked me which technique was core to their solutions, maybe I’d retain that competency better.
Weird thing I wish existed: I wish there were more videos of what I think of as ‘math/programming speedruns’. For those familiar with speedrunning video games, this would be similar except the idea would be to do the same thing for a math proof or programming problem. While it might seem like this would be quite boring since the solution to the problem/proof is known, I still think there’s an element of skill to it, and I would enjoy watching someone do everything they can to get to a solution, proof, etc. as quickly as possible (in an editor, on paper, LaTeX, etc.).
This is kind of similar to streaming ACM/math olympiad competition solving, except I’m more interested in people doing this for known problems/proofs than for tricky but obscure problems. E.g., speed-running the SVD derivation.
While I’m posting this in the hope that others are also really interested, my sense is that this would be incredibly niche even amongst people who like math so I’m not surprised it doesn’t exist...
I’m not super familiar with the competitive math circuit, but my understanding is that this is part of it? People are given a hard problem and either individually or as a team solve it as quickly as possible.
Do you know of any videos on this? Ideally while the person is narrating their thoughts out loud.
3Blue1Brown has a video where he sort of does this for a hard Putnam problem. I say “sort of” because he’s not solving the problem in real time so much as retrospectively describing how one might solve it.
Yeah, that is one of my favorite videos by 3Blue1Brown and more like it would be pretty good.
Yep, I touched on this above. Personally, I’m less interested in this type of problem solving than I am in seeing someone build to a well-known but potentially easier to prove theorem, but I suspect people solving IMO problems would appeal to a wider audience.
Somewhat related:
https://xenaproject.wordpress.com/2020/05/23/the-complex-number-game/
This is awesome! I’ve been thinking I should try out the natural number game for a while because I feel like formal theorem proving will scratch my coding / video game itch in a way normal math doesn’t.
Related: here, DJB lays out the primary results of a single-variable calculus course in 11 LaTeX-ed pages.
The problem with this is that it is very difficult to figure out what counts as a legitimate proof. What level of rigor is required, exactly? Are they allowed to memorize a proof beforehand? If not, how much are they allowed to know?
Solutions might be better to go with than proofs: if the answer is wrong, that’s more straightforward to show than whether or not a proof is wrong.
Yeah, what would be ideal is if theorem provers were more usable; then this wouldn’t be an issue (although of course there’s still the issue of library code vs. from-scratch code, but this seems easier to deal with).
Memorizing a proof seems fine (in the same way that I assume you end up basically memorizing the game map if you do a speedrun).
I had a similar idea which was also based on an analogy with video games (where the analogy came from let’s play videos rather than speedruns), and called it a live math video.
Cool, I hadn’t seen your page previously but our ideas do in fact seem very similar. I think you were right to not focus on the speed element and instead analogize to ‘let’s play’ videos.
I have a friend who might be into programming speedrunning https://merveilles.town/@cancel/104005117320841920
Seems like the post you linked is a joke. Were you serious about the friend?
Serious in that I mean he might. I’d say there’s a 0.1 probability that he’d be interested, but if that’s not negligible, I think if he took it up he’d be very good at it. I’ll ask him.
Cool!
Watching my kitten learn/play has been interesting from a “how do animals compare to current AIs?” perspective. At a high level, I think I’ve updated slightly towards RL agents being further along the evolutionary progress ladder than I’d previously thought.
I’ve seen critiques of RL agents not being able to do long-term planning as evidence for them not being as smart as animals, and while I think that’s probably accurate, I have noticed that my kitten takes a surprisingly long time to learn even 2-step plans. For example, when it plays with a toy on a string, I’ll often try putting the toy on a chair that it only knows how to reach by jumping onto another chair first. It took many attempts before it learned to jump onto the other chair and then climb to where I’d put the toy, even though it had previously done that while exploring many times. And even then, it seems to be at risk of “catastrophic forgetting” where we’ll be playing in the same way later and it won’t remember to do the 2-step move. Related to this, its learning is fairly narrow even for basic skills, e.g. I have 4 identical chairs around a table but it will be afraid of jumping onto one even though it’s very comfortable jumping onto another.
Now part of this may be that cats are known for being biased towards trial-and-error compared to other similarly-sized mammals like dogs (see Gwern’s write-up for more on this) and that adult cats may be better than kittens at “long-term” planning. However, a lot of critiques of RL, such as Josh Tenenbaum’s, argue that our AIs don’t even compare to young children in terms of their abilities. This is undoubtedly true with respect to ability to actually move around in the world, grasp objects, etc. but seems less true than I’d previously thought with respect to “higher level” cognitive abilities such as planning. To make this more concrete, I’m skeptical that my kitten could currently succeed at a real life analogue to Montezuma’s Revenge.
Another thing I’ve observed relates to some recent work by Konrad Kording, Adam Marblestone, and Greg Wayne on integrating deep learning and neuroscience. They postulate that due to the genomic bottleneck, it’s plausible that brains leverage heterogeneous, evolving cost functions to do semi-supervised learning throughout development. While much more work needs to be done investigating this hypothesis (as acknowledged by the authors), it does ring true with some of my observations of my kitten. In particular, I’ve noticed that it recently became much more interested in climbing things and jumping on objects on its own, whereas previously I couldn’t even get it to do so using treats. This seems like a plausible example of a “switch” being flipped that increased reward for being high up (or something; obviously this is quite hand-wavy).
I’m trying to come up with predictions that I can make regarding the next few months based on these two initial observations but don’t have any great ideas yet.
I keep seeing rationalist-adjacent discussions on Twitter that seem to bottom out in arguments of the general (very caricatured, sorry) form: “stop forcing yourself, get unblocked, and then you’ll X effortlessly”, where X equals learn, socialize, etc. In particular, a lot of focus seems to be on how children and adults can just pursue what’s fun or enjoyable if they get rid of their underlying trauma, and they’ll naturally learn fast and gravitate towards interesting (but also useful in the long term) topics, with some inspiration from David Deutsch.
On one hand, this sounds great, but it’s so foreign to my experience of learning things and seems to lack the kind of evidence I’d expect before changing my cognitive strategies so dramatically. In fairness, I probably am too far in the direction of doing things because I “should”, but I still don’t think going to the other extreme is the right correction.
In particular, having read Mason Currey’s Daily Rituals, I have a strong prior that even the most successful artists and scientists are at risk of developing akrasia and need to systematize their schedules heavily to ensure that they get their butts in the chair and work. Given this, what might convince me would be a catalogue of thinkers who did interesting work, with quotes or stories providing evidence that they did what was fun, and counter-examples addressed to show that it’s not cherry-picking.
The above is also related to the somewhat challenged but, I think, still somewhat valid idea that getting better at things requires deliberate practice, which is not “fun”. This also leads me to a subtle point, which is that I think “fun” may be being used in a non-standard way by people who claim that learning can always be “fun”. I.e., I can see how even practice that’s not necessarily enjoyable in the moment can be something one values on reflection, but this seems like a misuse of the term “fun” to me.
Unblocking motivation is only enough on its own if the motivation is so strong that you feel “hungry” to do something. Long term this kind of hunger is, in my experience, unreliable, so it’s not enough just to unblock your ability to do things.
You also have to set up the conditions for your motivation to express itself, e.g. through daily rituals as you suggest. For example, a big problem people I talk to had to deal with when shelter-in-place orders hit was that they lost their daily rituals and had to establish new ones. It wasn’t that they didn’t want to work or do other things they normally do, it was that they lost the normal context in which they did them, and had to establish new contexts in which they expected to find themselves doing the intended activity.
Trying to force yourself to do things is like setting up the conditions without unblocked motivation.
So I think both things are required, but only one thing is the bottleneck at a time; thus lots of people need advice on one part and not the other at any given moment, which creates evidence that can look like all you need to do is fix one thing and everything else will follow.
I originally had your experience, and have seen enough people claim to get unblocked that there seems to be at least something to it. At the very least, if you have crippling depression, solving that is often higher impact than incremental skill growth.
I wrote up more thoughts about this here.
Thanks for replying and sharing your post. I’d actually read it a while ago but forgotten how relevant it is to the above. To be clear, I totally buy that if you have crippling depression, or even something more mild, fixing that is a top priority. I also have enjoyed recent posts on, and think I understand, the alignment-based models of getting all your “parts” on board.
Where I get confused and where I think there’s less evidence is that the unblocking can make it such that doing hard stuff is no longer “hard”. Part of what’s difficult here is that I’m struggling to find the right words but I think it’s specifically claims of effortlessness or fun that seem less supported to me.
Generally, you have to solve the problem you have. (Related: Anna Karenina principle.)
If your problem happens to be some trauma, fix the trauma. If it is lack of tools, buy the right tools. If it is wasting time on social networks, install a web blocker. And if it’s just that you never prioritize doing X, but in retrospect always wish you had, precommit to spend some time doing X.
Of course, it could be more than one of those things together; maybe you have a trauma and also lack the right tools. Then you must solve both. Maybe one is more visible, and you only realize the other after fixing the first one.
This is basically my perspective but seems contrary to the perspective in which most problems are caused by internal blockages, right?
Yep.
The idea that everything is caused by internal conflicts, and if we only could resolve all the internal conflicts (which might take a few years of hard work, if you want to do it thoroughly) we would become amazing supermen (so all those years spent on therapy would still be totally worth it), originates from Freud.
It is my long-term source of amusement that if you mention Freud or psychoanalysis in the rationalist community, you reliably get “pseudoscience”, “it’s completely debunked”, et cetera… but if you rephrase the same ideas using modern language, without mentioning the source, they become accepted rationalist wisdom.
One way I think about things: everything that I’ve found in myself and close friends that looks and smells like “shoulds” is sorta sneaky. I keep on finding shoulds which seem to have been absorbed from others and are less about “this is a good way to get a thing in the world that I want” and more about “someone said you need to follow this path and I need them to approve of me”. The force I feel behind my shoulds is normally “You’re SCREWED if you don’t!”, a sort of vaguely panicky, inflexible energy. It’s rarely connected to the actual good qualities of the thing I “should” be doing.
Because my shoulds normally ground out in “if I’m not this way, people won’t like me”, if the pressure gets turned up, following a should takes me farther and farther away from things I actually care about. Unblocking stuff often feels like transcending the panicky fear that hides behind a should. It never immediately lets me be awesome at stuff. I still need to develop a real connection to the task and how it works into the rest of my life. There’s still drudgery, but it’s dealt with from a calmer place.
Yes I can relate to this!
I think removing internal conflicts is powerful but not sufficient.
The people who are most productive are also great at amplifying external conflicts. That is, they have a clear, strong vision, and amplify the creative tension between what they have and what they know they can have. This can help you do things that are not “fun”, like deliberate practice, but are totally aligned, in that you have no objections to doing them, and have a stance of acceptance towards the things that are not enjoyable.
The best then augment that with powerful external structures that are supportive of their ideal internal states and external behaviors.
Each one of these taken far enough can be powerful, and when combined together they are more than the sum of their parts.
Thanks, this framing is helpful for me for understanding how these things can be seen to fit together.
Interesting Bill Thurston quote, sadly from his obituary:
To prevent misinterpretation: I think people often look at quotes like this (I’ve seen similar ones about Feynman) and think “ah yes, see, anyone can do it”. But IME the thing he’s describing is much harder to achieve than the “case-by-case”/“step-by-step” stuff.
Blockchain idea inspired by 80,000 Hours’s interview of Vitalik Buterin: a lot of podcasts either have terrible transcriptions or presumably pay a service to transcribe their sessions. However, even these services make minor typos such as “ASX” instead of “ASICs” in the linked interview.
Now, most people who read these transcripts presumably notice at least a subset of these typos but don’t want to go through the effort of emailing podcasters to tell them about it. On the flip side, there’s no good way for hosts to scalably audit flagged typos to see if they’re actually typos. What we really want is a mostly automated mechanism to aggregate flagged typos and accept fixes which multiple people agree upon that only pays people (micro amounts) for correctly identifying typos.
This mechanism seems like something that could live on a blockchain in some sort of smart contract. Obviously, as with almost every blockchain application, you *could* do it without a blockchain, but using a blockchain makes it easy to audit and distribute the micro-payments rather than having to program the voting scheme from scratch on top of a centralized database.
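To make the aggregation rule a bit more concrete, here’s a rough Python sketch of the logic I have in mind (the threshold, identifiers, and payout units are made up for illustration; a real version would be encoded in the smart contract itself):

```python
from collections import Counter, defaultdict

# Hypothetical threshold: number of independent matching flags needed
# before a proposed fix is accepted and paid out.
ACCEPT_THRESHOLD = 3

def settle_flags(flags):
    """flags: iterable of (flagger_id, transcript_position, proposed_fix)."""
    proposals = defaultdict(set)
    for flagger, position, fix in flags:
        proposals[(position, fix)].add(flagger)

    accepted = []
    payouts = Counter()
    for (position, fix), flaggers in proposals.items():
        if len(flaggers) >= ACCEPT_THRESHOLD:
            accepted.append((position, fix))
            for flagger in flaggers:
                payouts[flagger] += 1  # one micro-payment unit per accepted fix
    return accepted, payouts

# Example: three people independently flag "ASX" -> "ASICs", so that fix is
# accepted; a one-off flag at another position is not.
flags = [("alice", 1042, "ASICs"), ("bob", 1042, "ASICs"),
         ("carol", 1042, "ASICs"), ("dave", 9000, "ASICS")]
print(settle_flags(flags))
```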
I’ve recently been obsessing over the idea of trying to “make math more like programming”. I’m not sure if it’s just because I feel fluent at programming and still not very fluent at abstract math or also because programming really does have a feedback loop that you don’t get in math.
Regardless, based on my reading it seems like there’s a general consensus in math that even the most modern theorem provers, like Lean and Coq, are much less efficient than typical “informal” math reasoning. That said, I wonder if this ignores some of the benefits that programmers get from writing in a formal language, i.e. automatic refactoring tools, fast feedback loops, and code analysis/search tools. Also, it seems like a sufficiently user-friendly math theorem proving tool could be useful for education. If kids can learn how to program in Javascript, I have to believe they can learn to prove theorems, even if the learning curve’s relatively steep.
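For a taste of what this looks like in practice, here’s a toy Lean 3 proof (my own trivial example, leaning on an existing library lemma) that addition of natural numbers is commutative:

```lean
-- Toy illustration: prove a + b = b + a for naturals by applying
-- the library lemma nat.add_comm.
theorem my_add_comm (a b : ℕ) : a + b = b + a :=
nat.add_comm a b
```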
Maybe once I play around with Lean more, I’ll change my mind, but for now, I’m sticking to my contrarian viewpoint.
It seems like a useful idea on a lot of levels.
There’s a difference between solving a problem where you’re 1) trying to figure out what to do, 2) executing an algorithm, or 3) evaluating a closed-form solution (plugging the values into the equation, performing the operations, and seeing what the number is).***
Names. If you’re writing a program and you decide to give things (including functions/methods) names like the letters of the alphabet, it’s hard for other people to understand what you’re doing. Including future you. As a math enthusiast I see the benefit of not having to generate names*, but teaching-wise? I can see some benefits of merging/mixing. (What’s sigma notation? It’s a for loop; see the small example after these notes.)
Functions. You can say f’ is the derivative of f. Or you can get into the fact that there are functions** that take other functions as arguments. You can focus narrowly on functions of one-variable. Or you can notice that + is a function that takes two numbers (just like *, /, ^).
*Like when your idea of what you’re doing with something changes as you go, and there’s no refactoring tool on paper to change all the names at the last minute. (Though paper feels pretty nice to work with. That technology is really ergonomic.)
**And that the word function has more than one meaning. There’s a bit of a difference between a way of calculating something and a lookup table.
***Also, seeing how things generalize can be easier with tools that can automatically check if the changes you’ve made have broken what you were making. (Writing tests.)
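To spell out the sigma-notation point from the “Names” note above, here’s the trivial version of the analogy in Python:

```python
# Sigma notation as a for loop: sum of i^2 for i = 1 to n.
n = 10
total = 0
for i in range(1, n + 1):
    total += i ** 2
print(total)  # 385, i.e. n*(n+1)*(2n+1)/6
```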
I’ve been reading a bit about John Conway since his (unfortunate) death. One thing I keep noticing is that everyone seems to emphasize how core having fun was to John Conway’s way of doing math. One question I’m interested in in general is how important fun and curiosity are for doing good research.
I’ve considered posting a question about this that uses John Conway as an example of someone who 1) was genuinely curious and fun-loving but 2) also had other gifts that played a large role in his ability to do great math. But, I don’t want to be insensitive given that he only recently died.
I am considering using Feynman as an example instead as someone who cultivated a reputation for just having fun but had other gifts which played a large role in his success, but I feel a little weird that Conway was the person who originally triggered the thought.
Anyone who reads my shortform have an idea for how I can ask this in a tasteful way?
I also expect you’ll get answers that are focused on his legacy if you ask that kind of question about him now. Feynman is the central example I think of for this, and there’s a lot more published about and by him, so I’d suggest using him.
(I think there is a strong connection between fun and curiosity and doing good research.)
Thanks for the feedback!
It seems like (unless I just haven’t discovered it yet) there’s a sore need for a framework for causal model comparison, analogous to Bayesian model comparison. If you read Pearl (and his students), they rightfully point out that you can’t get causal claims without causal assumptions, but they don’t talk much about how you actually formulate the causal model in the first place (“domain knowledge”). As a result, if you look at the literature, researchers mostly seem to use a small set of causal models for inference, e.g. the classic “instrumental variable” graph, that may or may not describe the phenomena.
I view this as analogous to selecting a prior in applied Bayesian modeling. However, there, there’s a nice set of tools for comparing how likely different models are, whereas I’m not aware of any such thing in the causal inference world. There’s something called “sensitivity analysis”, but that’s about how much deviation from your assumptions affects your conclusions.
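For reference, this is the kind of tool I mean on the Bayesian side: the posterior odds of two models given the data factor into a Bayes factor (a ratio of marginal likelihoods) times the prior odds, so models can be compared directly:

```latex
% Standard Bayesian model comparison: posterior odds = Bayes factor x prior odds
\frac{P(M_1 \mid D)}{P(M_2 \mid D)}
  = \frac{P(D \mid M_1)}{P(D \mid M_2)} \cdot \frac{P(M_1)}{P(M_2)}
```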
I forgot to include the disclaimer “besides statistical independence tests”, which can invalidate graphs but are difficult to apply in practice.
Epistemic status: Thinking out loud.
Introducing the Question
Scientific puzzle I notice I’m quite confused about: what’s going on with the relationship between thinking and the brain’s energy consumption?
On one hand, I’d always been told that thinking harder sadly doesn’t burn more energy than normal activity. I believed that and had even come up with a plausible story about how evolution optimizes for genetic fitness, not intelligence, and introspective access is pretty bad as it is, so it’s not that surprising that we can’t crank up our brains’ energy consumption to think harder. This seemed to jibe with the notion that our brain puts way more computational resources towards perceiving and responding to perception than towards abstract thinking. It also fit well with recent results calling ego depletion into question and with the framework in which mental energy depletion is the result of a neural opportunity cost calculation.
Going even further, studies like this one left me with the impression that experts tended to require less energy to accomplish the same mental tasks as novices. Again, this seemed plausible under the assumption that experts’ brains developed some sort of specialized modules over the thousands of hours of practice they’d put in.
I still believe that thinking harder doesn’t use more energy, but I’m now much less certain about the reasons I’d previously given for this.
Chess Players’ Energy Consumption
This recent ESPN (of all places) article about chess players’ energy consumption during tournaments has me questioning this story. The two main points of the article are:
Chess players burn a lot of energy during tournaments, potentially on the order of 6000 calories a day (that’s about what marathon runners burn in a day). This results from intense mental stress leading to an elevated heart rate and, as a result, increased oxygen consumption. Chess players also tend to eat less during competitions, which also contributes to weight loss during tournaments (apparently Karpov once lost 20 pounds during an extended chess championship).
Chess players and their coaches now understand that humans aren’t Cartesian, i.e. our physical health impacts our cognitive performance, and have responded accordingly with intense physical training regimens.
On the surface, none of this contradicts the claims I cited above. The article’s claiming that chess players burn more energy purely from the side effects of stress, not because their brains are doing more work. So why am I revisiting this question?
Gaps in the Evolutionary Justification
First, reading the chess article led me to notice a big gap in the explanation I gave above for why we shouldn’t expect a connection between thinking hard and energy consumption. In my explanation, I mentioned that we should expect our brains to spend much more energy on perceptive and reactive processing than on abstract thinking. This still makes sense to me as a general claim about the median mammal, but now seems less plausible to me as it relates to humans specifically. This recent study, for example, provides evidence that our (humans’) big brains are one of two primary causes of our increased energy consumption compared to other primates. As far as I can tell, humans don’t seem to have meaningfully better coordination or perceptive abilities than chimps. Chimps have opposable thumbs and big toes, spend their days picking bugs off of each other, and climb trees. Given this, while I admittedly haven’t looked into studies on this, I find it hard to imagine that human brains spend much more energy than chimps’ on perception.
Let’s say that we put aside the question of what exactly human brains use their extra energy for and bucket it into the loose category of “higher mental functions”. This still leaves me with a relevant question: why didn’t brains evolve to use varying amounts of energy depending on what they were doing? In particular, if we assume that humans are the first and only mammals that spend large fractions of their calories on “extra” brain functions, then why wasn’t there selection pressure to have those functions only use energy when they were needed instead of all the time?
Bringing things back to my original point, in my initial story, thinking didn’t impact energy consumption because our brains spend most of their energy on other stuff anyway, so there wasn’t strong selective pressure to connect thinking intensity to energy consumption. However, I’ve just given some evidence that “higher brain functions” actually did come with a significant energy cost, so we might expect that those functions’ energy consumption would in fact be context-dependent.
Second, it’s weird that what we’re doing (mentally) can so dramatically impact our energy consumption due to elevated heart rate and other stress-triggered adaptations but has no impact on the energy our brain consumes. To be clear, it makes sense that physical activity and stress would be intimately connected, as this connection is presumably very important for balancing the need to eat/escape predators with the need to not use too much energy when sitting around. What doesn’t yet make sense to me is that, even though neurons evolved from the same cells as all the rest of our biology, they proved so resistant to optimization for variable energy consumption.
Rescuing the Original Hypothesis
The best explanation I can come up with for the two puzzles I just discussed is that, for whatever reason, evolution didn’t select for a neural architecture that could selectively up- and down-regulate its energy consumption depending on the circumstances. For example, maybe the fact that neurons die when they don’t have energy is somehow intimately coupled with their architecture such that there’s no way to fix it short of something only a goal-directed consequentialist (and therefore not a hill-climbing process) could accomplish. If this is true, even though humans plausibly would’ve benefited at some point during our evolutionary history from being able to spend more or less energy on thinking, we shouldn’t be surprised that it never happened.
Another weaker (IMO) explanation is that human brains do use more energy in certain situations for some “higher mental functions” but it’s not the situations you’d expect. For example, maybe humans use a ton of energy for social cognition and if we could measure the neocortex’s energy consumption during parties, we’d find it uses a lot more energy than usual.
The ESPN article had a misleading title. It goes on to say that a player burns 6000 calories a day, but Caruana runs an hour a day (or more). These Grandmasters are not reaching into some esoteric mental ability and burning more calories that way; anyone who has ever seen a Grandmaster play against many players at once, or blindfolded (or even blindfolded and against many players!), can tell that they see the board in a way that’s pretty different from the rest of us.
The classical theory for this is that they have formed bigger/better chunks than us through extensive play (the very same way a mathematician or a basketball player does). Calorie consumption is thus just a correlation in that specific context.
Although I think a (weak) connection could be made between the use of language and the formation or use of these chunks (who’s to say this isn’t a specialized use of language?) in the context of a tournament, I have yet to see anything that supports this idea.
My takeaway from the article was that, to your point, their brains weren’t using more energy. Rather, the best hypothesis was just that their adrenal hormones remained elevated for many hours of the day, leading to higher metabolism during that period. For the record, running an hour a day is definitely not enough to burn 6000 calories (a marathon burns around 3500).
Maybe I wasn’t clear, but that’s what I meant by the following.
Got it! Then I agree with you. I think the best description of my point would be: yeah, these guys are not burning calories by thinking better or harder. Their exercise plus the higher-stress environment alone could account for their high calorie burn.
ML-related math trick: I find it easier to imagine a 4D tensor, say of dimensions B×F×M×N, as a big matrix with dimensions B×F within which are nested matrices of dimensions M×N. The nice thing about this is that, at least for me, it makes it easier to imagine applying operations over the B×F matrices in parallel, which is something I’ve had to think about a number of times doing ML-related programming, e.g. trying to figure out how to write the code to apply a 1D convolution-like operation to an entire batch in parallel.
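To make that concrete, here’s a minimal sketch (PyTorch assumed; the shapes and the particular ops are just illustrative) of flattening the B×F “outer” dimensions into a single batch dimension so the same operation hits every nested M×N matrix in parallel:

```python
# Treat a B x F x M x N tensor as a B x F "outer matrix" whose entries are
# M x N matrices, then apply one op to every nested matrix in parallel by
# folding the outer dims into a single batch dimension.
import torch

B, F, M, N = 2, 3, 4, 5
x = torch.randn(B, F, M, N)

# Example parallel op: multiply every nested M x N matrix by the same
# N x N weight matrix. Flatten (B, F) -> B*F, batched matmul, unflatten.
W = torch.randn(N, N)
y = (x.reshape(B * F, M, N) @ W).reshape(B, F, M, N)

# Same idea for a 1D-convolution-like op over an entire batch: fold the
# extra leading dims into the batch dim expected by conv1d, then unfold.
signal = torch.randn(B, F, M, N)      # N plays the role of the "time" axis
kernel = torch.randn(1, 1, 3)         # one conv filter of width 3
out = torch.nn.functional.conv1d(
    signal.reshape(B * F * M, 1, N),  # (batch, channels=1, length)
    kernel, padding=1,
).reshape(B, F, M, N)
```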
I’ve been studying tensor decompositions and approximate tensor formats for half a year. Since I’ve learned about tensor networks, I’ve noticed that I can draw them to figure out how to code some linear operations on tensors.
Once I used this to figure out how to implement the backward method of a simple neural network layer (nothing novel, it was for the sake of learning how deep learning frameworks work). Another time I needed to figure out how to implement the forward method for a Conv2d layer whose weight tensor is in CP format. After drawing its output as a tensor network diagram, it was clear that I could just do a sequence of 3 Conv2d layers: pointwise, depthwise, pointwise.
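For concreteness, here is a rough sketch (PyTorch assumed; the rank and channel sizes are made up, and this is not necessarily the commenter’s actual code) of how a Conv2d whose weight tensor is stored in CP format can be applied as a pointwise, then depthwise, then pointwise convolution:

```python
# CP-format weight: W[o, i, h, w] ~= sum_r A[o, r] * B[i, r] * K[r, h, w]
# Applying it equals: 1x1 conv (C_in -> R) with B, depthwise k x k conv
# over the R channels with K, then 1x1 conv (R -> C_out) with A.
import torch
import torch.nn as nn

C_in, C_out, k, R = 16, 32, 3, 8   # R = CP rank (illustrative)

A = torch.randn(C_out, R)          # output-channel factor
B = torch.randn(C_in, R)           # input-channel factor
K = torch.randn(R, k, k)           # spatial factor

first = nn.Conv2d(C_in, R, kernel_size=1, bias=False)        # pointwise
middle = nn.Conv2d(R, R, kernel_size=k, padding=k // 2,
                   groups=R, bias=False)                      # depthwise
last = nn.Conv2d(R, C_out, kernel_size=1, bias=False)         # pointwise

with torch.no_grad():
    first.weight.copy_(B.t().reshape(R, C_in, 1, 1))
    middle.weight.copy_(K.reshape(R, 1, k, k))
    last.weight.copy_(A.reshape(C_out, R, 1, 1))

x = torch.randn(1, C_in, 28, 28)
y = last(middle(first(x)))   # ~= a full Conv2d with the CP-format weight
```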
I am not saying that you should learn tensor networks; it’s probably a lot of buck for not too much bang unless you want to work with tensor decompositions and formats.
From cursory Googling, it looks like tensor networks are mostly used for understanding quantum systems. I’m not opposed to learning about them, but is there a good resource you can point me to that introduces them independent of the physics concepts? Were you learning them for use in physics?
For example, have you happened to read this Google AI paper introducing their TensorNetworks library and giving an overview?
Unfortunately I don’t know any quantum stuff. I learned them for machine learning purposes.
A monograph by Cichocki et al. (part 1, part 2) is an overview of how tensor decompositions, tensor formats, and tensor networks can be used in machine learning and signal processing. I think it lacks some applications, including acceleration and compression of neural networks by compressing layer weights with tensor decompositions (which also sometimes improves accuracy, probably by reducing overfitting).
Tensor Decompositions and Applications by Kolda & Bader (2009) is an overview of tensor decompositions. It doesn’t have many machine learning applications, and it doesn’t talk about tensor networks, only about some of the simplest tensor decompositions and the specific tensor formats that are the most popular types of tensor networks. This paper was the first thing I read about all the tensor stuff, and it’s one of the easier things to read. I recommend you read it first and then look at the topics that seem interesting to you in Cichocki et al.
Tensor Spaces and Numerical Tensor Calculus by Hackbusch (2012) covers the mathematics of tensor formats and tensor decompositions for Hilbert and Banach spaces. No applications, a lot of math; functional analysis is more or less a prerequisite. It’s a very dense and difficult textbook, and it also doesn’t talk about tensor networks, only about specific tensor formats.
Handwaving and Interpretive Dance is the simplest one; it’s about tensor networks rather than the other tensor stuff. It’s written for physicists, but chapter 1 (and maybe later chapters) can be read without a physics background.
Regarding the TensorNetwork library: I’ve skim-read the paper but haven’t tried using it. I think it’s in early alpha or something. How usable it is for me depends on how well it interacts with PyTorch and how easy it is to do autodifferentiation with respect to the core tensors and use the tensor network in a PyTorch model. Intuitively the API seemed nice. I think their idea is that you take a tensor, turn it into a matrix, do a truncated SVD so you have 2 matrices, and turn those back into tensors; then you do the same for them. This way you can perform some, but not all, popular tensor decomposition algorithms.
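As a rough illustration of that “matricize, truncated SVD, split, repeat” idea (plain NumPy, illustrative rank; not the library’s actual API):

```python
# One step of "matricize, truncated SVD, split": flatten some axes into
# rows and the rest into columns, truncate the SVD to rank r, and reshape
# the two factors back into tensors. Repeating this on the factors gives
# some (not all) of the popular tensor decompositions.
import numpy as np

def split_once(tensor, n_left_dims, r):
    shape = tensor.shape
    left_shape, right_shape = shape[:n_left_dims], shape[n_left_dims:]
    mat = tensor.reshape(int(np.prod(left_shape)), int(np.prod(right_shape)))

    U, S, Vt = np.linalg.svd(mat, full_matrices=False)
    U, S, Vt = U[:, :r], S[:r], Vt[:r, :]               # truncate to rank r

    left = U.reshape(*left_shape, r)                    # carries the left axes
    right = (np.diag(S) @ Vt).reshape(r, *right_shape)  # carries the rest
    return left, right

x = np.random.randn(4, 5, 6, 7)
left, right = split_once(x, 2, r=3)                 # (4, 5, 3) and (3, 6, 7)
approx = np.einsum('abr,rcd->abcd', left, right)    # rank-3 approximation of x
```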
P.S. Feel free to message me if you have questions about tensor decomposition/network/format stuff.
How to remember everything (not about Anki)
In this fascinating article, Gary Marcus (now better known as a Deep Learning critic, for better or worse) profiles Jill Price, a woman who has an exceptional autobiographical memory. However, unlike others that studied Price, Marcus plays the role of the skeptic and comes to the conclusion that Price’s memory is not exceptional in general, but instead only for the facts about her life, which she obsesses over constantly.
Now, obsessing over autobiographical memories is not something I’d recommend to people, but reading this did make me realize that, to the degree it’s cultivatable, continuously mulling over stuff you’ve learned is a viable strategy for remembering it much better.
I would love to see her cognitive strategy modeled in more depth. What are the beliefs and emotions that are sustaining that constant mulling?
It seems not that conscious. I suspect it’s similar to very scrupulous people who just clean / tidy up by default. That said, I am very curious whether it’s cultivatable in a less pathological way.
Sometimes there are articles I want to share, like this one, where I don’t generally trust the author and they may have (what I consider) quite wrong views overall, but I really like some of their writing. On one hand, sharing the parts I like without crediting the author seems 1) intellectually/epistemically dishonest and 2) unfair to the author. On the other hand, providing a lot of disclaimers about not generally trusting the author feels weird because I feel uncomfortable publicly describing why I find them untrustworthy.
Not really sure what to do here but flagging it to myself as an issue explicitly seems like it might be useful.
One thing to do could just be to add an “epistemic status” to articles you share, with some being like “interesting writing” or “made me think” and others being like “agree” or “seems basically correct”
Yeah good idea.
Taking Self-Supervised Learning Seriously as a Model for Learning
It seems like if we take self-supervised learning (plus a sprinkling of causality) seriously as key human functions, we can more directly enhance our learning by doing much more prediction / checking of predictions while we learn. (I think this is also what predictive processing implies but don’t understand that framework as well.)
Weird thought I had based on a tweet about gradient descent in the brain: it seems like one under-explored perspective on computational graphs is the causal one. That is, we can view propagating gradients through the computational graph as assessing the effect of an intervention on some variable on all of a node’s children. (I sketch a toy version of this after the list below.)
Reason to think this might be useful:
*Maybe* this can act as a different lens for examining NN training?
Reasons why this might not be useful:
It’s not obvious that it makes sense to think of nodes in an NN (or any differential computational graph) as causally related in the sense we usually talk about in causal inference.
A causal interpretation of gradients isn’t obvious because they’re so local, whereas most causal inference focuses on the impact of more non-trivial interventions. OTOH, there are some NN interpretability techniques that try to solve this, so maybe these have better causal interpretations?
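Here’s the toy sketch I mentioned above (PyTorch assumed; this is just my own illustration, not an established equivalence): the backprop gradient at an intermediate node roughly matches the finite-difference effect on the loss of a tiny “intervention” on that node, holding everything upstream fixed.

```python
# Compare dL/dh[0] from backprop with the effect of a small nudge to h[0].
import torch

x = torch.randn(5, requires_grad=True)
W = torch.randn(3, 5)

h = W @ x                      # intermediate node in the graph
h.retain_grad()                # keep its gradient (it's not a leaf)
loss = (h ** 2).sum()
loss.backward()

# Finite-difference "intervention": nudge one coordinate of h and
# recompute only the part of the graph downstream of it.
eps = 1e-3
h_int = (W @ x).detach().clone()
h_int[0] += eps
loss_int = (h_int ** 2).sum()

print(h.grad[0].item())                           # dL/dh[0] from backprop
print(((loss_int - loss.detach()) / eps).item())  # ~ same number for small eps
```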
If algebra’s a deal with the devil where you get the right answer but don’t know why, then geometric intuition’s a deal with the devil where you always get an answer but don’t know whether it’s right.
Someone should write the equivalent of TAOCP for machine learning.
(Ok, maybe not literally the equivalent. I mean Knuth is… Knuth. So it doesn’t seem realistic to expect someone to do something as impressive as TAOCP. And yes, this is authority worship and I don’t care. He’s Knuth goddamn it.)
Specifically, a book where the theory/math’s rigorous but the algorithms are described in their efficient forms. I haven’t found this in the few ML books I’ve read parts of (Bishop’s Pattern Recognition and Machine Learning, MacKay’s Information Theory, and Hastie, Tibshirani, and Friedman’s Elements of Statistical Learning), so if it’s already out there, let me know.
Note that I don’t mean that whoever does this should do the whole MMIX thing and write their own language and VM.
(Removed.)
*writing the movie right now*
Relevant here: https://www.lesswrong.com/posts/bshZiaLefDejvPKuS/dying-outside
Comment removed for posterity.
I have reported this comment. Hopefully the mods will remove it.
Please don’t speculate on the identity of Satoshi, or spread speculation by others. It has led in multiple cases to people being stalked, blackmailed, harassed, and mugged. Posts like this put innocent lives in physical danger. Be responsible and keep this sort of thing off the Internet.
(Note: responded quickly before removing. I’ve since edited this comment now that I have more time. Also I’m not the person who downvoted your post.)
I definitely did not intend to cause anyone or their family danger (or harassment, etc.), so I’ve removed the post.
Mostly in the selfish interest of showing that I wasn’t being negligent, I did consider this risk before posting. That’s why I noted that I have no information beyond what’s already public and was taking into account that since I heard this speculation on a podcast which involved one relatively prominent cryptocurrency person (I won’t say who so as not to publicize it further), it seemed unlikely that my post would add additional noise.
All that said, I still agree that even a small chance of harm is more than enough reason to remove the post. Especially since:
it seems like you’re more involved in the crypto community than I and therefore probably have more context than I do on this topic; and
my own version of integrity includes not doing things that avoid causing bad outcomes only because they’re obscure (related to my second point above).
Thank you. Yes it is a real problem, speaking from experience from the people I personally know. The reason these events are not talked about much is that any press just makes the problem worse—it gives a bunch of copycat muggers the same bright idea. So unfortunately you get a bunch of speculation and not a lot of observable evidence of the downsides of that speculation, so people don’t realize the harm that has been caused.
There are people who have been killed in attempted bitcoin muggings. Speculating on the Internet that someone is in possession of >1 million bitcoins is like tattooing a big target on their back that they can’t get rid of.
Thanks, that helps contextualize.
For the record, I downvoted Mark; I don’t agree with him, and I think it’s sad that you, an1lam, removed the original post, which I don’t think did any harm whatsoever (the reasons should be pretty obvious: is a random short-form post about a hypothetical movie somehow evidence that Hal was Satoshi? I don’t think so at all).
The risk to innocents is real. Physical security is a really hard problem for people in this space, and the police won’t protect those at risk. Does one post on one rationalist website really matter? Yes, for the same reason your vote matters at the ballot box. This is the collective action problem. If nobody self-censors a statement that puts people at risk, the risks only increase over time and those who help propagate the info are morally culpable.
Link post for a short post I just published describing my way of understanding Simpson’s Paradox.
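As a quick illustration of the paradox itself (numbers invented for this example, not taken from the linked post): a treatment can do better in every subgroup yet look worse in aggregate when the subgroups are split very unevenly across treatments.

```python
# Invented numbers: treatment A beats B within each subgroup, but B "wins"
# overall because B is mostly applied to the easy cases.
successes = {
    ("A", "easy"): (90, 100),    # 90%
    ("A", "hard"): (30, 100),    # 30%
    ("B", "easy"): (850, 1000),  # 85%
    ("B", "hard"): (2, 10),      # 20%
}

for treatment in ("A", "B"):
    s = sum(s for (t, _), (s, n) in successes.items() if t == treatment)
    n = sum(n for (t, _), (s, n) in successes.items() if t == treatment)
    print(treatment, f"overall: {s / n:.1%}")
# A overall: 60.0%, B overall: 84.4% -- the aggregate ranking flips.
```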
Thing I desperately want: tablet native spaced repetition software that lets me draw flashcards. Cloze deletions are just boxes or hand-drawn occlusions.
Today I attended the first of two talks in a two-part mini-workshop on Variational Inference. It’s interesting to think about from the perspective of my recent musings on science-y vs. engineering mindsets because it highlighted the importance of engineering/algorithmic progress in widening Bayesian methods’ applicability.
The presenter, who’s a fairly well-known figure in probabilistic ML and has developed some well-known statistical inference algorithms, talked about how part of the reason so much time was spent debating philosophical issues in the past was that Bayesian inference wasn’t computationally tractable until the development of Gibbs sampling by Gelfand & Smith in the ’90s.
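To give a flavor of why Gibbs sampling mattered, here’s a minimal sketch (NumPy, toy model of my own choosing, not from the talk): for a bivariate standard normal with correlation rho, each conditional is easy to sample, so alternating conditional draws recovers the joint distribution without ever working with it directly.

```python
# Gibbs sampling for a bivariate standard normal with correlation rho:
# p(x | y) = N(rho * y, 1 - rho^2) and symmetrically for p(y | x).
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8
n_samples = 10_000

x, y = 0.0, 0.0
samples = np.empty((n_samples, 2))
for i in range(n_samples):
    x = rng.normal(rho * y, np.sqrt(1 - rho ** 2))   # draw x | y
    y = rng.normal(rho * x, np.sqrt(1 - rho ** 2))   # draw y | x
    samples[i] = (x, y)

print(np.corrcoef(samples[1000:].T)[0, 1])   # ~ rho after burn-in
```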
To be clear, the type of progress I’m talking about is still “scientific” in the sense that it mostly involves applied math and finding good ways to approximate posterior distributions. But it’s “engineering” in the sense that it’s the messy sort of work I talked about in my other post, where “messy” means a lot of the methods don’t have a good theoretical backing and involve making questionable (at least ex ante) statistical assumptions. Now, the counter is of course that even though we don’t have a theoretical backing yet, there may still be one in the future.
I’ll probably have more to say about this when the workshop’s over but I partly just wanted to record my thoughts while they were fresh.
I’m interested in reading more about what might’ve been going on in Ramanujan’s head when he did math. So far, the best thing I’ve found is this.