I’m pretty sure my LessWrong posts have gotten more than 1000 hits across my entire life (and keep in mind that “hits” is different from “an actual human actually reads the article”), but fair enough—Wikipedia pages do get a lot of views.
Wikipedia pageviews punch above their weight. First, your pageviews probably drop off rapidly enough that one day of WP traffic can equal a lifetime of LW traffic. People just don’t go back and reread most old LW links. I mean, look at the submission rate—there’s like a dozen a day or something. (I don’t even read most LW submissions these days.) WP traffic, meanwhile, is extremely durable: ‘Expected value’ will be pulling in 1.7k hits/day (or more) likely practically forever.
Second, the quality is distinct. A Wikipedia article is an authoritative reference which is universally consulted and trusted. And that 1.7k excludes all access via the APIs, AFAIK, as well as readers who only read the snippets in Google Search. If you Google the phrase ‘expected value’, you may not even click through to WP, because you already read the search-box snippet:
In probability theory, the expected value is a generalization of the weighted average. Informally, the expected value is the arithmetic mean of the possible values a random variable can take, weighted by the probability of those outcomes.
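The snippet’s definition translates directly into a few lines of code; here is a minimal sketch (the `expected_value` helper is mine for illustration, not anything from the article):

```python
# Expected value: the probability-weighted mean of a random variable's outcomes.
def expected_value(outcomes, probabilities):
    assert abs(sum(probabilities) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(x * p for x, p in zip(outcomes, probabilities))

# A fair six-sided die: (1+2+3+4+5+6)/6 = 3.5
print(expected_value([1, 2, 3, 4, 5, 6], [1 / 6] * 6))
```

For a fair die this gives 3.5 (up to floating-point rounding).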
This includes machine learning. Every LLM is trained very heavily on Wikipedia; any given LW page, on the other hand, may well not make the cut, either because it’s too recent to show up in the old datasets everyone starts with, like The Pile, or because it gets filtered out for bad reasons, or they just don’t train on enough tokens. And there is life beyond LLMs in ML (hard to believe these days, but I am told ML researchers still exist who do other things), and WP articles will be in those too, as part of knowledge graphs, Wikidata, etc. A LW post will not.
Then you have the impact of WP. As anyone who’s edited niche topics for years can tell you, WP articles are where everyone starts, and you can see the traces for decades afterwards. Hallgren mentions David Gerard, and Roko’s Basilisk is a good example of that—it is the one thing “everyone knows” about LessWrong, and it is due almost solely to Wikipedia. The hit count on the ‘LessWrong’ WP article will never, ever reflect that.
But editing WP is difficult even without a Gerard, because of the ambient deletionists. An example: you may have seen recently going around (even on MR) a Wikipedia link about the interesting topic of ‘disappearing polymorphs’. It is a fascinating chemistry topic, but on Gwern.net, I did not link to it, but to a particular revision of another article. Why? Because an editor, Smokefoot, butchered it after I drew attention to it on social media prior to the current burst of attention. (Far from the first time—this is one of the hazards of highlighting any Wikipedia article.) We can thank Yitzilitt & Cosmia Nebula for since writing a new ‘Disappearing polymorph’ article which can stand up to Smokefoot’s butchering; it is almost certainly the case that it took them 100x, if not 1000x, more time & effort to write that than it took Smokefoot to delete the original material. (On WP, when dealing with a deletionist, it is worse than “Brandolini’s law”—we should be so lucky that it only took 10x the effort...)
Concurring with the sentiment: I have realized that nothing I write is ever going to be as well-read as Wikipedia, so I have devoted myself to writing for Wikipedia instead of trying to keep up a personal blog anymore.
I will comment on a few things:
I really want to get the neural scaling law page working, with some synthesis and updated data, but currently there is no good theoretical synthesis, and Wikipedia isn’t good for just a giant spreadsheet.
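For what it’s worth, the closest thing to a synthesis such a page could anchor on is probably a parametric fit like the one from Hoffmann et al. 2022 (“Chinchilla”), modeling loss as a function of parameter count $N$ and training tokens $D$:

```latex
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

with $E$, $A$, $B$, $\alpha$, $\beta$ fitted empirically; but that is a curve fit, not a theory, which is rather the point.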
I wrote most of the GAN page, the Diffusion Model page, Mixture of Experts, etc. I also wrote a few sections of LLM and keep the giant table updated for each frontier model. I am somewhat puzzled by the fact that it seems I am the only pony who thought of this. There are thousands of ML personal blogs, all in the Celestia-forsaken wasteland of not getting read, and then there is Wikipedia… but nopony is writing there? Well, I guess my cutie mark is in Wikipedia editing.
The GAN page and the Diffusion Model page were Tirek-level bad. They read like somepony paraphrased about 10 news reports. There was barely a single equation, and that was years after GAN and DM had proved their worth! So I fired the Orbital Friendship Mathematical Cannon. I figured that if I’m not going to write another blog, then Wikipedia has to be on the same level as a good blog, so I set my goal at the level of Lilian Weng’s blog, and a lack of mathematics is definitely bad.
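For instance, the single most basic equation the old GAN page lacked, the minimax objective from Goodfellow et al. 2014:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)]
  + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]
```

A page on GANs that cannot state this is a page paraphrasing press releases.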
The GAN page and the Diffusion Model page were Tirek-level bad. They read like somepony paraphrased about 10 news reports. There was barely a single equation, and that was years after GAN and DM had proved their worth
Yes, but WP deletionists only permit news reports, because those are secondary sources. You have to write these articles with primary sources, but they hate those; see one of their favorite bits of jargon, WP:PRIMARY. (Weng’s blog, ironically, might make the cut as a secondary source, despite containing pretty much just paraphrases or quotes of primary sources, but only because she’s an OA exec.) Which is a big part of why the DL articles all suck: there just aren’t many good secondary or tertiary sources like encyclopedias. (Well, there’s the Schmidhuber Scholarpedia articles in some cases, but aside from being outdated, it’s, well, Schmidhuber.) There is no GAN textbook I know of which is worthwhile, and I doubt there ever will be.
there’s the Schmidhuber Scholarpedia articles in some cases, but aside from being outdated, it’s, well, Schmidhuber.
I hate Schmidhuber with a passion, because I can smell everything he touches on Wikipedia, and it is always terrible.
Sometimes when I read pages about AI, I see things that almost certainly came from him, or one of his fans. I struggle to articulate exactly what Schmidhuber’s kind of writing feels like, but perhaps this will suffice: “People never give the right credit to anything. Everything of importance was either published by my research group first and miscredited to someone else later, or something like that. Deep Learning? It’s due not to Hinton, but Amari; but not Amari either, but Ivakhnenko. The more obscure the originator, the better, because it reveals how bad people are at credit assignment—if they were better at it, the real originators would not have been so obscure.”
For example, LSTM did actually originate with Schmidhuber… and indeed, it is also credited to Schmidhuber (…or maybe Hochreiter?). But then GAN should be credited to Schmidhuber, and also Transformers. He (or his fans) keeps trying to put the phrase “internal spotlights of attention” into the Transformer page, and I keep removing it. He wanted the credit so much that he went for argument-by-punning, renaming “fast weight programmers” to “linear Transformers”, and quoting “internal spotlights of attention” out of context just to fortify the argument with a pun! I can do puns too! Rosenblatt (1962) even wrote about “back-propagating errors” in an MLP with a hidden layer. So what?
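For the record, the correspondence the pun trades on (formalized in Schlag, Irie & Schmidhuber 2021) is real but narrow: attention with the softmax removed is equivalent to querying an outer-product fast-weight matrix. With $W_0 = 0$:

```latex
W_t = W_{t-1} + v_t k_t^{\top}, \qquad
o_t = W_t q_t = \sum_{i \le t} v_i \,(k_i^{\top} q_t)
```

That describes linear attention, not the softmax Transformer everyone actually uses.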
I actually took Schmidhuber’s claim seriously and carefully rewrote Ivakhnenko’s Group method of data handling page, giving all the mathematical details, so that one may evaluate the method for oneself instead of relying on Schmidhuber’s claim. A few months later, someone manually reverted everything I wrote! What does it read like now, according to a partisan of Ivakhnenko?
The development of GMDH consists of a synthesis of ideas from different areas of science: the cybernetic concept of “black box” and the principle of successive genetic selection of pairwise features, Godel’s incompleteness theorems and the Gabor’s principle of “freedom of decisions choice”, and the Beer’s principle of external additions. GMDH is the original method for solving problems for structural-parametric identification of models for experimental data under uncertainty… Since 1989 the new algorithms (AC, OCC, PF) for non-parametric modeling of fuzzy objects and SLP for expert systems were developed and investigated. Present stage of GMDH development can be described as blossom out of deep learning neuronets and parallel inductive algorithms for multiprocessor computers.
Well, excuse me, “Godel’s incompleteness theorems”? “The original method”? Also, I thought “fuzzy” had stopped being fashionable after the 1980s. I actually once tried to learn fuzzy logic and gave up after not seeing what the big deal was. It is filled with such pompous and self-important terminology, as if the lack of substance must be made up for by the heights of spiritual exhortation. Why say “combined” when you could say “consists of a synthesis of ideas from different areas of science”?
As a side note, such turgid prose, filled with long noun phrases, is pretty common in Soviet writing. I once read that this kind of massive noun phrase had a political purpose, but I don’t remember what it was.
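To be concrete about what the reverted rewrite covered: the mathematical core of GMDH is modest. Each layer fits small polynomial models to every pair of inputs by least squares, e.g.

```latex
\hat{y} = a_0 + a_1 x_i + a_2 x_j + a_3 x_i x_j + a_4 x_i^2 + a_5 x_j^2
```

ranks the candidates by error on a held-out split (the “external criterion”), feeds the best outputs forward as the next layer’s inputs, and stops when the held-out error stops improving. That is the whole “synthesis of ideas from different areas of science”.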
Yes, but WP deletionists only permit news reports, because those are secondary sources. You have to write these articles with primary sources, but they hate those; see one of their favorite jargons, WP:PRIMARY.
Aren’t most of the sources going to be journal articles? Academic papers are definitely fair game for citations (and generally make up most citations on Wikipedia).
Finally somepony noticed my efforts!
I fought a bitter edit war on Artificial intelligence in mathematics with an agent of Discord [a deletionist] and lost. The edit war seems lost for good, but a brief moment is captured in the Internet Archive… like tears in rain. I can only say, like Galois: “On jugera” [Posterity will judge].
My headcanon is that Smokefoot is a member of BloodClan.