When I write posts I use raw HTML. Yes, the modern thing to do is probably Markdown, but HTML was designed for hand-coding and still works well for that if you don’t want anything especially fancy. But what if you want math?
Previously when I’ve wanted to do math I’ve written it out as fixed-width ASCII:
e^(-7t)
In my editor this looks like:
<pre> e^(-7t) </pre>
This is reasonably readable, works anywhere, and I like the aesthetic. I probably should have stuck with it, but after helping publish a report that included some traditionally-formatted equations and learning that MathML has been supported cross-browser since the beginning of the year (thanks Igalia!), I decided to try it out. I wrote the equations in two recent posts in it, and am mixed on the experience.
It definitely does look nicer:
On the other hand, here’s how it looks in my editor:
<math display=block> <msup> <mi>e</mi> <mrow> <mo>-</mo> <mn>7</mn> <mi>t</mi> </mrow> </msup> </math>
There’s a small learning curve on when to use the different tags, but mostly it’s just very verbose. And I think, needlessly so? That “-” is an operator, “7” is a number, and “t” is an identifier could all be the default. Then I could just write:
<math display=block> <msup> e <mrow> −7t </mrow> </msup> </math>
And we could remove many uses of <mrow> too: a series of characters without whitespace separating them could already be treated as a group:
<math display=block> <msup> e −7t </msup> </math>
Of course if you wanted to use a character for a non-traditional purpose you could still mark it up as one, but a good set of defaults would make MathML much more pleasant. I’d hate to have to read and write blog posts as:
<word><lt>h</lt><lt>e</lt><lt>l</lt><lt>l</lt><lt>o</lt></word> <word><lt>w</lt><lt>o</lt><lt>r</lt><lt>l</lt><lt>d</lt></word> <pnct>.</pnct>
I know I’m about 25 years too late on this, and I’m happy that a pure-HTML solution is now cross-browser, but it’s still sad we ended up so close to a comfortable hand-editable solution.
(Just use MathJax? Nope—I don’t want a runtime dependency on JS. Though I could see including a LaTeX-to-MathML or a MathML-verbosifier step at build time.)
Two updates:
I went and coded a verbosifier, so it’s possible to write
<msup>a 2</msup> + <msup>b 2</msup> = <msup>c 2</msup>
and not
<msup><mi>a</mi><mn>2</mn></msup><mo>+</mo>...
See https://github.com/jeffkaufman/mathml_verbosifier
It turns out Feedly (and maybe other RSS readers?) strips out MathML, so even if they’re using a rendering engine that supports it you see nothing. So I’ll stick to ASCII math for a while longer.
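For illustration, here is a minimal Python sketch of the defaults proposed above: digits become numbers, letters become identifiers, any other symbol becomes an operator, and a whitespace-free run of several tokens becomes a group. This is not the code in the linked repository; the function names (verbosify, expand_group, wrap_token) are made up for this example, and the regex-based tag splitting is an assumption that only suits simple hand-written input.

```python
import re

# Anything that looks like a tag is passed through untouched.
TAG = re.compile(r"(<[^>]+>)")
# Opening tags of MathML token elements; their text is already marked up.
TOKEN_ELEMENT = re.compile(r"<(mi|mn|mo|mtext)\b")
# A number, else a single letter, else a single non-space symbol.
TOKEN = re.compile(r"\d+(?:\.\d+)?|[^\W\d_]|\S")


def wrap_token(tok):
    """Wrap one token in the element its characters suggest by default."""
    if tok[0].isdigit():
        return f"<mn>{tok}</mn>"
    if tok.isalpha():
        return f"<mi>{tok}</mi>"
    return f"<mo>{tok}</mo>"


def expand_group(group):
    """Expand a whitespace-free run; several tokens become one <mrow>."""
    parts = [wrap_token(t) for t in TOKEN.findall(group)]
    if len(parts) == 1:
        return parts[0]
    return "<mrow>" + "".join(parts) + "</mrow>"


def verbosify(terse):
    """Turn terse MathML-like markup into explicit MathML."""
    out = []
    passthrough = False  # True while inside an explicit <mi>/<mn>/<mo>/<mtext>
    for piece in TAG.split(terse):
        if TAG.fullmatch(piece):
            passthrough = bool(TOKEN_ELEMENT.match(piece))
            out.append(piece)
        elif passthrough:
            out.append(piece)
        else:
            out.extend(expand_group(g) for g in piece.split())
    return "".join(out)


print(verbosify("<math display=block> <msup> e −7t </msup> </math>"))
```

Explicitly marked-up tokens are left alone, so a character used for a non-traditional purpose can still be tagged by hand. One known gap in this sketch: each letter becomes its own <mi>, so a multi-letter name like sin would need explicit markup.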
FYI, when cross-posted to Less Wrong, that math in your post gets rendered with MathJax, not browser-native MathML. (This is the case both on LW itself and on GreaterWrong.)
And it’s a good thing, too, because this is what I see in the original post, on your website:
It seems that Chrome only supports MathML starting with Chrome 109, which is not compatible with my version of macOS.
Also, caniuse.com reports the following:
(The “see details” link goes to a discussion on the blink-dev mailing list, which is… less than straightforwardly informative to an ordinary web developer, much less an ordinary web user. I’m still not sure what the differences are. But it’s not exactly reassuring.)
That’s not a great setup to be running: there have been several serious vulnerabilities since then, including the WebP zero-day.
Pretty sure there’s nothing I would be interested in hand-coding that’s outside of MathML Core.
Sure. The larger point is that (again, as per caniuse.com) MathML support is available to only 90% of users globally. That’s somewhat less than ideal, if you want your site to cater to a diverse user base.
In general I don’t think it makes sense for site owners to make changes to support users who are running dangerous configurations, and skimming caniuse it looks to me like this 10% is almost all people running very old versions of browsers.
I also suspect that a lot of what looks like people running really old browsers is actually bots, since it’s common to make a bot that emulates whatever the current version is at the time you made it (often because it’s based on that browser version and they’re not getting around to updating, or because they hardcode a UA and don’t prioritize updating).
That’s a lot of readers to throw away, and if you go to 95%, it isn’t that limiting, especially with various kinds of backwards compatibility.
Caniuse isn’t reporting raw numbers that bots could trivially inflate, but using statistics from Statcounter, which claims to screen for bots. (How successful they are at this is unknowable, of course.)
Depends on how popular you are. Even if you make the highly questionable assumption that browser statistics collected on sites like cnn.com and such are representative of the readership of jefftk.com, if jefftk.com has hundreds of readers, he’s still doing a lot of work for a group that can only manage to claim that there are “dozens of us”, and in any case really ought to upgrade to a proper browser (and in probably most cases, OS) anyway, for security reasons.
I wouldn’t be ok serving pages that didn’t work for 10% of my visitors, but I’d be really surprised if the number is really that high.
You are losing >10% if you deliberately break it for 10%. Breakage is the union of all breakages. Like, your site is already on thin ice: you have nonsense like margin-left: em; in your CSS (how large is that margin, exactly?) and <li> list items which are… not… inside any <ol>/<ul> lists? (Almost like a Zen koan. If list items don’t need to be inside a list, isn’t everything a list item, in some sense...) Also, I have no idea what <script nonce="this-is-not-a-real-nonce" type="text/javascript"> is supposed to do, but it very much makes me wonder if it’s doing what it’s supposed to do for anyone at all.
Given the hell that is web dev, even if you have immaculate HTML/CSS and carefully code to the standards, you will still run into hilarious breakage for many users, particularly the many mobile and/or Mac users. And remember: silence is not golden, because on the contemporary Internet, “they’ll never tell you”. (I have had literally half a million people go to a website which was broken, and not a single one sent in a report or comment.) Everything has to be tested, and not taken for granted, like, say, simply assuming that pasting MathML would work out-of-the-box because it worked for you and you hadn’t heard otherwise from readers...
I don’t disagree, but none of the things you pointed out are actually breakage as far as I can tell:
That was a typo for margin-left: 1em, but the browser ignoring the directive doesn’t actually do anything because it only ever appears immediately to the right of something that has margin-right: 1em. Fixed!
Looks like at some point I missed the <ul>; added. (This is already only semantic: I have CSS removing all the list-specific display already.)
The validator is complaining because type="text/javascript" is no longer something you need to write, but it’s not really wrong to include it.
The nonce="this-is-not-a-real-nonce" is something I added when I temporarily served my site with a CSP (but without taking the time to fully set it up) as part of verifying that some other code I was testing on my site did the right thing in the presence of a CSP. It’s not doing anything, but also not breaking anything. This is annoying enough to rip out that I’m leaving it for now.
As long as you verify that you’re coding to a standard that’s supported by the versions of the browsers you’re trying to support, what sort of breakage are you thinking about? This does happen (ex: Chrome/iOS advertising in its Accept header that it supported webp when it didn’t support inline webp) but it’s pretty rare, especially in the last ~5y.
For my own site I normally approach this by testing in multiple browser engines: Chrome + Firefox, sometimes also Safari. When I worked in this area professionally I additionally used careful A/B tests, but that’s not worth it for my personal site.
I didn’t say they were. If you are ‘skating on thin ice’, you have by definition not fallen through and started to drown, because you can’t skate and drown simultaneously. (At least, I can’t.) My point is that you are engaged in sloppy coding practices, and so it’s unsurprising that you are making mistakes like casually assuming that MathML can be copied around or would be compatible with random web applications, when you should know that the default assumption is that MathML will be broken everywhere and must be proven supported. That Internet math support is parlous is nothing new.
Until, of course, it doesn’t, because you refactored or something, and hit a spot of particularly thin ice.
Not at all. (My site has a few instances of unnecessary type declarations not worth ripping out.) I merely quoted that for the nonce part, which did concern me. CSP is one of the most arcane and frustrating areas of web dev, and the less one has to do with it, the better. Leaving in anything to do with CSRF or CSP or framejacking is indeed tempting fate.
Web dev is crack & AIDS. We run into problems all the time where we code to a standard and then it breaks in Chrome or Firefox.
The day before yesterday I discovered that when I added dropcaps to my essay on why cats knock things over, it looked fine in Chrome… and bad in Firefox, because they define ‘first letter’ differently for the opening word ‘Q-tips’. (Firefox includes the hyphen in the “first letter”, so the hyphen was getting blown up to the size of the drop cap!) My solution was to put a space and write it ‘Q -tips’. Because we live in a world without a just and loving god and where standards exist to be honored in the breach.
Especially in Safari, which was created by a fallen demiurge in a twisted mockery of real browsers. Yesterday, Said had to fix a Safari-specific bug where the toggle bar breaks & vanishes on Safari. Worked fine everywhere else, coded against the standard… He also had to polyfill the standardized crypto.randomUUID (2021) for iOS.
And today Said removed the CSS-standardized-and-deployed-since-at-least-2015 property box-decoration-break and -webkit-box-decoration-break from Gwern.net because it breaks in Safari. (‘webkit’ = ‘Safari’, for the non-web-devs reading this. Yes, that’s right, the Safari version breaks in Safari, on top of the standardized version breaking in Safari for which the Safari version was supposed to be the fix. Good job, Apple! Maybe you can fix that after you get around to fixing your Gill Sans which renders everything written in it full of random typos? And then make your browser hyphenation not suck?) He also had to remove hanging-punctuation due to its interaction with the link text-shadows on Safari, but arguably link text-shadows are a hack which hanging-punctuation shouldn’t try to play well with, so might be our fault.
I look forward to tomorrow. (That was sarcasm. If every day were like this, I would instead look forward to the sweet release of death.)
Have you heard the good news about server-side MathJax rendering?
Quoting gwern on the subject:
That’s a lot more complexity than I want to be maintaining in my publishing pipeline.
I’m also not excited about requiring external fonts.
The complexity has been quite minimal. You npm install one executable, which you run on a HTML file in place, and it’s done. After the npm install, it’s fairly hassle-free after that; you don’t even need to host the webfonts if you don’t want to. We chose to for some additional speed. (It’s not the size, but the latency: an equation here or there will pull in a few fonts which aren’t that big, but the loading of a new domain and reflow take time.) IIRC, over the, I dunno, 6 years that I’ve been using it, there has only been 1 actual bug due to mathjax-node-page: it broke a link in the navbox at the end of pages because the link had no anchor text (AFAICT), which I solved by just sticking in a ZERO WIDTH SPACE. All my other work related to it has been minor optimizations like rehosting the fonts, stripping a bit of unnecessary CSS, adding an optimization setting, etc. Considering how complicated this feature is, that’s quite impressive reliability. Many much simpler features, which deliver far less value, screw up far more regularly than the static MathJax compilation feature does.
What do you mean by “external” fonts? Are you referring to webfonts, in general?
If you don’t use webfonts at all, your website is very unlikely to ever look particularly good (much less to look good consistently across platforms)…
Yes, I don’t use any webfonts. I’m happy with the default fonts across platforms, don’t care whether my site looks consistent across platforms, and don’t want the performance penalty of requiring each visitor to load a new font to view my page.
Daring Fireball, a site you’ve probably heard of, seems to do OK with only browser-supplied fonts:
Also, jefftk said “requiring”. Sure, he could have a site that uses Inter, either loaded from his own site or from a CDN like Google Fonts, but if Inter doesn’t load (most likely because of user preference), then everything will be fine.
If TeX fonts don’t load…then what happens? Does the user see raw TeX, or nothing at all, or…?
Daring Fireball is a site one has primarily heard of for being an Apple/Mac shill, so perhaps not the best example of a website relying on OS-supplied fonts...
Daring Fireball also uses:
"Gill Sans MT", "Gill Sans", "Gill Sans Std", Georgia, serif
Because of this, and what you quoted, a page that, on a Mac, looks like this:
Daring Fireball page, as seen on a Mac
on a Linux, looks like this:
Daring Fireball page, as seen on a Linux
i.e., it looks bad.
And that is what happens when you don’t use webfonts.
The user sees the rendered equations, set in whatever font is inherited by the equation element (most likely, the font of the surrounding text block). This might be fine:
(Or, it could be very bad. You never know!)
For what it’s worth I think the Linux screenshot is fine—that’s the default font on that system.
I have no trouble believing it, but that speaks more about Linux’s generally sloppy and incompetent approach to typography than it does about whether leaving your website to the whims of OS-provided fonts has good results or not…
I don’t know—when I used Linux on my main machine I was happy with how things looked and generally preferred sites and programs that fit in with the rest of the environment. And Linux users are disproportionately the kind of people who, if they don’t like their system’s default font, will pick something they prefer.
This should be something GPT excels at, if your editor supports GPT plugins: https://chat.openai.com/share/7e19d5e1-1a17-484c-a2ea-e0d7d2cfd56b
You can also use GPT to convert LaTeX to HTML/Unicode, incidentally. For simple inline expressions, this is very good. Like, there is not actually a need to use LaTeX or MathML to render <em>e</em><sup><em>i</em>π</sup>. That works fine in HTML+Unicode, and winds up looking better than an obtrusive MathML/LaTeX block, where even something as simple as $1$ winds up looking visibly alien and inserted.
Speaking of MathML, are there other ways for one to put mathematical formulas into HTML? I know Wikipedia uses <math> and its own template {{math}} (here’s the help page), but I’m not sure about any others. There’s also LaTeX (which I think is the best program for putting mathematical formulas into text in general), as well as some other bespoke things in Google Docs and Microsoft Word that I don’t quite understand.
In terms of what browsers support, MathML is the best way to do it in a modern browser. In an older browser you could do canvas, images, or something with custom fonts.
Most users, though, are in authoring environments that offer something else, usually a way to write LaTeX-style math and have it automatically converted into something the browser can handle.
If LW would suddenly change so that math could be saved for reading at a later time when I’m not connected to the internet, the amount of thinking I do about math would probably suddenly triple.
Details: the main way I save text from web for reading at a later time is by copying a part of the web page, then pasting into a file on my local machine. That does not work for most text containing math.
The way the static approach on GreaterWrong & gwern.net works is that the original LaTeX is stored alongside the CSS/font/HTML stuff you actually see. Then when you copy-paste, instead of getting a bunch of gibberish letters sans formatting, a little bit of Javascript swaps out the gibberish for the accompanying LaTeX.
So for example, if I go to a random recent page with LaTeX in it, like https://www.greaterwrong.com/posts/wR8CFTasFpfCQZKKn/if-influence-functions-are-not-approximating-leave-one-out , and I copy-paste the first complicated-yet-abstractly-beautiful math expression, I get:
LOO(\hat x,\hat y) = \text{argmin}_\theta \frac 1 N\sum_{(x,y)\sim D-\{(\hat x,\hat y)\}}L(f_\theta(x),y)
in my Emacs text buffer. This is what the author originally wrote, so it’s as lossless as it gets, and if you are able to understand what it means, you presumably already know how to read the LaTeX version, and your text editor can render it or whatever else you need to do with it. I haven’t seen any better solutions.
(Whereas for OP, written in MathML, I get e−7t on LW, and Equation on GW. Hypothetically, they could try to decompile or interpret it as LaTeX, but needless to say, they do not. And even if they copy-pasted it as MathML, what destination programs would support MathML? Very few, I imagine.)
There is a bug in GW around the functionality you describe: navigate to an article posted today, namely,
https://www.lesswrong.com/posts/JCgs7jGEvritqFLfR/evaluating-hidden-directions-on-the-utility-dataset
Then use the mouse to select the equation that occurs right after “the projection (aka the scalar product)”
When you paste that equation (I tried 2 programs, Emacs and gnome-text-editor, as the destination of the paste operation), you get
P(x_i) =
--with the right-hand side of the equation completely missing.
I can’t replicate this with my Ubuntu Linux/MATE/Firefox/Emacs setup. I get the whole equation no matter how I copy it.
(Note that there is one catch to the JS copy-paste listener: confusingly to contemporary users, X.org has multiple copy-paste buffers, ‘primary’ / ‘secondary’ / ‘copypaste’, of which browsers will apparently only allow web page JS to affect the first one. Since the browser doesn’t cooperate, this cannot be fixed by the webpage. So if you copy-paste in X.org, depending on how you do it, you may get the intended P(x_i) = <x_i, v> or you may get that newline-after-every-character version that jefftk quotes. If you are unsure what is going on, you can investigate using the xclip utility, like xclip -o -selection copypaste vs xclip -o -selection primary.)
Hmm, I can’t replicate this bug on GreaterWrong. Could you please say what browser/version/platform you are using?
Also, do other equations on other posts work?
Chrome downloaded from Google, running on Fedora 38 using the standard graphical environment (Gnome on Wayland).
Firefox works correctly.
>Also, do other equations on other posts work?
5 other instances of LaTeX (some paragraph equations, some not) on 3 other posts work.
Which version of Chrome, please? (You can find this out by putting chrome://version into your URL bar.)
Hmm, so it is just that one specific post, and the equations in that one post copy-paste incorrectly, while the equations in every other post you’ve tried copy-paste correctly? Is that right?
Chrome reports as 117.0.5938.92 (Official Build) (64-bit).
I already described the problem with the first paragraph equation (display equation) on the page.
The second paragraph equation, which can be located by searching for “log-likelihood”, also has the problem. In particular, it copies as
\text{PPL}(X) = \exp\left(-\frac{1}{n}\sum_i^n \log p_\theta(x_i|x_{
The third one, locatable via “concept vector v”, works correctly:
P_{\perp}(x_i) = x_i - \frac{<x_i, v>}{||v||^2}v\,.
There is no fourth paragraph equation on the page.
Let me know if you want me to continue to search for instances of the bug, on other pages.
Alright, thank you.
I’ll try to figure out what might be causing this, though I can’t promise it’ll be soon, unfortunately.
Copying that equation from LW with Chrome on Mac, anything I paste it into (pbpaste, standard website, Google Docs) I get:
But when I use the GW version I get:
P(x_i) = <x_i, v>
Did you mean to link to the LW version of the post?
Great! My being given some way to obtain the original LaTeX as written by the author is the solution I have been tending to imagine over the years when I imagined what might be the best realistically-achievable way to change LW to accommodate my current workflow!
Thanks for pointing it out!
BTW, I’d like to learn more about the workflows of people who work with math all day every day.
MathML should copy fine, as long as the destination program supports it. What program are you pasting it into?
I’ve been pasting into Emacs. If you’re a Linux user, I would be interested to know what program you paste math into. Or if the thing you paste into “uses web tech” (and consequently is independent of OS), tell me which web site or program it is.
After shooting my mouth off I went and tried it, and even programs that I would expect to handle it well (ex: Google Docs) didn’t. Sorry!
(I think my statement is still literally true, except for the problem that ~no destination program currently supports it)