gwern comments on Subscripts for Probabilities

gwern 14 Apr 2023 15:44 UTC
13 points
9

e.g., if you do them with Unicode subscripts then I think searching for “80%” will not find an “80%” subscript, because those are different characters

Browsers unify many characters for search purposes (or strip them out), but it looks like Unicode sub/superscripts are sometimes but not always considered equivalent. You can test this out in your own browser by going to https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts#Superscripts_and_subscripts_block and doing C-f ‘2’ or ‘3’ or something. I get no hits in Firefox, but I do in Chromium. (And even if your browser does, what about all your other tools? Stuff like grep sure won’t treat them as equivalent without a lot of work. Or they will get stripped out, or turn into mojibake, or...) So, not something you can count on.
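To make the tooling point concrete, here is a minimal Python sketch of the same test outside a browser. It assumes the subscript was written with the Unicode subscript digits (U+2088, U+2080); the helper names `naive_find` and `normalized_find` are made up for illustration. A plain substring search misses the subscript, and it only matches once both strings are folded with NFKC compatibility normalization, which is the sort of extra work most tools do not do by default.

```python
import unicodedata


def naive_find(needle: str, haystack: str) -> bool:
    """Plain substring search, roughly what a default grep or str.find does."""
    return needle in haystack


def normalized_find(needle: str, haystack: str) -> bool:
    """Substring search after NFKC folding (hypothetical helper).

    NFKC maps compatibility characters such as subscript digits
    to their ordinary counterparts before comparing.
    """
    def fold(s: str) -> str:
        return unicodedata.normalize("NFKC", s)

    return fold(needle) in fold(haystack)


# "tomorrow" followed by SUBSCRIPT EIGHT and SUBSCRIPT ZERO
text = "It will rain tomorrow\u2088\u2080"

print(naive_find("80", text))       # False: different code points
print(normalized_find("80", text))  # True: NFKC turns the subscript into "80"
```

Note that normalizing throws away the fact that it was a subscript at all, so this is a workaround for search, not a way to keep the formatting.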

Aside from weirdness like that, I also think that the Unicode sub/superscript characters tend to look jarring and out-of-place. I don’t know if the fonts are bad, or they omit them & the fallback is bad, or if they are ‘typographically correct’ but we are so unfamiliar with ‘proper’ sub/superscript compared to HTML ones that they look wrong to us, or what. There are many places where Unicode works well for fancier typography, but between the omission of many letters*, breaking tools, and bad appearance, the Unicode sub/superscripts are a bad solution if you have anything better available.

I’d consider using them only if I was restricted to pure UTF-8 text, with nothing else. (For example, a link tooltip. Or maybe a machine-learning context where the model can’t handle HTML formatting.)

* Which you’d want for… a lot of things. For example, you could write with Unicode subscripts ‘Foo 2023a’, but not ‘Foo 2023b’. Because there’s a subscript ‘a’ but not a subscript ‘b’ (or ‘c’, or ‘d’). Yeah, I know. So if you absolutely insist on Unicode subscripts, now you need a new way to disambiguate, like ‘Foo 2023-1’ vs ‘Foo 2023-2’ or something.
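If you want to check the gaps yourself, a quick sketch is to ask Python's `unicodedata` module which lowercase Latin letters have a character named "LATIN SUBSCRIPT SMALL LETTER X". This assumes the subscript letters all follow that naming pattern, and the result depends on the Unicode version bundled with your Python; `subscript_letters` is a made-up helper name.

```python
import string
import unicodedata


def subscript_letters() -> dict[str, str]:
    """Map each lowercase ASCII letter to its Unicode subscript form, if one exists.

    Relies on the character-name pattern 'LATIN SUBSCRIPT SMALL LETTER X';
    letters without such a character are simply left out.
    """
    found = {}
    for letter in string.ascii_lowercase:
        name = f"LATIN SUBSCRIPT SMALL LETTER {letter.upper()}"
        try:
            found[letter] = unicodedata.lookup(name)
        except KeyError:
            # No character with that name in this Unicode database.
            pass
    return found


available = subscript_letters()
missing = [c for c in string.ascii_lowercase if c not in available]

print("available:", "".join(available))
print("missing:  ", "".join(missing))
```

On any recent Python, ‘a’ shows up under available while ‘b’, ‘c’, and ‘d’ land in missing, which is exactly the ‘Foo 2023a’ vs ‘Foo 2023b’ problem.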