It isn’t obvious to me why this is better than e.g. what at the end you quote Fischer 2023 as doing, which (1) feels to me less like a special convention that might need explaining and (2) works fine without needing to be able to write subscripts and without running into gotchas related to how subscripts are implemented (e.g., if you do them with Unicode subscripts then I think searching for “80%” will not find an “80%” subscript, because those are different characters).
What advantage do you see to using subscripts that outweighs those factors?
e.g., if you do them with Unicode subscripts then I think searching for “80%” will not find an “80%” subscript, because those are different characters
Browsers unify many characters for search purposes (or strip them out), but it looks like Unicode sub/superscripts are sometimes but not always considered equivalent. You can test this out in your own browser by going to https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts#Superscripts_and_subscripts_block and doing C-f ‘2’ or ‘3’ or something. I get no hits in Firefox, but I do in Chromium. (And even if your browser does, what about all your other tools? Stuff like grep sure won’t treat them as equivalent without a lot of work. Or they will get stripped out, or turn into mojibake, or...) So, not something you can count on.
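Here is a minimal Python sketch of the gap. It assumes the subscript is written with the Unicode SUBSCRIPT digit characters (U+2080 to U+2089). A plain substring test misses it, and "80%" only matches once both strings go through NFKC compatibility normalization. The helper name `contains_normalized` is made up for illustration. The point is that each tool has to opt into this folding on its own.

```python
import unicodedata


def contains_normalized(haystack: str, needle: str) -> bool:
    """Substring search after NFKC-normalizing both strings.

    NFKC maps compatibility characters such as SUBSCRIPT EIGHT (U+2088)
    to their plain forms ('8'), so "80%" can match a subscripted "₈₀%".
    """
    def fold(s: str) -> str:
        return unicodedata.normalize("NFKC", s)

    return fold(needle) in fold(haystack)


# "this will happen₈₀%", with the probability written in subscript digits.
text = "this will happen\u2088\u2080%"

print("80%" in text)                     # False: different code points
print(contains_normalized(text, "80%"))  # True: NFKC folds ₈₀ back to 80
```

The plain `in` check shows what you get from a naive substring match. The normalized version only helps in tools that choose to do this.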
Aside from weirdness like that, I also think that the Unicode sub/superscript characters tend to look jarring and out-of-place. I don’t know if the fonts are bad, or they omit it & the fallback is bad, or if they are ‘typographically correct’ but we are so unfamiliar with ‘proper’ sub/superscript compared to HTML ones that they look wrong to us, or what. There are many places where Unicode works well for fancier typography, but between the omission of many letters*, breaking tools, and bad appearance, the Unicode sub/superscripts are a bad solution if you have anything better available.
I’d consider using them only if I was restricted to pure UTF-8 text, with nothing else. (For example, a link tooltip. Or maybe a machine-learning context where the model can’t handle HTML formatting.)
* Which you’d want for… a lot of things. For example, you could write with Unicode subscripts ‘Foo 2023a’, but not ‘Foo 2023b’, because there’s a subscript ‘a’ but not a subscript ‘b’ (or ‘c’, or ‘d’). Yeah, I know. So if you absolutely insist on Unicode subscripts, now you need a new way to disambiguate, like ‘Foo 2023-1’ vs ‘Foo 2023-2’ or something.
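The missing letters are easy to check in code. This minimal Python sketch builds a converter from an explicit table of the Unicode subscript characters that do exist: the digits U+2080 to U+2089, plus the small set of Latin letters such as ₐ (U+2090), ᵢ (U+1D62), and ⱼ (U+2C7C). The table and the name `to_subscript` are illustrative. Raising `ValueError` on unsupported characters is a design choice, so a missing letter fails loudly instead of silently producing the wrong citation.

```python
# Plain character -> Unicode subscript character (lowercase letters only).
SUBSCRIPTS = {str(d): chr(0x2080 + d) for d in range(10)}  # ₀ through ₉
SUBSCRIPTS.update({
    "a": "\u2090", "e": "\u2091", "o": "\u2092", "x": "\u2093",
    "h": "\u2095", "k": "\u2096", "l": "\u2097", "m": "\u2098",
    "n": "\u2099", "p": "\u209a", "s": "\u209b", "t": "\u209c",
    "i": "\u1d62", "r": "\u1d63", "u": "\u1d64", "v": "\u1d65",
    "j": "\u2c7c",
})


def to_subscript(text: str) -> str:
    """Convert text to Unicode subscripts, refusing characters with no subscript form."""
    missing = sorted({c for c in text if c not in SUBSCRIPTS})
    if missing:
        raise ValueError(f"no Unicode subscript for: {missing}")
    return "".join(SUBSCRIPTS[c] for c in text)


print(to_subscript("2023a"))  # ₂₀₂₃ₐ
try:
    to_subscript("2023b")
except ValueError as err:
    print(err)                # no Unicode subscript for: ['b']
```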
I hadn’t thought about the issue with searching; that’s a pretty good counterargument. (I can’t search for the probabilities in this document either, because the LaTeX isn’t searchable :-/)
Ultimately it comes down to an aesthetic preference for me: I will use these because they look kind of neat. But perhaps applying the reversal test to something like footnotes is interesting here: Imagine one was always writing “more specialized predators have bigger prey (see footnote 3)” instead of “more specialized predators have bigger prey³”. The latter is more compact, but not searchable.
Obviously there are switching costs associated with this. But perhaps the compactness that’s an advantage for footnotes is a similar advantage here; that’s why I’m trying it out.
That still has search problems! Consider: “see footnotes 3, 9, and 11–13”. How do you search for any of those five footnotes? The natural-language approach is inherently ambiguous for a hypertext problem like this, which requires some formal support.
(The real solution there is footnote backlinks, like we have on Gwern.net: you can search for all references—site-wide, too—to a footnote by simply going to the footnote in question. If you’re not up to that, then a lightweight HTML approach would be to simply wrap each footnote number in a span and hide the text from display, but not search, so C-f ‘footnote 3’ would always hit the “prey³” construct.)
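The “see footnotes 3, 9, and 11–13” problem is easy to show concretely. A literal search for “footnote 12” finds nothing, even though footnote 12 is referenced, because the text never contains that string. Finding it means parsing the list and expanding the range. Below is a minimal Python sketch. The regexes and the name `referenced_footnotes` are illustrative assumptions, and the sketch only handles comma- and “and”-separated numbers with hyphen or en-dash ranges.

```python
import re

# A run of numbers/ranges after "footnote(s)", optionally ending in "and N" or "and N–M".
LIST_RE = re.compile(r"footnotes?\s+([\d\s,\u2013-]+(?:and\s+[\d\u2013-]+)?)")
# One item in that run: a single number, or a range written with a hyphen or en dash.
ITEM_RE = re.compile(r"(\d+)(?:\s*[\u2013-]\s*(\d+))?")


def referenced_footnotes(text: str) -> set[int]:
    """Expand prose like 'footnotes 3, 9, and 11–13' into footnote numbers."""
    found: set[int] = set()
    for listing in LIST_RE.finditer(text):
        for item in ITEM_RE.finditer(listing.group(1)):
            start = int(item.group(1))
            end = int(item.group(2)) if item.group(2) else start
            found.update(range(start, end + 1))
    return found


text = "see footnotes 3, 9, and 11\u201313"
print("footnote 12" in text)               # False: literal search misses it
print(sorted(referenced_footnotes(text)))  # [3, 9, 11, 12, 13]
```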