An element that is entirely bold counts as a heading
So, I have a problem with this. My problem is that this is deeply antithetical to best practices of semantic HTML and web standards in general.
Now, my understanding is that the motivation for this behavior is that many people find it easier and more intuitive, in the Less Wrong editor UI, to bold some text than to figure out how to make a proper heading. This is understandable. However, if this is indeed the reason, then let me propose a solution:
When someone makes an all-bold element in the LW editor, convert that element into an actual heading when saving the post or comment. (Use whatever heading level is the lowest available level in the given editing context—post or comment.)
That way, the HTML content can better conform to standards and best practices, and the ToC features on both LW and GW can stick to using only actual HTML headings as sections.
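For illustration, the save-time conversion proposed above might look something like the following. This is only a minimal sketch, not LW’s actual code: `promote_bold_paragraphs` and its `level` parameter are hypothetical names, and it assumes the editor emits simple markup of the form `<p><strong>Title</strong></p>`. A real implementation would more likely work on the editor’s document model or a proper HTML parser than on a regular expression.

```python
import re

# Matches a paragraph whose entire content is one <strong> or <b> element.
# Assumption: the editor emits simple, attribute-free markup such as
# <p><strong>Title</strong></p>.
ALL_BOLD_PARAGRAPH = re.compile(
    r"<p>\s*<(strong|b)>((?:(?!</?(?:strong|b)>).)+?)</\1>\s*</p>",
    re.IGNORECASE | re.DOTALL,
)


def promote_bold_paragraphs(html: str, level: int = 3) -> str:
    """Rewrite all-bold paragraphs as <hN> headings (hypothetical save hook).

    `level` stands in for "the lowest heading level available in the given
    editing context"; the caller decides what that is for posts vs. comments.
    """
    if not 1 <= level <= 6:
        raise ValueError("heading level must be between 1 and 6")
    return ALL_BOLD_PARAGRAPH.sub(
        lambda m: f"<h{level}>{m.group(2).strip()}</h{level}>", html
    )
```

Paragraphs that are only partly bold do not match the pattern, so ordinary inline emphasis is left alone.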
Converting a bold-only paragraph to an HTML heading in the editor seems like a decent improvement to me. I do think that we have some constraints that make this difficult, and which make me think that we still want the “interpret bold paragraph as heading” feature activated (while the HTML we save in our editor would be semantically different):
Most of the content on LW is historical content, where editor features like that weren’t available. I want to make sure the ToC works well for that content, but I feel hesitant to edit all the HTML of those posts, in case that does change the semantic meaning of the text in some cases (obviously we introduced some of that confusion already with the ToC, but I think editing the old content feels a bit more violating than that).
I think users who are used to Markdown will often use single bold words as headings, and I feel hesitant to deviate too much from the standard Markdown conventions of how you should parse Markdown into HTML.
Some content is submitted to us via RSS feeds from people’s blogs, where we obviously have no control over their HTML, and I would also prefer to not modify it.
I strongly endorse editing the HTML of old LW posts.
I’ve looked at that HTML, and almost invariably been horrified; it’s often an absolute mess. I don’t think that changing the semantics of the post is a serious danger, for three reasons:
It’s usually quite clear what it should be.
The way it’s displayed currently is already not the same as the way it was displayed back then, and there’s no remedy for that; better to make it right and proper going forward, than to compound old mistakes and the effects of an awkward transition.
You can (and should) archive the un-corrected version, in case there’s a need to revert anything.
Really? I haven’t seen this… but if it’s true, it shouldn’t be encouraged!
This is understandable, but in general I think it’s better to do it right in the majority of cases and to either make an exception or to allow imperfect outcomes in a minority of cases, than to do it wrong in the majority of cases for the sake of consistency. (How much of this RSS’d content is there, anyway?)
Are you imagining a manual process where you look at each post and edit it? I was assuming Oliver had in mind an automated script.
Would you expect it to be easy to script the conversion?
Some aspects of the conversion can, and should, be scripted. Some should be manual.
There is no reason to go through each post. Prioritize them: if someone comments on an old post, clearly it’s active, so edit that. If a post is getting a lot of traffic, edit that. Otherwise, leave it alone until there’s a reason to edit it, or you have time to do so.
(Also: crowdsource this stuff. Let trusted volunteers submit edited HTML, then drop it in as a replacement if it’s good.)
Also, I just want to point out that one obvious compromise would be to both treat all-bold elements as headings (for compatibility with as-yet-un-updated old content), and convert all-bold elements in newly-created content (that is written with the Draft.js editor) to proper headings. That way, you would not be creating any new standards-violating HTML, while not being under any pressure to edit old content, and having the ToC work as expected for said old content.
Oh, yeah. That’s what I meant to say above. Adding that behavior to our editor seems relatively low-cost.
Don’t know where you got this notion from, but absolutely not. Markdown has syntax that’s used for headings, and I’ve never used bolded text as a replacement for a proper heading.
(As a wider point, Said Achmiz is as usual correct in his approach and it would be much appreciated if you didn’t inflict any more appalling HTML practices on API consumers)
We just serve the historical HTML for practically all posts, and all new HTML is about as straightforward as you can imagine (with an exception for blockquotes, which we currently split into block-level elements, though that will be fixed soon). Happy to hear about any other problems you have with the HTML, but I am not aware of any.
Just because Markdown has a heading syntax doesn’t mean that everyone follows it, and depending on context you might not want to follow it. I literally googled “Markdown bold”, and among the first few results, this tutorial uses bolded headers as an example.
Huh? I just went to that link; I don’t see where it says to use (or uses) bold-as-header.
Edit:
That not everyone follows it is, clearly, technically true (though I don’t at all share your impression of this practice’s prevalence). But I’m curious to know why you would ever not want to follow the heading syntax, if what you want to produce is a heading? (Other than cases of “this particular parser and/or renderer exhibits bizarre behavior, so I unfortunately have to produce non-standard Markdown in order for what I write to look sane when displayed”—but that, obviously, does not apply here, since we’re talking about determining what the parser and renderer do!)
I’ve sometimes used regular bold for headings, I think mostly because it’s lower friction. I don’t need to think about what level of heading I should use semantically, or how that level actually renders.
Oops, sorry. Wrong link. I mean this one, together with this screenshot:
But that’s not a heading, nor would it be correct to treat it as such! (Note that lower down on the page you linked, the tutorial specifies how to make actual headings!)
Oh, I think if a user enters that text into a text editor, they would prefer it to show up in the ToC rather than not. Or at the very least have the option to add it to a ToC (though I think if they had to choose, most users would prefer to add it).
I think that if a user is thinking about these sorts of things at all, then they can, should, and do have the capacity to make an actual heading element. (And if this is at all difficult or unintuitive in the UI, then that is the flaw that needs to be rectified!)
I’m the one who implemented this. While it’s partially about making it intuitive in LessWrong’s GUI editor, the decision was mostly based on trying it out on historical posts, and seeing what seemed to work best. All of that content pre-dates our current ToC implementation, and some of it was imported from quite far back in time. (This is also why <h5> and <h6> aren’t considered headings.)
While we could write a migration script and also modify the editor to save as headers, we’re reluctant to do that because it could change the semantics of old imported content, and we’re reluctant to invest in LessWrong’s Draft-JS editor right now, since we’re planning on replacing it with something better.
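As a rough sketch of the ToC behavior described above (real headings count as sections, `<h5>` and `<h6>` do not, and a paragraph that is entirely bold is treated as a heading), something like this would do. It is not the actual LessWrong implementation: `TocExtractor` and `extract_sections` are made-up names, and the choice of `<h1>` through `<h4>` is only inferred from the remark about `<h5>` and `<h6>`.

```python
from html.parser import HTMLParser

# Assumption: <h1>-<h4> count as sections; <h5>/<h6> are excluded per the thread.
HEADING_TAGS = {"h1", "h2", "h3", "h4"}
BOLD_TAGS = {"strong", "b"}


class TocExtractor(HTMLParser):
    """Collects section titles from real headings and all-bold paragraphs."""

    def __init__(self):
        super().__init__()
        self.sections = []     # titles, in document order
        self._block = None     # tag of the block being read (heading tag or "p")
        self._bold_depth = 0   # > 0 while inside <strong>/<b>
        self._text = []        # text pieces seen in the current block
        self._all_bold = True  # False once non-bold, non-whitespace text is seen

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS or tag == "p":
            self._block, self._text, self._all_bold = tag, [], True
            self._bold_depth = 0
        elif tag in BOLD_TAGS and self._block:
            self._bold_depth += 1

    def handle_endtag(self, tag):
        if tag in BOLD_TAGS and self._bold_depth:
            self._bold_depth -= 1
        elif tag == self._block:
            title = "".join(self._text).strip()
            # Real headings always count; paragraphs only if entirely bold.
            if title and (tag in HEADING_TAGS or self._all_bold):
                self.sections.append(title)
            self._block = None

    def handle_data(self, data):
        if self._block is None:
            return
        self._text.append(data)
        if data.strip() and self._bold_depth == 0:
            self._all_bold = False


def extract_sections(html: str) -> list[str]:
    parser = TocExtractor()
    parser.feed(html)
    parser.close()
    return parser.sections
```

Keeping the bold-paragraph rule in the ToC code, rather than rewriting stored HTML, is what lets old and imported content stay untouched.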