tags present in HTML trigger this bug; there is an issue on the LW bug tracker, but the developers say it can’t be easily fixed, so we are left with manually removing the offending tag each time it slips in.
Not easily fixed? Nonsense. Not elegantly fixed perhaps. It can be easily fixed with, by way of proof of concept, regex and a cron job.
Quite. To fix the more heavily misformatted posts, I use a script that in particular does just that for the
tags (replacing them with
), but sometimes it garbles or discards nonstandard ways of meaningfully marking up the text, and the result needs to be corrected on a case-by-case basis. Writing a script that works automatically in general would be harder: it would require more expertise with the peculiarities of HTML, and ideally a test suite to make sure that it doesn’t disrupt useful markup. (I fail to see where a cron job might enter the picture.)
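For concreteness, a regex-based tag replacement of the kind described above might look roughly like the sketch below. This is not the actual script; the function name `replace_tag` and the tag names `div` and `p` in the usage line are placeholders chosen purely for illustration. Note that it throws away attributes on the opening tag, which is exactly the sort of thing that can discard meaningful markup.

```python
import re


def replace_tag(html: str, old: str, new: str) -> str:
    """Rename every <old ...> and </old> tag to <new> and </new>.

    Hypothetical helper: attributes on the opening tag are dropped,
    so any markup that relied on them is lost.
    """
    # Opening tags, with or without attributes; \b keeps "div" from matching "divx".
    opening = re.compile(rf"<{re.escape(old)}\b[^>]*>", re.IGNORECASE)
    # Closing tags, tolerating stray whitespace before ">".
    closing = re.compile(rf"</{re.escape(old)}\s*>", re.IGNORECASE)
    html = opening.sub(f"<{new}>", html)
    return closing.sub(f"</{new}>", html)


# Placeholder tag names, for illustration only.
cleaned = replace_tag('<div class="note">Hi</div>', "div", "p")
```

Here `cleaned` comes out without the `class="note"` attribute, so anything that depended on it would need to be restored by hand.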
(I fail to see where a cron job might enter the picture.)
Trike has knowledge about the codebase that I do not. It is theoretically possible (albeit unlikely) that the code is constructed such that making the correction take place at an appropriate time does not qualify as ‘easy’. However, the available information is that Vladimir_Nesov is able to log in and make such corrections via an HTML interface. To establish with confidence that correction (within the limits of the side effects you mentioned) would be easy, without relying on any assumptions about the inner workings of the codebase, it suffices to consider running a regex script remotely using Vladimir_Nesov’s credentials. This establishes an upper bound on the difficulty.
OK. In any case, that doesn’t work, because of the mentioned difficulty of creating an automatic script that never damages the text (it’s generally irresponsible to edit a public archive without manually checking the results when the probability of damage is nontrivial), though a script that alerts about the presence of
’s could ensure that corrections happen more reliably.
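An alert-only check of this kind could be sketched as follows, assuming the post's HTML is already available as a string. It never edits anything; it only reports where a given tag occurs, leaving the correction to a human. The names `TagFinder` and `find_tag`, and the `div` tag in the usage line, are hypothetical placeholders; the parser is the standard library's `html.parser.HTMLParser`.

```python
from html.parser import HTMLParser


class TagFinder(HTMLParser):
    """Collects the positions of every opening occurrence of one tag."""

    def __init__(self, tag: str) -> None:
        super().__init__()
        self.tag = tag.lower()  # HTMLParser reports tag names in lowercase
        self.hits = []          # (line, column) pairs as given by getpos()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags via the default handle_startendtag.
        if tag == self.tag:
            self.hits.append(self.getpos())


def find_tag(html: str, tag: str) -> list:
    """Return the parser positions where <tag> appears; an empty list means clean."""
    finder = TagFinder(tag)
    finder.feed(html)
    finder.close()
    return finder.hits


# Placeholder tag name, for illustration only.
hits = find_tag("<p>ok</p>\n<div>bad</div>", "div")
if hits:
    print(f"found {len(hits)} offending tag(s) at {hits}")
```

Since it only reads the text, a check like this cannot damage a post, so it could run unattended without the testing concern raised above.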
Fixed.
Thanks! I see it didn’t revert when I edited the post, which surprises me.