Unfortunately, there is no standard way to make parts of a page disappear from search engines’ indexes. This is super annoying, because almost every page contains some navigational parts which do not contribute to the content.
HTML 5 contains a semantic tag, <nav>, which defines a block of navigational links in the document. I think a smart search engine should exclude these parts, but I have no idea whether any engine actually does that. Maybe changing LW pages to HTML 5 and adding this tag would help.
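For illustration, a minimal sketch of what that could look like (the markup here is made up, not taken from the actual LW templates):

    <!-- navigational links grouped in the HTML 5 semantic element -->
    <nav>
      <ul>
        <li><a href="/">Main</a></li>
        <li><a href="/comments">Recent Comments</a></li>
      </ul>
    </nav>
    <!-- the article content itself stays outside <nav> -->
    <article>
      ...
    </article>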
Some search engines use specific syntax to exclude parts of the page, but it depends on the engine, and sometimes it even violates the HTML standards. For example, Google uses HTML comments (googleoff / googleon), Yahoo uses the HTML attribute class="robots-nocontent", and Yandex introduces a new tag, <noindex>. (I like the Yahoo way most.)
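Roughly what those engine-specific markers look like (reproduced from memory, so the exact syntax should be checked against each engine’s documentation):

    <!-- Google (Search Appliance): comments that switch indexing off and on -->
    <!--googleoff: index-->
      ... navigation ...
    <!--googleon: index-->

    <!-- Yahoo: a class attribute on any element -->
    <div class="robots-nocontent"> ... navigation ... </div>

    <!-- Yandex: a non-standard tag (the one that violates the HTML standard) -->
    <noindex> ... navigation ... </noindex>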
The most standards-following way seems to be putting the offending parts of the page into separate HTML pages which are included by <iframe>, and using the standard robots.txt mechanism to block those HTML pages. I think the disadvantage is that the included frames will have fixed dimensions, instead of changing dynamically with their content. Another solution would be to insert those texts by JavaScript, which means that users with JavaScript disabled would not see them.
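A rough sketch of the iframe-plus-robots.txt variant, with a made-up file name nav.html; the fixed width and height on the frame are exactly the disadvantage mentioned above:

    <!-- page.html: pull the navigation in from a separate document -->
    <iframe src="/nav.html" width="200" height="400" title="navigation"></iframe>

    # robots.txt: keep crawlers away from the included fragment
    User-agent: *
    Disallow: /nav.html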
Those parts are already inserted by JavaScript. E.g. the ‘recent comments’ block works by fetching http://lesswrong.com/api/side_comments and inserting its contents directly in the page.
Editing robots.txt might exclude those parts from the Google index, but I don’t know for sure.
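Assuming the dynamically inserted parts are only fetched from URLs under /api/ (a guess based on the side_comments URL above), the robots.txt entry might look like this:

    # hypothetical robots.txt rule; adjust the path to whatever the site actually uses
    User-agent: *
    Disallow: /api/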
Since our local search is powered by Google, I’m content with a solution that only works for Google.
I think robots.txt would work.