I don’t have 1000 karma, but I’d like to ask something:
Are you going to keep backups of the original LW1.0 database (preferably multiple copies/locations)? I’m worried that some of the content might not have made it through the conversion to LW2.0. So far, I’ve noticed that old polls (both the questions and the results) are missing, and there might be other little things like that even if the main content is preserved.
I’m tempted to use wget or something like that to download all pages off LW1.0, but I don’t have much memory and it could take a long time. If I knew that you were going to retain all the original data for preservation purposes, I wouldn’t be as worried.
We plan to. It’s not obvious that those backups will be public by default—we receive database dumps from Trike, which contain both public content like comments and private content like private messages and email addresses—but if there’s interest it doesn’t seem too difficult to create a sanitized version of the database.
Yep, I think we should definitely keep a few copies of the db dumps we have, and have them saved on a few different machines.
It would be amazing if you published a sanitized version of the LW 1.0 DB. In addition to ensuring the content is preserved, that would also make it easier for people to do interesting statistical analyses.
This is a concern for me too. A suggestion I made in feedback: Don’t break inbound links. Keep the old site, static, under archive.lesswrong.com or something, and redirect classic-format URL paths to the archive.
There is a lot of valuable material on the classic site. It might not be useful for current discussion, but let’s not lose it, or let it get buried on archive.org.
(come to think of it, if maintaining an archive is itself unworkable, a redirect to archive.org might be an acceptable next-best alternative)
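To make the redirect idea above concrete, here is a minimal sketch of mapping classic-format paths onto an archive host. Everything specific in it is an assumption for illustration: the `archive.lesswrong.com` host is only the name floated in the comment, the `/lw/<id>/<slug>/` pattern is a guess at what "classic-format" means, and `archive_redirect` is a hypothetical helper, not anything the site is known to run.

```python
import re
from typing import Optional

# Hypothetical archive host, taken from the suggestion in the thread.
ARCHIVE_HOST = "archive.lesswrong.com"

# Assumed shape of a classic-format path: /lw/<id>/ optionally followed by more.
CLASSIC_PATH = re.compile(r"^/lw/[0-9a-z]+(/.*)?$")


def archive_redirect(path: str, query: str = "") -> Optional[str]:
    """Return the archive URL for a classic-format path, or None otherwise."""
    if not CLASSIC_PATH.match(path):
        return None  # not a classic path; let the new site handle it
    target = f"https://{ARCHIVE_HOST}{path}"
    if query:
        target += "?" + query  # keep the query string so deep links survive
    return target
```

A web server would answer a non-`None` result with a permanent (301) redirect to that URL and serve everything else normally.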
I think I’ve said this before, but the most important feature of a redesign has always been “don’t break inbound links.”
We tried pretty hard to not break any incoming links, and have been watching the Google Analytics for the old site to make sure we covered all the inbound links.
Okay, cool. As long as it’s on your radar.
I wouldn’t trust a redirect to archive.org, because some of the content might have randomly been missed by the Wayback Machine crawler or the last crawled version might be missing comments that were added later. It also might have systematically missed certain things, such as deeply nested comment chains where you have to click “continue this thread” and posts with a lot of replies where you have to click “load more comments” (which is even less likely to be preserved, as it relies on AJAX).
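One way to test the first half of that worry is to ask the Wayback Machine whether it holds any snapshot of a given page. The sketch below assumes the public availability endpoint at `https://archive.org/wayback/available` and its `archived_snapshots.closest` response shape; the helper names are hypothetical. It only reports whether a page-level snapshot exists, so it cannot tell whether "continue this thread" or "load more comments" content was captured.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine availability API.
API = "https://archive.org/wayback/available"


def availability_url(page_url: str) -> str:
    """Build the availability query for one page URL."""
    return f"{API}?{urlencode({'url': page_url})}"


def closest_snapshot(payload: dict) -> Optional[dict]:
    """Return the closest snapshot record from a response, or None if absent."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest
    return None


def check(page_url: str, timeout: float = 10.0) -> Optional[dict]:
    """Query the API over the network and return the closest snapshot, if any."""
    with urlopen(availability_url(page_url), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

Running `check` over a list of old post URLs and collecting the ones that return `None` would give a rough lower bound on what the crawler missed.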