Al Truist comments on Leaving beta: Voting on moving to LessWrong.com

Al Truist 12 Mar 2018 4:16 UTC
6 points
0
I don’t have 1000 karma, but I’d like to ask something:
Are you going to keep backups of the original LW1.0 database (preferably multiple copies/locations)? I’m worried that some of the content might not have made it through the conversion to LW2.0. So far, I’ve noticed that old polls (both the questions and the results) are missing, and there might be other little things like that even if the main content is preserved.
I’m tempted to use wget or something like that to download all pages off LW1.0, but I don’t have much memory and it could take a long time. If I knew that you were going to retain all the original data for preservation purposes, I wouldn’t be as worried.
- Vaniver 12 Mar 2018 4:20 UTC
  5 points
  0
  > Are you going to keep backups of the original LW1.0 database (preferably multiple copies/locations)?
  We plan to. It’s not obvious that those backups will be public by default—we receive database dumps from Trike, which contain both public content like comments and private content like private messages and email addresses—but if there’s interest it doesn’t seem too difficult to create a sanitized version of the database.
  - Al Truist 12 Mar 2018 18:10 UTC
    5 points
    0
    It would be amazing if you published a sanitized version of the LW 1.0 DB. In addition to ensuring the content is preserved, that would also make it easier for people to do interesting statistical analyses.
  - habryka 12 Mar 2018 4:24 UTC
    5 points
    0
    Yep, I think we should definitely keep a few copies of the DB dumps we have, saved on a few different machines.
- Error 12 Mar 2018 16:49 UTC
  4 points
  0
  This is a concern for me too. A suggestion I made in feedback: Don’t break inbound links. Keep the old site, static, under archive.lesswrong.com or something, and redirect classic-format URL paths to the archive.
  
  There is a lot of valuable material on the classic site. It might not be useful for current discussion, but let’s not lose it, or let it get buried on archive.org.
  
  (Come to think of it, if maintaining an archive is itself unworkable, a redirect to archive.org might be an acceptable next-best alternative.)
  - Vaniver 12 Mar 2018 20:26 UTC
    12 points
    0
    > Don’t break inbound links.
    I think I’ve said this before, but the most important feature of a redesign has always been “don’t break inbound links.”
  - habryka 12 Mar 2018 18:23 UTC
    6 points
    0
    We tried pretty hard not to break any incoming links, and have been watching Google Analytics for the old site to make sure we covered all the inbound links.
    - Error 12 Mar 2018 20:21 UTC
      5 points
      0
      Okay, cool. As long as it’s on your radar.
  - Al Truist 12 Mar 2018 18:07 UTC
    3 points
    0
    I wouldn’t trust a redirect to archive.org, because some of the content might have randomly been missed by the Wayback Machine crawler or the last crawled version might be missing comments that were added later. It also might have systematically missed certain things, such as deeply nested comment chains where you have to click “continue this thread” and posts with a lot of replies where you have to click “load more comments” (which is even less likely to be preserved, as it relies on AJAX).
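
One rough way to act on the coverage worry in the last comment is to ask the Wayback Machine's availability endpoint whether it holds any snapshot for each old URL. The sketch below is only an illustration, not something anyone in the thread proposed: the function names and the injectable `fetch` parameter are assumptions made for the example, and the page URLs in the usage note are placeholders. It also shows only that *some* snapshot exists; it cannot tell whether that snapshot captured late comments, "continue this thread" pages, or AJAX-loaded replies.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Wayback Machine availability endpoint; it answers with a JSON document.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the availability query for one page (helper name is illustrative)."""
    params = {"url": page_url}
    if timestamp is not None:
        params["timestamp"] = timestamp  # e.g. "20180312" to prefer a nearby capture
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(payload):
    """Return (snapshot_url, timestamp) from a decoded response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]


def find_unarchived(page_urls, fetch=None):
    """Return the subset of page_urls for which no snapshot is reported.

    `fetch` maps a query URL to a decoded JSON dict. It is injectable so the
    logic can be exercised without network access (an assumption of this sketch).
    """
    if fetch is None:
        def fetch(query_url):
            with urlopen(query_url, timeout=30) as response:
                return json.load(response)
    return [
        url for url in page_urls
        if closest_snapshot(fetch(availability_url(url))) is None
    ]
```

Calling `find_unarchived(["http://lesswrong.com/some/old/path"])` (a placeholder path) would return the URLs with no reported snapshot; anything on that list is content a redirect to archive.org would lose outright. A real run over the whole site would need rate limiting and retries, which are left out here.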