Certainly there are links which are regularly updated, like Wikipedia pages; those should be whitelisted. There are others, like services or tools, which it wouldn't make sense to archive at all: something like Waifu Labs, which I link in several places, can't meaningfully be 'archived' because the entire point is to interact with the service and generate images.
But examples like blogs or LW pages make sense to archive after a particular timepoint. For example, many blogs or websites like Reddit lock comments after a set number of days. Once that’s passed, typically nothing in the page will change substantially (for the better, anyway) except to be deleted. I think most of my links to blogs are of that type.
Even on LW, where threads can be necroed at any time, how often does anyone comment on an old post, and if your archived copy happens to omit some stray recent comments, how big a deal is that? Acceptable collateral damage compared to a website where 5 or 10% of links are broken and the percentage keeps increasing with time, I’d say...
For this issue, you could implement something like a 'first seen' timestamp in your link database and only do the final archiving & substitution after a certain time period. I think a period like 3 months would capture 99% of the changes which are ever going to be made, while not exposing readers to too much risk of linkrot.
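A minimal sketch of what such a 'first seen' scheme might look like, assuming a Node/TypeScript setup with a flat JSON file as the link database (the file name, field names, and the 90-day window are all illustrative, not anything any existing tooling actually uses):

```typescript
// Hypothetical "first seen" link database: record when a URL first appears
// in the site's pages, and only treat it as ready for the final
// archive-and-substitute pass once ~3 months have elapsed.
import * as fs from "fs";

type LinkRecord = { url: string; firstSeen: string }; // firstSeen as ISO date

const DB_PATH = "link-db.json";             // hypothetical storage location
const SETTLE_MS = 90 * 24 * 60 * 60 * 1000; // ~3 months

function loadDb(): LinkRecord[] {
  return fs.existsSync(DB_PATH)
    ? (JSON.parse(fs.readFileSync(DB_PATH, "utf8")) as LinkRecord[])
    : [];
}

// Add any newly seen URLs with a first-seen timestamp of "now".
function recordLinks(urls: string[]): void {
  const db = loadDb();
  const known = new Set(db.map((r) => r.url));
  const now = new Date().toISOString();
  for (const url of urls) {
    if (!known.has(url)) db.push({ url, firstSeen: now });
  }
  fs.writeFileSync(DB_PATH, JSON.stringify(db, null, 2));
}

// URLs that have "settled" (first seen more than ~3 months ago) and are
// therefore safe to archive and substitute.
function readyToArchive(): string[] {
  const cutoff = Date.now() - SETTLE_MS;
  return loadDb()
    .filter((r) => Date.parse(r.firstSeen) < cutoff)
    .map((r) => r.url);
}
```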
This makes sense, but it takes a lot of activation energy. I don’t think a practice like this will spread (like even I probably won’t chunk out the time to learn how to implement it, and I care a bunch about this stuff).
Plausibly the "(a)" convention (adding an "(a)" link to an archived copy after each hyperlink) could spread in some circles – activation energy is low and it only adds 10-20 seconds of friction per archived link.
But even “(a)” probably won’t spread far (10-20 seconds of friction per link is too much for almost everyone). Maybe there’s room for a company doing this as a service...
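To make the "service" idea concrete, here is a rough sketch of how a per-link helper could cut that 10-20 seconds down to nearly zero. It assumes the Wayback Machine's public save endpoint (https://web.archive.org/save/<url>) and Markdown output; this is an illustration only, not necessarily how the archivify script linked below works.

```typescript
// Given a URL and link text, ask the Wayback Machine to capture the page
// and emit a Markdown link plus an "(a)" link to the archived copy.
// Error handling and rate limiting are omitted.
async function archiveLink(url: string, text: string): Promise<string> {
  // A GET on the save endpoint triggers a capture and, when it succeeds,
  // redirects to the snapshot URL -- an assumption about the endpoint's
  // behavior, not a guarantee.
  const res = await fetch(`https://web.archive.org/save/${url}`, {
    redirect: "follow",
  });
  const archived = res.ok ? res.url : url; // fall back to the live URL
  return `[${text}](${url}) ([a](${archived}))`;
}

// Example:
//   await archiveLink("https://example.com", "example")
//   => "[example](https://example.com) ([a](https://web.archive.org/web/.../https://example.com))"
```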
If adoption is your only concern, doing it website by website is hopeless in the first place. Your only choice is creating some sort of web browser plugin to do it automatically.
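As a sketch of what "a web browser plugin to do it automatically" could look like, here is a minimal WebExtension background script that fires off an archive request for every page the user visits. The API used (chrome.webNavigation) and the Wayback Machine save endpoint are assumptions about one possible implementation, not a description of any existing plugin.

```typescript
// Background script: archive every top-level http(s) page the user visits.
// Assumes a manifest with the "webNavigation" permission and host
// permissions for web.archive.org.
chrome.webNavigation.onCompleted.addListener((details) => {
  // Ignore iframes and non-http(s) pages.
  if (details.frameId !== 0 || !details.url.startsWith("http")) return;
  // Fire-and-forget capture request to the Wayback Machine.
  fetch(`https://web.archive.org/save/${details.url}`).catch(() => {
    /* ignore failures; archiving is best-effort */
  });
});
```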
The script now exists: https://www.andzuck.com/projects/archivify/
Update: Brave Browser now gives an option to search for archived versions whenever it lands on a "page does not exist" error.
Not my only concern but definitely seems important. (Otherwise you’re constrained by what you can personally maintain.)
A browser plugin seems like a good approach.