Certainly there are links which are regularly updated, like Wikipedia pages; those should be whitelisted. There are others, like services or tools, which it wouldn't make sense to archive at all: something like Waifu Labs, which I link in several places, can't meaningfully be 'archived' because the entire point is to interact with the service and generate images.
But examples like blogs or LW pages make sense to archive after a particular timepoint. For example, many blogs or websites like Reddit lock comments after a set number of days. Once that’s passed, typically nothing in the page will change substantially (for the better, anyway) except to be deleted. I think most of my links to blogs are of that type.
Even on LW, where threads can be necroed at any time, how often does anyone comment on an old post, and if your archived copy happens to omit some stray recent comments, how big a deal is that? Acceptable collateral damage compared to a website where 5 or 10% of links are broken and the percentage keeps increasing with time, I’d say...
For this issue, you could implement something like a 'first seen' timestamp in your link database and only do the final archiving & substitution after a certain time period. I think a period like 3 months would capture 99% of the changes which are ever going to be made, while not exposing readers to too much risk of linkrot.
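A minimal sketch of what such a 'first seen' scheme might look like, assuming a Node/TypeScript setup with a flat JSON file as the link database (the file name, field names, and the 90-day window are all illustrative, not anything any existing tooling actually uses):

```typescript
// Hypothetical "first seen" link database: record when a URL first appears
// in the site's pages, and only treat it as ready for the final
// archive-and-substitute pass once ~3 months have elapsed.
import * as fs from "fs";

type LinkRecord = { url: string; firstSeen: string }; // firstSeen as ISO date

const DB_PATH = "link-db.json";             // hypothetical storage location
const SETTLE_MS = 90 * 24 * 60 * 60 * 1000; // ~3 months

function loadDb(): LinkRecord[] {
  return fs.existsSync(DB_PATH)
    ? (JSON.parse(fs.readFileSync(DB_PATH, "utf8")) as LinkRecord[])
    : [];
}

// Add any newly seen URLs with a first-seen timestamp of "now".
function recordLinks(urls: string[]): void {
  const db = loadDb();
  const known = new Set(db.map((r) => r.url));
  const now = new Date().toISOString();
  for (const url of urls) {
    if (!known.has(url)) db.push({ url, firstSeen: now });
  }
  fs.writeFileSync(DB_PATH, JSON.stringify(db, null, 2));
}

// URLs that have "settled" (first seen more than ~3 months ago) and are
// therefore safe to archive and substitute.
function readyToArchive(): string[] {
  const cutoff = Date.now() - SETTLE_MS;
  return loadDb()
    .filter((r) => Date.parse(r.firstSeen) < cutoff)
    .map((r) => r.url);
}
```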
This makes sense, but it takes a lot of activation energy. I don’t think a practice like this will spread (like even I probably won’t chunk out the time to learn how to implement it, and I care a bunch about this stuff).
Plausibly the "(a)" convention (adding an "(a)" link to an archived copy after each hyperlink) could spread in some circles – activation energy is low and it only adds 10-20 seconds of friction per archived link.
But even “(a)” probably won’t spread far (10-20 seconds of friction per link is too much for almost everyone). Maybe there’s room for a company doing this as a service...
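To make the "service" idea concrete, here is a rough sketch of how a per-link helper could cut that 10-20 seconds down to nearly zero. It assumes the Wayback Machine's public save endpoint (https://web.archive.org/save/<url>) and Markdown output; this is an illustration only, not necessarily how the archivify script linked below works.

```typescript
// Given a URL and link text, ask the Wayback Machine to capture the page
// and emit a Markdown link plus an "(a)" link to the archived copy.
// Error handling and rate limiting are omitted.
async function archiveLink(url: string, text: string): Promise<string> {
  // A GET on the save endpoint triggers a capture and, when it succeeds,
  // redirects to the snapshot URL -- an assumption about the endpoint's
  // behavior, not a guarantee.
  const res = await fetch(`https://web.archive.org/save/${url}`, {
    redirect: "follow",
  });
  const archived = res.ok ? res.url : url; // fall back to the live URL
  return `[${text}](${url}) ([a](${archived}))`;
}

// Example:
//   await archiveLink("https://example.com", "example")
//   => "[example](https://example.com) ([a](https://web.archive.org/web/.../https://example.com))"
```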
If adoption is your only concern, doing it website by website is hopeless in the first place. Your only choice is creating some sort of web browser plugin to do it automatically.
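As a sketch of what "a web browser plugin to do it automatically" could look like, here is a minimal WebExtension background script that fires off an archive request for every page the user visits. The API used (chrome.webNavigation) and the Wayback Machine save endpoint are assumptions about one possible implementation, not a description of any existing plugin.

```typescript
// Background script: archive every top-level http(s) page the user visits.
// Assumes a manifest with the "webNavigation" permission and host
// permissions for web.archive.org.
chrome.webNavigation.onCompleted.addListener((details) => {
  // Ignore iframes and non-http(s) pages.
  if (details.frameId !== 0 || !details.url.startsWith("http")) return;
  // Fire-and-forget capture request to the Wayback Machine.
  fetch(`https://web.archive.org/save/${details.url}`).catch(() => {
    /* ignore failures; archiving is best-effort */
  });
});
```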
The script now exists: https://www.andzuck.com/projects/archivify/
Update: Brave Browser now gives an option to search for archived versions whenever it lands on a "page does not exist" error.
Not my only concern but definitely seems important. (Otherwise you’re constrained by what you can personally maintain.)
A browser plugin seems like a good approach.