Thanks, this is great. (And I didn’t know about your Archiving URLs page!)
And the functionality is one that will be rarely exercised by users, who will click on only a few links and will click on the archived version for only a small subset of said links, unless link rot is a huge issue—in which case, why are you linking to the broken link at all instead of the working archived version?
I feel like I’m often publishing content with two audiences in mind – my present-tense audience and a future audience who may come across the post.
The original link feels important to include because it’s more helpful to the present-tense audience. e.g. Often folks update the content of a linked page in response to reactions elsewhere, and it’s good to be able to quickly point to the latest version of the link.
The archived link is more aimed at the future audience. By the time they stumble across the post, the original link will likely be broken, and there’s a better chance that the archived version will still be intact. (e.g. many of the links on Aaron Swartz’s blog are now broken; whenever I read it I find myself wishing there were convenient archived versions of the links).
Certainly there are links which are regularly updated, like Wikipedia pages. They should be whitelisted. There are others which wouldn’t make any sense to archive, stuff like services or tools—something like Waifu Labs which I link in several places wouldn’t make much sense to ‘archive’ because the entire point is to interact with the service and generate images.
But examples like blogs or LW pages make sense to archive after a particular timepoint. For example, many blogs or websites like Reddit lock comments after a set number of days. Once that’s passed, typically nothing in the page will change substantially (for the better, anyway) except to be deleted. I think most of my links to blogs are of that type.
Even on LW, where threads can be necroed at any time, how often does anyone comment on an old post, and if your archived copy happens to omit some stray recent comments, how big a deal is that? Acceptable collateral damage compared to a website where 5 or 10% of links are broken and the percentage keeps increasing with time, I’d say...
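The whitelist idea above could be sketched as a simple predicate over domains. This is a hypothetical illustration, not anyone’s actual implementation; the domain list and function names are made up for the example.

```python
# Sketch: decide whether a link is a good candidate for archive substitution.
# Domains whose pages are "living" documents (Wikipedia) or interactive
# services (Waifu Labs) are whitelisted and never swapped for a snapshot.
from urllib.parse import urlparse

NEVER_ARCHIVE = {
    "en.wikipedia.org",  # regularly updated, usually for the better
    "waifulabs.com",     # interactive service; a static snapshot is pointless
}

def should_archive(url: str) -> bool:
    """Return True if the link should eventually get an archived substitute."""
    host = urlparse(url).netloc.lower()
    return host not in NEVER_ARCHIVE

print(should_archive("https://en.wikipedia.org/wiki/Link_rot"))           # False
print(should_archive("https://example.blogspot.com/2010/01/post.html"))  # True
```

A real version would presumably match on URL patterns rather than bare domains (e.g. Reddit threads only after comments lock), but the shape of the decision is the same.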
For this issue, you could implement something like a ‘first seen’ timestamp in your link database and only create the final archive & substitute it after a certain time period—I think a period like 3 months would capture 99% of the changes which are ever going to be made, while not risking exposing readers to too much linkrot.
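A minimal sketch of that ‘first seen’ mechanism, assuming an in-memory dict as a stand-in for the real link database and leaving the actual archiving step out; the names and the 90-day figure are illustrative.

```python
# Sketch: record when each link is first encountered, and only
# archive-and-substitute links older than ~3 months, by which time
# the linked page has most likely stopped changing.
import time

GRACE_PERIOD = 90 * 24 * 3600  # ~3 months, in seconds

first_seen: dict[str, float] = {}  # url -> unix timestamp of first sighting

def note_link(url: str, now: float) -> None:
    # setdefault keeps the *first* sighting if the link is seen again later
    first_seen.setdefault(url, now)

def links_ready_to_archive(now: float) -> list[str]:
    """Links whose grace period has elapsed and which can be archived."""
    return [u for u, t in first_seen.items() if now - t >= GRACE_PERIOD]

now = time.time()
note_link("https://example.com/old-post", now - 100 * 24 * 3600)  # seen 100 days ago
note_link("https://example.com/new-post", now)                    # seen today
print(links_ready_to_archive(now))  # only the old post qualifies
```

A periodic job could run `links_ready_to_archive`, snapshot each result, and rewrite the published pages to point at the archived copies.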
This makes sense, but it takes a lot of activation energy. I don’t think a practice like this will spread (like even I probably won’t chunk out the time to learn how to implement it, and I care a bunch about this stuff).
Plausibly “(a)” could spread in some circles – activation energy is low and it only adds 10-20 seconds of friction per archived link.
But even “(a)” probably won’t spread far (10-20 seconds of friction per link is too much for almost everyone). Maybe there’s room for a company doing this as a service...
If adoption is your only concern, doing it website by website is hopeless in the first place. Your only choice is creating some sort of web browser plugin to do it automatically.
The script now exists: https://www.andzuck.com/projects/archivify/
Update: Brave Browser now gives an option to search for archived versions whenever it lands on a “page does not exist” error.
Not my only concern but definitely seems important. (Otherwise you’re constrained by what you can personally maintain.)
A browser plugin seems like a good approach.