If you’re comfortable with command-line UIs, git-annex is worth a look for creating repositories of large static files (music, photos, pdfs) you sync between several computers.
I use regular git for pretty much anything I create myself, since I get mirroring and backups from it. It’s mostly text, though, not audio or video; large files that you change a lot probably need a different backup solution. I’ve been trying out Obnam as an actual backup system, and I also bought an account at an off-site shell provider that provides space for backups.
Use the same naming scheme for your reference article names and the BibTeX identifiers for them, if you’re writing up some academic research.
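As a hypothetical illustration of that shared naming scheme (the key, names, and journal below are made up), the BibTeX key doubles as the file name, so the PDF would be saved as smith2010widgets.pdf and cited as \cite{smith2010widgets}:

```bibtex
@article{smith2010widgets,
  author  = {Smith, Jane},
  title   = {A Survey of Widgets},
  journal = {Journal of Examples},
  year    = {2010},
}
```

With this convention, going from a citation in your draft to the PDF on disk (or back) never requires a lookup table.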
GdMap or WinDirStat are great for getting a visualization of what’s taking space on a drive.
If your computer ever gets stolen, you probably want it to have full-disk encryption. That way the theft is only a financial loss, and probably not a digital security breach.
It constantly fascinates me that you can name the exact contents of a file pretty much unambiguously with something like a SHA256 hash of it, but I haven’t found much actual use for this yet. I keep envisioning schemes where your last-resort backup of your media archive is just a list of file names and content hashes, and if you lose your copies you can just use a cloud service to retrieve new files with those hashes. (These of course need to be files that you can reasonably assume other people will have bit-to-bit equal copies of.) Unfortunately, there don’t seem to be very robust and comprehensive hash-based search and download engines yet.
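The hash-manifest idea above can be sketched in a few lines of Python; the function names here are my own, not from any existing tool:

```python
import hashlib
from pathlib import Path

def sha256_of(path):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Map each relative file name under root to its content hash.

    The resulting dict is the tiny last-resort backup: names plus
    digests, with the actual bytes to be re-fetched from elsewhere.
    """
    root = Path(root)
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}
```

Verifying a restored copy is then just recomputing `build_manifest` over the new files and comparing it against the saved one.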
Suggest it to the folks who run The Pirate Bay.
They probably know about it already. I think the eDonkey network is pretty much what I envision. The problem is that the network needs to be very comprehensive and long-lived to be a reliable solution that can actually be expected to find someone’s copy of most of the obscure downloads you want to hang on to, and things that people try to sue into oblivion whenever they get too big have trouble being either. There’s also the matter of agreeing on the hash function to use, since hash functions come with a shelf life. A system made in the 90s around the MD5 function might nowadays be vulnerable to bots substituting garbage files for known hashes via collision attacks. (eDonkey uses MD4, which seems to be vulnerable in much the same way as MD5.)
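One way to hedge against that shelf life, sketched here as an assumption of mine rather than any existing network’s practice, is to record several digests per file, so a forger would have to break every function at once rather than just the weakest:

```python
import hashlib

# Algorithm choices are illustrative; any set supported by hashlib works.
ALGORITHMS = ("sha256", "sha512")

def multi_digest(data):
    """Return a dict of algorithm name -> hex digest for the given bytes."""
    return {name: hashlib.new(name, data).hexdigest() for name in ALGORITHMS}

def matches(data, record):
    """Accept a file only if every recorded digest still matches."""
    return all(hashlib.new(name, data).hexdigest() == digest
               for name, digest in record.items())
```

A manifest built this way stays verifiable even after one of its hash functions falls to collision attacks, as long as at least one recorded function remains sound.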
There’s an entire field called named data networking that deals with similar ideas.
There probably are parts of the problem that are cultural instead of technical though. People aren’t in the mindset of wanting to have their media archive as a tiny hash metadata master list with the actual files treated as cached representations, so there isn’t demand and network effect potential for a widely used system accomplishing that.
Zooko did this: Tahoe-LAFS
You can safely use it for private files too; just don’t lose your pre-encryption hashes.
Great suggestions.
This is very smart, and I’ll look into changing my bibliography files appropriately.
I want to reiterate the importance of this. I’ve used full-disk encryption for years for the security advantages, and I’ve found the disadvantages to be pretty negligible. The worst problem I’ve had with it was trying to chroot into my computer, but you just have to mount everything manually. Not a big deal once you know how to do it.