You can do what I do: http://www.gwern.net/Archiving%20URLs
High startup cost, but on the plus side, you don’t need to do anything once it’s running and it’ll catch most of what you read.
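Roughly, the shape of such a setup can be sketched in a few lines, though this is only an illustration of the idea rather than the actual scripts on that page; the Firefox profile path, the ~/www/ archive directory, and the wget flags are all placeholder choices:

```python
#!/usr/bin/env python3
# Illustrative sketch only (not the scripts from the Archiving URLs page):
# pull URLs out of Firefox's history and mirror any new ones with wget.
import pathlib
import sqlite3
import subprocess

HISTORY_DB = pathlib.Path("~/.mozilla/firefox/example.default/places.sqlite").expanduser()  # placeholder profile
ARCHIVE_DIR = pathlib.Path("~/www").expanduser()   # placeholder archive root
SEEN_FILE = ARCHIVE_DIR / ".seen-urls"             # URLs already fetched

ARCHIVE_DIR.mkdir(parents=True, exist_ok=True)
seen = set(SEEN_FILE.read_text().splitlines()) if SEEN_FILE.exists() else set()

# Firefox keeps browsing history in the moz_places table of places.sqlite;
# open it read-only so a running browser isn't disturbed.
con = sqlite3.connect(f"file:{HISTORY_DB}?mode=ro", uri=True)
urls = [row[0] for row in con.execute("SELECT url FROM moz_places WHERE url LIKE 'http%'")]
con.close()

for url in urls:
    if url in seen:
        continue
    # -x forces the host/path directory layout, -p grabs page requisites,
    # -k rewrites links to point at the local copies.
    subprocess.run(["wget", "-x", "-p", "-k",
                    "--directory-prefix", str(ARCHIVE_DIR), url])
    seen.add(url)

SEEN_FILE.write_text("\n".join(sorted(seen)) + "\n")
```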
OT (except that I ran into this while visiting that page): That Beeline thing is really annoying. (I got the “blues” variant. I was annoyed enough by it that I modified the cookie to serve me a different version. I asked it for variant 3 (“gray1”) and actually it doesn’t appear to be doing anything; maybe that’s a bug somewhere. Anyway, my apologies if this introduces noise into your A/B/.../I testing.)
‘Blues’ is actually the best-performing variant so far! I have no idea why; I hate it too. If it succeeds, I’ll probably have to run another test to try to find a version I can live with. ‘gray1’ is, IIRC, probably the subtlest of the running versions, so unless you set up a second identical tab set to ‘none’ and flicker back & forth, I suspect you simply weren’t noticing it.
EDIT: ‘Blues’ eventually succumbed, and the final result was that no version clearly outperformed no-BLR at all. See http://www.gwern.net/AB%20testing#beeline-reader-text-highlighting
Does the metric you’re using (fraction of visitors staying at least N seconds?) actually measure what you care about? (A few possible confounding factors, off the top of my head: visitors may be intrigued by the weird colours and stay around while they try to work out what it is, but this doesn’t indicate that they got any actual value from the page content; if the Beeline thing works, visitors may find the one bit of information they’re looking for faster and then leave; if it’s just annoying, annoyance may show up in reduced repeat visits rather than likelihood of disappearing quickly.)
I think it’s a reasonable metric. It’s not perfect (I’d rather measure average time on page than a cutoff), but I don’t know how to do any better: I am neither a JavaScript programmer nor a Google Analytics expert.
Do you have problems searching for the information you need in that mass of data you archive locally?
Not really. When you start with a URL (my usual use-case), it’s very easy to look in the local archive for it.
Ah, so you have something like an ancillary indexing system with URLs?
URLs map onto filenames (that’s what they originally were), so when wget downloads a URL, it’s generally fairly predictable where the contents will be located on disk.
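For example (assuming wget’s default host/path layout, i.e. no -nH or --cut-dirs, and ignoring complications like --adjust-extension), a rough guess at the on-disk location can be computed from the URL alone:

```python
# Rough guess at where wget's default layout puts a URL; ignores
# percent-encoding quirks and options like --adjust-extension.
from urllib.parse import urlparse

def expected_wget_path(url: str, prefix: str = "~/www") -> str:
    parts = urlparse(url)
    path = parts.path or "/"
    if path.endswith("/"):      # directory URLs become .../index.html
        path += "index.html"
    if parts.query:             # query strings stay part of the filename
        path += "?" + parts.query
    return f"{prefix}/{parts.netloc}{path}"

print(expected_wget_path("http://www.example.com/docs/paper.pdf"))
# -> ~/www/www.example.com/docs/paper.pdf
```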
No, that’s not what I mean. Let’s say you want to look up studies on, say, the effect of dietary sodium on CVD and you have a vague recollection that you scanned a paper on the topic a year or so ago. I understand that if you have the URL of this paper you can easily find it on your disk, but how do you go from, basically, a set of search terms to the right URL?
Oh. In that sort of scenario, I depend on my Evernote, having included it on gwern.net/Google+/LW/Reddit, and my excellent search skills. Generally speaking, if I remember enough exact text to make grepping my local WWW archive a feasible search strategy, it’s trivial to locate it in Google or one of the others.
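The grepping itself is nothing fancy; a minimal equivalent over a wget-style mirror, assuming it lives under ~/www/ and that the remembered phrase is just a placeholder, might look like:

```python
# Minimal full-text search over a local wget mirror (assumed to live at ~/www/);
# matching paths map straight back to the URLs they were downloaded from.
import pathlib

ARCHIVE = pathlib.Path("~/www").expanduser()   # placeholder archive root
PHRASE = "dietary sodium"                      # the half-remembered text

for f in ARCHIVE.rglob("*"):
    if not f.is_file():
        continue
    try:
        text = f.read_text(errors="ignore")
    except OSError:
        continue
    if PHRASE.lower() in text.lower():
        print(f.relative_to(ARCHIVE))          # e.g. some.host/path/page.html
```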
Ah, I see. So your system is less of a knowledge base and more of a local backup of particularly interesting parts of the ’net.
Thanks :-)
Yes, it’s the last resort for URLs which are broken. It’s not much good having a snippet from a web page so you know you want to check it, if the web page no longer exists.