Print ready version of The Sequences

Jordan Nov 6, 2010, 1:21 AM

20 points

I’ve been wanting a printable copy of the sequences to read through in meatspace. I wrote a quick scraper and uploaded the results here: http://pwnee.com/Sequences/list.html

Links between posts don’t work, but I just wanted a printable version anyway.

What links here?

Sequences in Alternative Formats by OneWhoFrogs (Dec 4, 2010, 1:40 AM; 22 points)
Elec0's comment on How To Actually Change Your Mind eBook (In Order) by Elec0 (Sep 4, 2012, 10:45 PM; 0 points)


Rationality A-Z (discussion & meta)

Paul Crowley Nov 20, 2010, 4:45 PM
9 points

I’ve now written a fairly sophisticated scraper for Eliezer’s blog posts, based on lxml, which
- follows the Author links in “Article Navigation” to fetch all articles
- fetches and parses all articles
- identifies the title, body, and date
- fixes hrefs to internal references where possible, including references that point to Overcoming Bias and redirect back to Less Wrong
- fixes the weird Unicode characters wherever I can make a plausible guess at what was meant
- finds and adds the forward references in all blog posts
- caches all network operations in a very simple, dumb way
- writes everything out as very simple HTML with a very simple HTML contents page, in a form that Calibre handles well.
I’ll share the script when I have time to sort out publishing via Mercurial, or email me if you’d like a snapshot copy—paul at ciphergoth dot org.
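The comment doesn't say how the Unicode repair works. One common source of "weird characters" in scraped blog text is UTF-8 that was decoded as Windows-1252, so that ’ shows up as â€™. A minimal sketch of undoing that specific case, assuming it is the problem; `fix_mojibake` is a hypothetical helper, not part of Paul's script:

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8 text that was mis-decoded as Windows-1252 (e.g. 'â€™' -> '’').

    Assumption: the garbling is exactly one cp1252 mis-decode. If the round
    trip fails, the string probably wasn't garbled, so it is returned as-is.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except UnicodeError:  # covers both UnicodeEncodeError and UnicodeDecodeError
        return text


if __name__ == "__main__":
    print(fix_mojibake("donâ€™t"))  # don’t
    print(fix_mojibake("café"))     # café (already correct, left alone)
```

Apply it per paragraph or text node rather than to a whole page: a single correctly encoded character such as é or ’ in the same string makes the round trip fail, and the whole string is then left untouched.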
- multifoliaterose Nov 20, 2010, 5:10 PM
  0 points
  Parent
  
  Great to hear!
Vladimir_Golovin Nov 6, 2010, 9:34 AM
9 points


I’ll throw together a python script to get the job done, and will post the result here.

LessWrong is written in Python, so instead of writing a scraper script (which I suppose you were going to do), you can become a volunteer, grab the code at GitHub, write a proper script that can access the database directly, and commit it into the LW codebase.
Jordan Nov 7, 2010, 12:56 AM
3 points

Hi all, thanks for the comments. I went ahead and wrote the scraper. I might do a proper PDF version, integrated into the LW codebase, once I get some time.
- Vladimir_Golovin Nov 7, 2010, 6:18 AM
  3 points
  Parent
  
  I haven’t checked all the scraped sequences, but I saw at least one missing post: “Invisible Frameworks”, which should be at the end of the Metaethics sequence according to the wiki index.
- Document Jan 6, 2011, 8:00 AM
  0 points
  Parent
  
  Thanks; I’ve downloaded them for possible offline reading. I checked the post titles in How to Actually Change Your Mind against the wiki and found that “Politics is the Mind-Killer” and the text of “You Can Face Reality” are missing.
Document Nov 6, 2010, 7:26 PM
2 points

This page has posts from 2006 November 22 to 2008 March 13.
- Document Jan 6, 2011, 8:45 AM
  0 points
  Parent
  
  Also, the wiki links to lw2ebook by OneWhoFrogs, which has them in epub and mobi format.
imonroe Apr 15, 2012, 8:05 PM
1 point

Just FYI, the link above (http://pwnee.com/Sequences/list.html) currently returns a 404.
jb55 Nov 23, 2010, 12:17 AM
1 point

I put together a repository for print-friendly versions of the Sequences. It consists of a pretty naive scraper that spits out Markdown documents, which I then feed into pandoc to convert to EPUB, PDF, etc.

I have the first two sequences completed and I plan on doing all of them eventually. Check it out:

http://jb55.com/lesswrong

https://github.com/jb55/lesswrong-print
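The thread doesn't describe the scripts in that repository. As a hypothetical sketch of the final step, assuming the scraper leaves one Markdown file per post in a directory, named so that sorting gives reading order, the chapters could be handed to pandoc like this. `--toc`, `--metadata` and `-o` are standard pandoc options; the function names and directory layout are made up:

```python
import subprocess
from pathlib import Path


def pandoc_command(chapters: list[str], output: str, title: str) -> list[str]:
    """Build a pandoc argv that joins Markdown chapters into one book.

    pandoc infers the output format from the -o extension (.epub, .pdf, .html, ...).
    """
    return [
        "pandoc",
        "--toc",                          # generate a table of contents
        "--metadata", f"title={title}",   # book title used by EPUB/PDF writers
        "-o", output,
        *chapters,                        # inputs are concatenated in this order
    ]


def build_book(md_dir: str, output: str, title: str) -> None:
    # Hypothetical layout: one .md per post, file names sort into reading order.
    chapters = sorted(str(p) for p in Path(md_dir).glob("*.md"))
    subprocess.run(pandoc_command(chapters, output, title), check=True)
```

For example, `build_book("map_and_territory", "map_and_territory.epub", "Map and Territory")`. PDF output additionally needs a LaTeX engine installed, since pandoc renders PDFs through LaTeX by default.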
NihilCredo Nov 7, 2010, 7:41 PM
1 point

There seems to be a relatively easy way to fix interlinking, at least within the same sequence:

1) Open the file with Notepad++ or any text editor of your choice.

2) Do a regex replacement to add an HTML anchor after every header, with the same anchor name as the one in the header’s link.

3) Do a regex replacement to turn every link of the form /lw/xx/articlename (where xx is the two-character code) into #articlename.

I may get around to doing it later.
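Steps 2 and 3 can also be scripted instead of done by hand in an editor. This sketch assumes, hypothetically, that each post title in the scraped page is a header wrapping a link to the post's Less Wrong URL; the real markup may differ, so treat the patterns as a starting point. Links are only rewritten when the target post has a header in the same file, so links to posts in other sequences are left unchanged rather than turned into dead anchors:

```python
import re

# Hypothetical markup: each post title is a header wrapping a link to the post, e.g.
#   <h2><a href="http://lesswrong.com/lw/xx/example_post/">Example Post</a></h2>
# Groups: 1 = whole header, 2 = header level, 3 = slug ("example_post").
HEADER = re.compile(
    r'(<h([1-6])>\s*<a href="[^"]*/lw/\w+/([\w-]+)/?"[^>]*>.*?</a>\s*</h\2>)', re.S
)
# Any link to /lw/xx/slug, relative or absolute on lesswrong.com. Group 1 = slug.
LW_LINK = re.compile(r'href="(?:https?://(?:www\.)?lesswrong\.com)?/lw/\w+/([\w-]+)/?"')


def fix_interlinks(html: str) -> str:
    """Add an anchor after every post header, then point in-file links at those anchors."""
    slugs = {m.group(3) for m in HEADER.finditer(html)}
    # Step 2: anchor after each header; id= works as a fragment target in HTML and EPUB.
    html = HEADER.sub(lambda m: f'{m.group(1)}<a id="{m.group(3)}"></a>', html)

    # Step 3: /lw/xx/slug -> #slug, but only for posts present in this file.
    def local(m: re.Match) -> str:
        slug = m.group(1)
        return f'href="#{slug}"' if slug in slugs else m.group(0)

    return LW_LINK.sub(local, html)
```

The order matters: anchors are added first, because step 3 also rewrites the header links themselves, after which the header pattern would no longer match.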
- Document Jan 6, 2011, 8:27 AM
  0 points
  Parent
  
  Personally I’d rather have all the links point to lesswrong.com, so there’d be no broken ones and so the title links could be used to go to each post’s comment threads.
RHollerith Nov 6, 2010, 6:49 PM
1 point

Forgive my ignorance: has the author of the Sequences given permission for the distribution (e.g., over the public internet) of EPUBs and such of his work—as opposed to people making EPUBs or such for their own consumption, which is not considered by most people to be “distribution” of a work?
- jsalvatier Nov 6, 2010, 7:12 PM
  1 point
  Parent
  
  I haven’t seen him explicitly do so, but I imagine he would.
MichaelAnissimov Jul 6, 2012, 2:51 AM
0 points

This is currently broken… any chance of getting it fixed? Thanks for doing this!
benbenson Aug 1, 2011, 3:16 PM
0 points

Where is Eliezer’s Metaethics Sequence?
- MixedNuts Aug 1, 2011, 3:33 PM
  0 points
  Parent
  
  Wiki index
  
  Welcome to Less Wrong!
MichaelHoward Dec 22, 2010, 3:21 AM
0 points

This has actually gone public, without any request not to say anything, so I trust it’s OK to mention it to anyone who finds themselves here and doesn’t know yet. I know I’d want to know! :-)

Eliezer has completed the first draft of his rationality book based on his two-year sequence of blog posting on Less Wrong, packed with hundreds of pages of novel content.

The manuscript is over 280,000 words long (over 500 pages) and has been split into two sub-books. The next steps are thoroughly editing the text and moving towards publication.

Yay! :)
hwc Dec 16, 2010, 12:47 AM
0 points

One point about e-book versions of HTML files: with restricted screen space and slow page turns, you want less space between paragraphs. The following CSS rule removes the vertical margins between paragraphs and indents each paragraph’s first line instead, print-style:

p { text-indent:2em; margin-top:0; margin-bottom:0; }
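To apply the rule to a folder of scraped pages before conversion, a small script can inject it into each file's head. A minimal sketch; `add_stylesheet` and the `scraped/` directory are hypothetical:

```python
import re
from pathlib import Path

# The CSS rule from the comment above.
EBOOK_CSS = "p { text-indent:2em; margin-top:0; margin-bottom:0; }"


def add_stylesheet(html: str, css: str = EBOOK_CSS) -> str:
    """Insert a <style> block just before </head>; prepend it if the page has no </head>."""
    style = f"<style>{css}</style>"
    # A function replacement keeps re from treating backslashes in the CSS as escapes.
    new, n = re.subn(r"</head>", lambda m: style + m.group(0), html,
                     count=1, flags=re.IGNORECASE)
    return new if n else style + html


if __name__ == "__main__":
    for page in Path("scraped").glob("*.html"):  # hypothetical directory of scraped posts
        text = page.read_text(encoding="utf-8")
        page.write_text(add_stylesheet(text), encoding="utf-8")
```

If the conversion is done with Calibre, `ebook-convert`'s `--extra-css` option is another way to add the same rule without modifying the HTML files.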
What links here?
- hwc's comment on Sequences in Alternative Formats by OneWhoFrogs (Jan 22, 2011, 1:39 AM; 1 point)
Paul Crowley Nov 10, 2010, 5:49 PM
0 points

http://pwnee.com/Sequences/GetSequences.py gives me “Internal Server Error”!
Larks Nov 6, 2010, 2:23 PM
0 points

Once Eliezer finishes the rationality books, we’ll be able to use them. I think much of Mysterious Answers to Mysterious Questions, Reductionism, How to Actually Change Your Mind and Map and Territory will be included: the core stuff.
Relsqui Nov 6, 2010, 6:37 AM
0 points

Now you’ve got me wondering what it would take to make an ereader-ready file out of them.
- Vladimir_Golovin Nov 6, 2010, 7:59 AM
  9 points
  Parent
  
  The most important decision is the format. PDF support is pretty ubiquitous, but PDF isn’t reflowable (i.e. it can’t automatically adapt to different screen sizes and user-adjusted fonts), so it looks bad on devices whose screen size differs from the page size encoded in the PDF. Many devices implement some form of PDF reflow, which may work well for simple-layout PDFs (with no pullquotes or columns).
  
  There’s EPUB, a good reflowable XML-based open standard, but it has its downsides too: not all ereaders support it, and, as I gather from the Wikipedia page, the situation with linking is unclear, which matters because the sequences are heavily interlinked.
  
  There’s also CHM (Windows help file format) which is both reflowable and linkable since it’s based on HTML/CSS, but, as far as I know, few ereaders support it.
  
  Based on my own experience, I strongly prefer EPUB. However I have no experience with its linking support.
  
  I’d first try a linked EPUB and see how well its linking works on popular devices, or, as a second option, try a simple, single-column, no-pullquotes PDF sized for a 6-inch screen and see how well it reflows on various readers.
  - humpolec Nov 6, 2010, 10:14 AM
    1 point
    Parent
    
    PDFs are pretty much write-only, and in my experience (with Adobe Acrobat-based devices) reflow never works very well. As long as you use a sane text-based ebook format, Calibre can handle conversion to other formats.
    
    So I recommend converting into—if not EPUB, then maybe just a clean HTML (with all the links retained—readers that support HTML should have no problems with links between file sections).
    - Vladimir_Golovin Nov 6, 2010, 4:01 PM
      2 points
      Parent
      
      
      then maybe just a clean HTML
      
      Yes, some readers (e.g. Pocketbooks) can handle HTML, but even the latest Sony readers cannot. Kindle does have HTML support “via conversion” but I don’t know if it can correctly convert 600 or so interlinked documents.
      - humpolec Nov 6, 2010, 5:33 PM
        2 points
        Parent
        
        
        600 or so interlinked documents
        
        I was thinking more of a single, 600-chapter document.
        
        (Actually this is why I think Sequences are best read on a computer, with multiple tabs open, like TVTropes or Wikipedia—not on an e-reader. I wonder how Eliezer’s book will turn out...)
  - Jordan Nov 6, 2010, 8:20 AM
    0 points
    Parent
    
    Thanks for the info!
    
    I suppose for PDFs we could just have a list of various widths available.
    - Vladimir_Golovin Nov 6, 2010, 9:25 AM
      2 points
      Parent
      
      Yes, but you’ll still have to change the page size and recompile each of the 600 or so PDFs—this is a huge amount of work if you’re planning to do it manually. Generating PDFs for different screen sizes automatically (e.g. using a Python library) would be a better solution. I’m not familiar with PDF generation under Python, but I did a quick Google search and here’s what I found:
      
      http://www.devshed.com/c/a/Python/Python-for-PDF-Generation/
      http://www.reportlab.com/software/opensource/
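Generating one PDF per screen size mostly comes down to choosing the page size. A rough sketch of that arithmetic, assuming the page should match the screen's physical size and aspect ratio and that pixels are square; PDF measures pages in points, 72 per inch. The screen list is illustrative (e.g. a 6-inch, 600x800 e-ink screen), and `page_size_points` is a hypothetical helper:

```python
import math

POINTS_PER_INCH = 72  # PDF page sizes are measured in points (1/72 inch)


def page_size_points(diagonal_in: float, px_w: int, px_h: int) -> tuple[float, float]:
    """Return (width, height) in points for a screen with the given diagonal and resolution.

    Assumes square pixels, so the physical aspect ratio equals px_w : px_h.
    """
    diag_px = math.hypot(px_w, px_h)
    width_in = diagonal_in * px_w / diag_px
    height_in = diagonal_in * px_h / diag_px
    return round(width_in * POINTS_PER_INCH, 1), round(height_in * POINTS_PER_INCH, 1)


if __name__ == "__main__":
    # Illustrative screens: (diagonal in inches, pixels wide, pixels high)
    for screen in [(6, 600, 800), (9.7, 768, 1024)]:
        print(screen, page_size_points(*screen))
```

For the 6-inch case this gives a 3.6 x 4.8 inch page, i.e. (259.2, 345.6) points. With ReportLab, a tuple like this can be passed as the `pagesize` argument of `SimpleDocTemplate`; how well each reader then displays the result would still need testing on the devices themselves.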
- Risto_Saarelma Nov 8, 2010, 9:59 AM
  1 point
  Parent
  
  I’ve had success turning texts in various formats into EPUB files using Calibre.
  
  ETA: Tried Calibre for making an ebook of the scrapings. Downloaded each sequence file, plus the index html, into the same directory. Modified index to point to local files instead of fully qualified URLs. Pointed Calibre at the index file, and it slurped the actual sequence files nicely. Resulting epub looks fine, corresponds to a 1300 page physical book and includes the inline images.
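The "modified index to point to local files" step can be scripted. A minimal sketch, assuming the index links are absolute URLs whose last path component matches the downloaded file's name; `localize_index` is a hypothetical helper (wget's `-k` option achieves a similar result when it does the downloading):

```python
import posixpath
import re
from urllib.parse import urlparse

ABSOLUTE_HREF = re.compile(r'href="(https?://[^"]+)"')


def localize_index(html: str, local_files: set[str]) -> str:
    """Rewrite absolute hrefs to bare file names when that file was downloaded locally."""
    def repl(m: re.Match) -> str:
        name = posixpath.basename(urlparse(m.group(1)).path)
        # Links to anything without a local copy are left untouched.
        return f'href="{name}"' if name in local_files else m.group(0)

    return ABSOLUTE_HREF.sub(repl, html)
```

For example, `localize_index(index_html, {p.name for p in Path(".").glob("*.html")})` (with `from pathlib import Path`) rewrites every index link for which a same-named file sits in the current directory.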
  - Risto_Saarelma Nov 10, 2010, 9:12 PM
    2 points
    Parent
    
    Made a script to do this.
    
    Install the prerequisites: apt-get install wget calibre tidy, or whatever works on your favorite OS.
    
    Then do
    
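    #!/bin/bash
    
    # Work in a throwaway directory (WORKDIR avoids clobbering the TMPDIR environment variable)
    WORKDIR=$(mktemp -d)
    pushd "$WORKDIR"
    # Fetch the index plus every page it links to (one level deep, images included),
    # flattened into one directory, with links converted to point at the local copies
    wget -H -np -nd -k -p -r -l 1 http://pwnee.com/Sequences/list.html
    # Tidy the HTML in place (-m modifies the original files)
    tidy -m *.html
    # Calibre follows the links in the index and bundles the pages into one EPUB
    ebook-convert list.html lesswrong.epub
    popd
    cp "$WORKDIR/lesswrong.epub" .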
- Jordan Nov 6, 2010, 7:30 AM
  0 points
  Parent
  
  Once you’ve harvested the text it would be straightforward to make a PDF, but I’m not sure how many ereaders support that format.
  - CarlShulman Nov 6, 2010, 8:42 AM
    0 points
    Parent
    
    You can convert a PDF for Kindle reading automatically, but the formatting winds up a bit spotty.
  - Relsqui Nov 6, 2010, 7:55 AM
    0 points
    Parent
    
    Me neither. Pretty sure the Kindle does but I’m not really familiar with the field, as I don’t own one. (But I’d like to, which is why I was curious.)