gwern comments on Brainstorming: neat stuff we could do on LessWrong

gwern 17 Dec 2010 1:08 UTC
2 points
We could probably repurpose some of the Archive Binge software, although it might need work to reproduce the ‘original spacing’. (Not that I’m convinced that’s very useful. 1 every X days sounds better to me.)
- pjeby 17 Dec 2010 23:54 UTC
  0 points
  
  it might need work to reproduce the ‘original spacing’
  
  Actually, it’s easier to keep the original spacing, because then all you need is a database of the posts and their original dates, and some very simple math to do the query. To do “1 every X days” means you have to fake the dates, use serial numbers, or some other such rubbish in order to find which items to put in the feed.
  - gwern 18 Dec 2010 0:33 UTC
    0 points
    Easier? Hm?
    
    I have a list of postings sans dates. Every X days cron runs and the head of the list is popped off into the RSS feed.
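
    A minimal sketch of that cron approach in Python, assuming the list and the feed are plain in-memory sequences (in practice both would have to be saved somewhere between cron runs). The name `publish_next` and the sample titles are made up for illustration.

    ```python
    from collections import deque

    def publish_next(queue, feed, feed_size=20):
        """Move the oldest unpublished post into the feed; meant to run once per cron tick."""
        if not queue:
            return None              # nothing left to publish
        post = queue.popleft()       # pop the head of the list
        feed.insert(0, post)         # newest entry goes first in the feed
        del feed[feed_size:]         # keep the feed at a fixed size
        return post

    # Hypothetical data: three posts, two cron runs.
    queue = deque(["Post A", "Post B", "Post C"])
    feed = []
    publish_next(queue, feed)
    publish_next(queue, feed)
    ```

    After two runs the feed holds "Post B" then "Post A", and "Post C" is still waiting in the list.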
    - pjeby 19 Dec 2010 4:40 UTC
      0 points
      
      I have a list of postings sans dates. Every X days cron runs and the head of the list is popped off into the RSS feed.
      
      I have a list of postings with dates. Whenever somebody tries to read an RSS feed, I return the entries within the appropriate time window.
      
      IOW, my approach doesn’t store any server-side state. All the state is in the feed URL (specifying the start date). The query is something like:
      
      SELECT (original_post_date - first_post_date + feed_url_date), title, etc.
      FROM posts
      WHERE original_post_date < (now() - feed_url_date + first_post_date)
      ORDER BY original_post_date DESC
      LIMIT size_of_feed -- a constant, like 20
      Et voila. No cron. No “list”. No “feed” to have things “popped into”. If ten thousand people subscribe, there is no additional data added to a database or written to disk anywhere. And since the database is read-only, you can replicate and load-balance the service to your heart’s content.
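
      For concreteness, here is a rough sketch of the same query in Python with sqlite3. It assumes a `posts` table that stores `original_post_date` as ISO-8601 text; the function name `shifted_feed`, the table layout, and the sample rows are all invented for illustration, and the date arithmetic is done in Python rather than in SQL.

      ```python
      import sqlite3
      from datetime import datetime, timedelta

      def shifted_feed(conn, feed_start, now, size_of_feed=20):
          """Return (shifted_date, title) pairs visible to a reader whose feed URL says feed_start."""
          first = conn.execute("SELECT MIN(original_post_date) FROM posts").fetchone()[0]
          if first is None:
              return []                                  # empty archive
          first_post_date = datetime.fromisoformat(first)
          # A post is visible once as much time has passed since feed_start
          # as originally separated that post from the first post.
          cutoff = first_post_date + (now - feed_start)
          rows = conn.execute(
              "SELECT original_post_date, title FROM posts "
              "WHERE original_post_date < ? "
              "ORDER BY original_post_date DESC LIMIT ?",
              (cutoff.isoformat(), size_of_feed),
          ).fetchall()
          offset = feed_start - first_post_date          # shift old dates into the reader's timeline
          return [(datetime.fromisoformat(d) + offset, title) for d, title in rows]

      # Hypothetical archive: three posts with their original dates.
      conn = sqlite3.connect(":memory:")
      conn.execute("CREATE TABLE posts (original_post_date TEXT, title TEXT)")
      conn.executemany("INSERT INTO posts VALUES (?, ?)", [
          ("2007-01-01T00:00:00", "Post A"),
          ("2007-01-03T00:00:00", "Post B"),
          ("2007-01-10T00:00:00", "Post C"),
      ])
      start = datetime(2010, 12, 19)                     # the date carried in the feed URL
      feed = shifted_feed(conn, start, now=start + timedelta(days=5))
      ```

      Five days into the subscription the reader sees "Post B" and "Post A", re-dated to 21 and 19 Dec 2010; "Post C" shows up after nine days. Nothing is written anywhere: the only inputs are the read-only table, the clock, and the date in the URL.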
      
      In addition, my approach can be trivially extended to use an ETag or a Last-Modified date that contains the date of the next post, and then avoid doing the query at all if that date hasn’t been reached yet. (Most RSS clients support conditional requests: they send back the ETag in an If-None-Match header, or the last-modified date in an If-Modified-Since header, so that they can skip reparsing. This would allow the system to simply say, “nah, nothing’s changed” and not re-run the query.)
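
      One way the ETag idea might look, sketched in Python without any web framework. The ETag here is just the (shifted) date at which the next post becomes visible; a client that echoes it back in If-None-Match gets a 304 until that date has passed. The function names, the `"end"` marker, and the sample dates are assumptions made for this sketch.

      ```python
      from datetime import datetime, timedelta

      def next_post_visible_at(original_dates, feed_start, now):
          """Shifted date of the next post not yet visible in this feed, or None if all are out."""
          first = min(original_dates)
          for d in sorted(original_dates):
              visible_at = feed_start + (d - first)
              if visible_at >= now:        # the feed query uses a strict <, so not visible yet
                  return visible_at
          return None

      def respond(if_none_match, original_dates, feed_start, now):
          """Return (status, etag); 304 means the feed query can be skipped entirely."""
          if if_none_match:
              try:
                  cached_next = datetime.fromisoformat(if_none_match.strip('"'))
              except ValueError:
                  cached_next = None       # unknown or "end" tag: fall through and recompute
              if cached_next is not None and now <= cached_next:
                  return 304, if_none_match
          nxt = next_post_visible_at(original_dates, feed_start, now)
          return 200, '"%s"' % (nxt.isoformat() if nxt else "end")

      # Hypothetical archive and subscription date.
      dates = [datetime(2007, 1, 1), datetime(2007, 1, 3), datetime(2007, 1, 10)]
      start = datetime(2010, 12, 19)
      status, etag = respond(None, dates, start, now=start + timedelta(days=5))
      ```

      Note that the 304 path never touches the post data at all: it only compares the clock with the date the client sent back.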
      
      And it’s still scalable via replication—you can have as many clones running as you want, and they’ll all answer the same thing about the given feed URL (within the accuracy of their clock synchronization, of course).
      
      Et voila.
      
      Actually, this approach is so simple that you don’t even need a real SQL database—Google App Engine’s simple database API would suffice. Heck, the “database” itself is probably small enough to be embedded entirely within the source code, if you did a titles-only feed. ;-)
- wedrifid 17 Dec 2010 2:18 UTC
  0 points
  Thanks for the link. I’ve been looking for something that could do that sort of thing. Now to see if there is something that works for things other than comics...