What are you working on? January 2014
Happy new year! This is the supposedly-bimonthly-but-we-keep-skipping ‘What are you working on?’ thread. Previous threads are here. So here’s the question:
What are you working on?
Here are some guidelines:
Focus on projects that you have recently made progress on, not projects that you’re thinking about doing but haven’t started.
Why this project and not others? Mention reasons why you’re doing the project and/or why others should contribute to your project (if applicable).
Talk about your goals for the project.
Any kind of project is fair game: personal improvement, research project, art project, whatever.
Link to your work if it’s linkable.
I’m currently a post-doc doing language technology/NLP type stuff. I’m considering quitting soon to work full time on a start-up. I’m working on three things at the moment.
The start-up is a language learning web app: http://www.cloze.it . What sets it apart from other language-learning software is my knowledge of linguistics, proficiency with text processing, and willingness to code detailed language-specific features. Most tools want to be as language neutral as possible, which limits their scope a lot. So they tend to all have the same set of features, centred around learning basic vocab.
Something that’s always bugged me about being an academic is, we’re terrible at communicating to people outside our field. This means that whenever I see a post using an NLP tool, they’re using a crap tool. So I wrote a blog post explaining a simple POS tagger that was better than the stuff in e.g. nltk (nltk is crap): http://honnibal.wordpress.com/2013/09/11/a-good-part-of-speechpos-tagger-in-about-200-lines-of-python/ The POS tagger post has gotten over 15k views (mostly from reddit), so I’m writing a follow up about a concise parser implementation. The parser is 500 lines, including the tagger, and faster and more accurate than the Stanford parser (the Stanford parser is also crap).
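To show how small a tagger of this kind can be, here is a minimal Python sketch of a greedy left-to-right tagger trained with an averaged perceptron. The post doesn't name its algorithm, so this is an assumption: the perceptron choice, the feature set, the tag names and the toy training data are illustrative and are not taken from the linked blog post.

```python
import random
from collections import defaultdict


class AveragedPerceptron:
    """Multiclass perceptron; weights are averaged over every training step."""

    def __init__(self):
        self.weights = defaultdict(dict)   # feature -> {tag: weight}
        self.classes = set()
        self._totals = defaultdict(float)  # (feature, tag) -> weight summed over steps
        self._stamps = defaultdict(int)    # (feature, tag) -> step of last change
        self.i = 0                         # training steps seen so far

    def predict(self, feats):
        scores = defaultdict(float)
        for f in feats:
            for tag, w in self.weights.get(f, {}).items():
                scores[tag] += w
        # Break ties by tag name so results are deterministic.
        return max(self.classes, key=lambda t: (scores[t], t))

    def update(self, truth, guess, feats):
        self.i += 1
        if truth == guess:
            return
        for f in feats:
            for tag, delta in ((truth, 1.0), (guess, -1.0)):
                key = (f, tag)
                old = self.weights[f].get(tag, 0.0)
                # Lazily credit the old weight for the steps it was in force.
                self._totals[key] += (self.i - self._stamps[key]) * old
                self._stamps[key] = self.i
                self.weights[f][tag] = old + delta

    def average(self):
        for f, tags in self.weights.items():
            for tag, w in tags.items():
                key = (f, tag)
                total = self._totals[key] + (self.i - self._stamps[key]) * w
                tags[tag] = total / self.i


def features(words, i, prev, prev2):
    w = words[i].lower()
    return [
        "bias",
        "word=" + w,
        "suffix3=" + w[-3:],
        "prev_tag=" + prev,
        "prev_tags=" + prev2 + "+" + prev,
        "prev_word=" + (words[i - 1].lower() if i > 0 else "<s>"),
        "next_word=" + (words[i + 1].lower() if i + 1 < len(words) else "</s>"),
    ]


def tag(model, words):
    prev, prev2, tags = "<s>", "<s>", []
    for i in range(len(words)):
        guess = model.predict(features(words, i, prev, prev2))
        tags.append(guess)
        prev2, prev = prev, guess
    return tags


def train(sentences, n_iter=5, seed=0):
    model = AveragedPerceptron()
    for _, tags in sentences:
        model.classes.update(tags)
    data = list(sentences)
    rng = random.Random(seed)
    for _ in range(n_iter):
        rng.shuffle(data)
        for words, gold in data:
            prev, prev2 = "<s>", "<s>"
            for i in range(len(words)):
                feats = features(words, i, prev, prev2)
                guess = model.predict(feats)
                model.update(gold[i], guess, feats)
                prev2, prev = prev, guess  # condition on own guesses, as at test time
    model.average()
    return model
```

The main design choice is greedy decoding that conditions on the model's own previous guesses during training as well as tagging, so each sentence takes a single pass and there is no search.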
I’m doing minor revisions for a journal article on parsing conversational speech transcripts and detecting disfluent words. The system gets good results when run on text transcripts. The goal is to allow speech recognition systems to produce better transcripts, with punctuation added and stutters etc. removed. I’m also working on a follow-up paper to that one, with further experiments.
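For a sense of what removing stutters means on the very simplest cases, here is a deliberately naive Python baseline: drop filled pauses, then drop the first copy of any immediately repeated word or short phrase. This is an illustration only, not the journal article's parser-based system; the filler list and the maximum phrase length are assumptions.

```python
FILLED_PAUSES = {"uh", "um", "er", "ah"}  # assumed filler list, not exhaustive


def remove_simple_disfluencies(tokens, max_n=3):
    """Drop filled pauses, then the first copy of any immediately repeated n-gram."""
    words = [t for t in tokens if t.lower() not in FILLED_PAUSES]
    keys = [w.lower() for w in words]
    out, i = [], 0
    while i < len(words):
        for n in range(max_n, 0, -1):
            # "I want I want it": skip the first "I want" and keep the repeat.
            if i + 2 * n <= len(words) and keys[i:i + n] == keys[i + n:i + 2 * n]:
                i += n
                break
        else:
            out.append(words[i])
            i += 1
    return out


print(" ".join(remove_simple_disfluencies("I I uh want to to go".split())))
# I want to go
```

A rule like this wrongly deletes legitimate repeats such as "had had", and it cannot handle repairs like "to Boston uh to Denver", which is why a learned model is needed for real transcripts.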
Overall the research is going well, and I find it very engaging. But I’m at the point where I have to start writing grant applications, and selling software seems like a much better expected-value bet.
Why is that, do you think? This doesn’t seem to be the case in the ML community as far as I can judge (though I’m not an expert). What’s special about NLP? What prevents the nltk people from doing what you did?
In ML, everyone is engaging with the academics, and the academics are doing a great job of making that accessible, e.g. through MOOCs. ML is one of the most popular targets of “ongoing education”, because it’s popped up and it’s a useful feather to have in your cap. It extends the range of programs you can write greatly. Many people realise that, and are doing what it takes to learn. So even if there are some rough spots in the curriculum, the learners are motivated, and the job gets done.
The cousin of language processing is computer vision. The problem we have as academics is that there is a need to communicate current best-of-breed solutions to software engineers, while we also communicate underlying principles to our students and to each other.
If you look at nltk, it’s really a tool for teaching our grad students. And yet it’s become a software engineering tool-of-choice, when it should never have been billed as industrial strength at all. Check out the results in my blog post:
NLTK POS tagger: 94% accuracy, 236s
My tagger: 96.8% accuracy, 12s
Both are pure Python implementations. I do no special tricks; I just keep things tight and simple, and don’t pay costs from integrating into a large framework.
The problem is that the NLTK tagger is part of a complicated class hierarchy that includes a dictionary-lookup tagger, etc. These are useful systems to explain the problem to a grad student, but shouldn’t be given to a software engineer who wants to get something done.
There’s no reason why we can’t have a software package that just gets it done. Which is why I’m writing one :). The key difference is that I’ll be shipping one POS tagger, one parser, etc. The best one! If another algorithm comes out on top, I’ll rip out the old one and put the current best one in.
That’s the real difference between ML and NLP or computer vision. In NLP, we really really should be telling people, “just use this one”. In ML, we need to describe a toolbox.
TIL: NLP can mean Natural Language Processing, as well as Neuro Linguistic Programming. I was confused for a while there.
This is about personal improvement. My wife and I separated in September, and that is the major life change I have been ‘working on’ with all my rationality. Incidentally, most of the changes it brought about are complete now.
So this is a good point to announce, and pre-commit to, writing a post on dealing rationally with a major life crisis (as I believe I did). I planned and started writing that post, but I am not sure how to present the case, especially without giving away too many private details.
Maybe you can help:
Is there interest in such an account? What kind of interest?
Are there any existing posts on dealing with a major life crisis on LW?
What questions would you like to see answered by such a post?
Would you like to discuss this topic in private?
Would you recommend such a post for Main?
Note that I don’t need counseling and am, happily, psychologically healthy, despite being the one my wife left, with four children, for someone else. I also don’t need recommendations or further reading, as I researched the topic in sufficient depth during the hard times (I will provide some references on this in my post).
For the curious: the solution I orchestrated is that my wife and her new significant other now live in a house we bought (mortgaged) across the street, and the children live in both houses (as of today). And we negotiated a (notary) marriage contract that spells out the terms of the whole thing.
Interestingly, I came across LW not long before the breakup (no: LW had no part in it), and it did help me deal with the situation sanely (maybe you can find some hints in my posts from that time; I drew a lot of strength from the things we did right, especially for the children).
I will post the writeup on Monday, January 20th, 2 AM CET at the latest (or refrain from it altogether).
I give 10:1 odds that a well written post on “dealing with a major life crisis” will attract a lot of interest. Whether it will be Mainable depends on how much others can learn from your story, and from your short summary it looks like there are some interesting lessons to be had. If you want to run a few things by a smaller audience first, the Freenode #lesswrong IRC channel seems like a place to start.
I just started a new business: software for sending money from the US to foreign countries. We compete with Western Union and Moneygram. We are starting with Kenya and sending money to the mobile money system in use there, M-Pesa.
The goals are: 1) to make money (which, once earned, will mostly be spent on effective altruist causes); and 2) make direct impact by a) reducing fees to send money to the developing world, and b) growing the remittance market resulting in wealth redistribution.
Why this project and not others? I believe that for people who have the capability to start companies, starting for-profit companies is one of the best things they can do. Occasionally startups hit it big, and when they do, they tend to make direct positive impact as well as money for the founders. I have selected this startup idea as a pretty good balance between direct impact and chances of success.
Recent progress: Today I talked to six Kenyan immigrants who worked in the US. Five of them currently send money using Western Union, and all five were quite enthusiastic about our product, enough to offer help and introductions. Two of them were most excited about M-Pesa as the way recipients could receive the money, and the other three were most excited about lower fees.
The website is currently at http://www.waveremit.com . I’m looking to talk to people who a) send money to other countries regularly; b) have experience with federal or state money transmitter laws; or c) have other potentially useful information that I haven’t thought of.
I am translating the upcoming Sequences ebook into Slovak. At the time of writing, the first of the six parts is fully translated and converted to DocBook, which allows me to export it to PDF, HTML, EPUB and a few other formats. Half of the second part is also translated but not yet converted. (For anyone interested, the first part in PDF is available at this temporary link.)
If anyone else is interested in translating the ebook into their language and would like to use DocBook, I would be happy to help: I can explain the system, share my scripts and configuration files, and offer tips and best practices about the technology in general, its use for the Sequences, and how to handle non-English characters. I could even coordinate multiple volunteers for your language, so you could just focus on translating one chapter at a time and ignore the big picture. (I have some experience coordinating translators in an open-source project.) A short introduction: you write each chapter as an XML file (just like HTML, only with different tags), and the details of converting the XML files to the output format are specified separately. You can do a lot of customization; some of it requires merely setting a few variables, but you can do anything using XSLT. Installing the whole toolchain on your computer is very easy (at least on Windows): I could send you a ZIP file that you’d simply extract into a directory, and everything should work out of the box. (Worst case, if everything fails, you will have to write a simple script to convert the XML files to whatever other format you decide to use.)
Why? As part of an attempt to create a rationalist community in my country. I am already translating the web articles (I had the first part of the book mostly covered already), so the book gives a well-defined scope to what I was doing anyway. I believe a book has some advantages over a blog. One of them: people are used to reading books from beginning to end, so you can cross larger inferential distances. With blogs, many people just read an article or two and then complain that the text is too long; also, the comments are distracting, and reading them multiplies the size tenfold.
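The worst-case conversion script can be quite short. Here is a minimal Python sketch using only the standard library's xml.etree.ElementTree. It assumes a chapter that uses just four DocBook elements (chapter, title, para, emphasis) without the DocBook 5 namespace; the element-to-HTML mapping is a hypothetical choice, and a real setup would normally use the DocBook XSL stylesheets instead.

```python
import html
import xml.etree.ElementTree as ET

# A tiny DocBook-style chapter (Slovak sample text, made up for illustration).
CHAPTER = """<chapter>
  <title>Ukážka kapitoly</title>
  <para>Prvý odsek s <emphasis>dôrazom</emphasis>.</para>
  <para>Druhý odsek.</para>
</chapter>"""

# Hypothetical mapping from a few DocBook elements to HTML tags.
TAG_MAP = {"chapter": "div", "title": "h1", "para": "p", "emphasis": "em"}


def to_html(elem):
    """Recursively convert a DocBook element (and its children) to an HTML string."""
    tag = TAG_MAP.get(elem.tag, "span")  # unknown elements fall back to <span>
    parts = [f"<{tag}>", html.escape(elem.text or "")]
    for child in elem:
        parts.append(to_html(child))
        parts.append(html.escape(child.tail or ""))  # text after the child element
    parts.append(f"</{tag}>")
    return "".join(parts)


print(to_html(ET.fromstring(CHAPTER)))
```

Writing the result with `open(path, "w", encoding="utf-8")` keeps Slovak characters such as ž and ô intact.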
So the idea is that after translating, I can distribute the PDF files freely, but I can also print a copy or two of the book (each of the six parts would be one physical copy) and lend people the paper version. Assuming they are similar to me, if they read the first chapter and like it, they will be very likely to read it till the end.
Are your DocBook files available somewhere? GitHub?
DocBook is a really cool set of tools for cases like this. Did you convert the English version to DocBook? I just looked at the HTML on this site, and it looks like it will take some time to parse it and convert it to DocBook.
There’s the overall “move to Portland and start at my awesome new job” Project, which is coming along well. A friend and I drove here from Ohio (pretty much across the entire country) in a little over two days. We drove because I have a very old dog that I didn’t want to fly. My POD arrives tomorrow, and I’m excited to have all my things back in my possession (I didn’t bring any nice shirts or sweaters with me, so I’ve been feeling slovenly-ish), but I am NOT excited for having to unpack and move furniture.
More specifically, I’m working on helping to launch the Beekeeper program at Beeminder. If you want to read more about that you can read this blog post we just released on the subject.
Personal-life-wise, I’ve just started on the “find my tribe here” Project, which I don’t like. I like HAVING a tribe (and very much miss the awesome group I left in Columbus), but actually having to go out and either find one or put one together sounds like less fun.
A bit of directed social networking won’t go astray. Collect contacts. Socialise with intent. Become a social supernode: the person who knows all the social nodes. It’s fun being able to collide people’s worlds.
Thanks for the input! I think I wasn’t clear though. I actually already am a networky type person (the tribe I am leaving behind is one that I did most of the work building) with high social skills.
However, I am ALSO extremely picky in who I’ll allow into my close social circle (I have well thought out reasons for this). That means to start a new tribe I’ll have to meet LOTS and LOTS of people just to find one that I consider to be worth my time. When I already have a group, then I don’t have to be overly proactive about this because there’s no particular rush, so I can just pull in people as I come across them. However when I’m starting with zero, I’m in a bit of a rush, and therefore have to plan going to various events with lots of people I don’t know and meet them all, and find most of them boring, or assholes, or unable to logic, or unable to social, or whatnot, and those are the people I have to go through over and over in order to find a single person who I’d actually want to be friends with. And as much as I HAVE social skills, I am also an introvert at core, and so find the idea to be draining (especially right now, when I’m already emotionally drained from moving!). This is not a massive problem, and I can easily overcome the draininess, but I still recognize that it is what I will struggle the most with in the next couple months. :)
Any advice for someone who might be moving from the east coast to the west in the next year and a half?
My Anki deck, sourced from text files I edit in Vim, is up and running with around 1000 cards now. I seeded the deck with stuff that isn’t very serious or difficult but helps keep the gears turning while the habit forms:
Alan Perlis epigrams. I’m doing random quotations cloze-style: the front shows enough of the beginning of the quotation to establish context, and the back shows the rest. Seems to work well enough.
Default keywords for 0 to 99 in the mnemonic major system.
The periodic table of elements in atomic number / name and abbreviation format (I used the mnemonic major system to learn these with just the number to go on).
The 125 design principles in name / description format from Universal Principles of Design revised edition.
The 100 design principles in name / description format from The Art of Game Design: A Book of Lenses.
Select best practice items in cloze deletion format from The C++ Programming Language, 4th Edition.
The Morse code alphabet in both directions.
Select notable historical events in event / year format from the Wikipedia world history timelines.
The 100 famous paintings list from decks by LW users to test cards with images. (Some of these seem to be mislabeled, or I messed up my import.)
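The mnemonic major system maps consonant sounds to digits (0 s/z, 1 t/d/th, 2 n, 3 m, 4 r, 5 l, 6 j/sh/ch, 7 k/hard c/hard g, 8 f/v, 9 p/b), while vowels and w, h, y carry no value. The Python sketch below decodes a keyword back into its number. It works on spelling rather than sound, which is an assumption and an approximation: silent letters (the b in "tomb"), soft c or g, and x will be misread.

```python
import re

# Digit for each consonant sound, approximated by spelling.
DIGRAPHS = {"sh": "6", "ch": "6", "th": "1", "ph": "8", "ck": "7"}
LETTERS = {
    "s": "0", "z": "0",
    "t": "1", "d": "1",
    "n": "2",
    "m": "3",
    "r": "4",
    "l": "5",
    "j": "6",
    "k": "7", "c": "7", "g": "7", "q": "7",
    "f": "8", "v": "8",
    "p": "9", "b": "9",
}


def major_digits(word):
    """Decode a keyword into the number it encodes in the major system."""
    w = re.sub(r"(.)\1", r"\1", word.lower())  # doubled letters are one sound
    digits, i = [], 0
    while i < len(w):
        if w[i:i + 2] in DIGRAPHS:
            digits.append(DIGRAPHS[w[i:i + 2]])
            i += 2
        else:
            code = LETTERS.get(w[i])  # vowels, h, w, y, x carry no digit here
            if code is not None:
                digits.append(code)
            i += 1
    return "".join(digits)


print(major_digits("mouse"), major_digits("ship"))  # 30 69
```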
These should be providing me with fodder for daily reviews for some months, so there’s time to come up with a study routine for regularly generating more cards before the whole thing starts feeling pointless and I start skipping the reviews.
So far the text file input has worked quite well, though it does probably help to have a programmer’s mindset to text files, where the difference between a physical tab character and several space characters is an extremely big deal, for example. Being able to use text editor macros to batch edit a sequence of similar cards can help a lot. Haven’t done much latexy math input yet, so I don’t know if I’ll run into frustrations with that later.
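Since the deck is generated from text files, a small script can take care of the tab-separated formatting and the cloze markup. A minimal Python sketch, assuming Anki's plain-text import with tab-separated fields and a Cloze note type (which uses `{{c1::...}}` markup, with the second field as its extra field); the number of context words shown is an arbitrary choice.

```python
def cloze_quote(quote, context_words=4):
    """Show the first few words as context; hide the rest as cloze deletion c1."""
    words = quote.split()
    head = " ".join(words[:context_words])
    tail = " ".join(words[context_words:])
    return head + " {{c1::" + tail + "}}"


def to_anki_line(*fields):
    """Join fields with real tab characters; reject fields that would break the row."""
    for field in fields:
        if "\t" in field or "\n" in field:
            raise ValueError(f"field contains a tab or newline: {field!r}")
    return "\t".join(fields)


quote = ("A language that doesn't affect the way you think about programming, "
         "is not worth knowing.")
print(to_anki_line(cloze_quote(quote), "Alan Perlis"))
```

The explicit tab check is the scripted version of the tab-versus-spaces caution above: a stray tab inside a field would silently shift every following field of that card.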
Anki for Android has gotten better with the image files too. I don’t need to do the magic stuff from some years back anymore, just having the media folder next to my import textfile seems to work, and the latex renderings get passed more or less automatically to the Android version.
I’ve been mainly working on my graduate studies, but as a side project I’ve been working on a more efficient way of doing Solomonoff induction. The idea is that Solomonoff induction requires some language over which to construct programs. But most ‘formal’ languages we know of, such as Turing machines, do not fit well with observed reality, in that you wind up adding a very large constant to the complexity of your programs. So the goal is to find some computational substrate that does not require such a large additive constant. The most obvious solution would be probabilistic graphical models (such as Bayesian networks), but searching through Bayesian networks is extremely hard because there is a huge number of possible network topologies. So my idea is to restrict the network topologies. It is known that ANNs can, in principle, approximate any continuous function, but I suspect most ANNs are too crude. Instead, I’ve been working on combining ANN approaches with hierarchical Bayesian networks. I suspect the resulting structure has enough power to produce most sequences found in real life, while also being easy to optimize over, due to the recursive structure and the fact that good learning algorithms for ANNs are known. My initial experiments in this direction have been positive; the main goal now is to derive a more efficient learning algorithm and to prove bounds on time and space complexity.
Are artificial neural networks really Turing-complete? Yep, they are [Siegelmann, Sontag 91]. The network in the paper has on the order of 10^5 neurons, with rational edge weights, so it’s really Kolmogorov-complex. This, however, doesn’t tell us whether we can build good machines for specific purposes.
Let’s figure out how to sort a dozen numbers with λ-calculus and with sorting networks. It is worth noting that the lambda expression is O(1) in size, whereas the sorting network is O(n (log n)^2) in size.
Batcher’s odd–even mergesort would be O((log n)^2) levels deep, and given that one neuron implements each comparator, it would have O(n!) possible connections per level (around 2^29 for a dozen inputs). That we need 200 bits of insight to sort a dozen numbers with this specific method does not mean there is no cheaper way to do it, but it sets a reasonable upper bound.
Apparently I’m not good at lambda calculus, but it seems we can merge-sort Church-encoded numerals in fewer than a hundred lambda terms, which is about the same number of bits as the sorting network.
On a second note: how are Bayesian networks different from perceptrons, except for having no thresholds?
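For reference, here are the standard definitions behind the additive-constant remark. For universal prefix machines U and V, with the Solomonoff prior M summing over minimal programs p whose output begins with x, and K the prefix Kolmogorov complexity, the invariance theorem gives:

```latex
M_U(x) = \sum_{p \,:\, U(p) = x\ast} 2^{-\lvert p \rvert},
\qquad
K_U(x) \le K_V(x) + c_{U,V},
\qquad
M_U(x) \ge 2^{-c_{U,V}}\, M_V(x).
```

Here c_{U,V} is the length of a U-program that makes U simulate V, and it does not depend on x. The theorem guarantees that the constant is finite but not that it is small. The project is aimed at that gap: choosing a reference substrate for which the constant is small relative to real-world data.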
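The size and depth of Batcher's network are easy to check directly. The Python sketch below generates the comparator list for odd–even mergesort, applies it, and measures depth by placing each comparator in the earliest layer its two wires allow. The recursive formulation is a standard one and assumes the input length is a power of two, so a dozen numbers are padded to 16 with infinities.

```python
import math
import random


def oddeven_merge(lo, hi, r):
    """Comparators merging the two sorted halves of wires lo..hi (inclusive), stride r."""
    step = r * 2
    if step < hi - lo:
        yield from oddeven_merge(lo, hi, step)
        yield from oddeven_merge(lo + r, hi, step)
        yield from ((i, i + r) for i in range(lo + r, hi - r, step))
    else:
        yield (lo, lo + r)


def oddeven_merge_sort_range(lo, hi):
    """Comparators that sort wires lo..hi (inclusive)."""
    if hi - lo >= 1:
        mid = lo + (hi - lo) // 2
        yield from oddeven_merge_sort_range(lo, mid)
        yield from oddeven_merge_sort_range(mid + 1, hi)
        yield from oddeven_merge(lo, hi, 1)


def batcher_network(n):
    assert n & (n - 1) == 0, "n must be a power of two"
    return list(oddeven_merge_sort_range(0, n - 1))


def depth(network, n):
    """Number of layers when each comparator is placed as early as possible."""
    level = [0] * n
    for a, b in network:
        level[a] = level[b] = max(level[a], level[b]) + 1
    return max(level)


def sort_with_network(values, network, n):
    """Pad with +inf up to n wires, run the comparators, drop the padding."""
    padded = list(values) + [math.inf] * (n - len(values))
    for a, b in network:
        if padded[a] > padded[b]:
            padded[a], padded[b] = padded[b], padded[a]
    return padded[:len(values)]


net = batcher_network(16)
print(len(net), depth(net, 16))  # 63 10
rng = random.Random(0)
xs = [rng.randint(0, 99) for _ in range(12)]
print(sort_with_network(xs, net, 16) == sorted(xs))  # True
```

For comparison with the per-level estimate, log2(12!) is about 28.8.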
I’ve been practicing stenographic typing daily for the last couple of weeks, but still am not at a point where I have a serious write-up or a better typing speed ready.
I’m working on this mostly to prove to myself that I can see long-term projects through. Sure, the prospect of doubling my typing speed on English text is appealing, but it’s really too far off to be that tempting. So it’s mostly to show that I can keep working on a project if I put my mind to it. That, and there are some people on this site who are rather interested in how it turns out: how much work it was, how difficult, how long it takes to get good enough at it, etc. I almost forgot the big one: whether typing in steno makes a difference to the ease of writing.
When I get significantly faster than my QWERTY typing speed (140 WPM, compared with my original 90), I’ll do a write-up on the experience.
Here are links to the software and learning tools I’m using: http://plover.stenoknight.com/ http://qwertysteno.com/Home/
Keyboard: Microsoft Sidewinder X4. You need a keyboard that can correctly send combinations of key presses to the system—it’s called N-key rollover—and this is a $50 option that satisfies my needs.
EDIT: I realized that my write-up was missing a lot of the actual strategies I used to practice. There are three skills I am working on: steno theory, keyboard use, and brief memorization.
Steno theory is how you figure out the chords needed for an unfamiliar word. So for the word “needed”, I’d type that as “TPHAOED/-D”. TPH is the letter N, AOE is the long E vowel sound, and D is the final D. Then the -ed final sound is an unvoiced vowel and a D, so it’s a final D. I practice this by typing things using steno—I invariably find words that I don’t have memorized chords for, and then I figure it out or fail to and look it up.
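Once the chords are worked out, the software's job is essentially dictionary lookup on stroke sequences. Here is a simplified Python sketch of greedy longest-match translation, using only entries taken from or implied by the "needed" example above. Real Plover dictionaries are far larger, and Plover's translation logic (undo, affixes, retroactive changes) is more involved, so this is not its actual algorithm.

```python
# Entries from the "needed" example; a real dictionary has many thousands.
STENO_DICT = {
    "TPHAOED": "need",
    "TPHAOED/-D": "needed",
}


def translate(strokes, dictionary, max_len=4):
    """Greedy longest-match translation of a list of strokes (simplified)."""
    out, i = [], 0
    while i < len(strokes):
        for n in range(min(max_len, len(strokes) - i), 0, -1):
            key = "/".join(strokes[i:i + n])
            if key in dictionary:
                out.append(dictionary[key])
                i += n
                break
        else:
            out.append(strokes[i])  # untranslated stroke is shown raw
            i += 1
    return " ".join(out)


print(translate(["TPHAOED", "-D"], STENO_DICT))  # needed
print(translate(["TPHAOED"], STENO_DICT))        # need
```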
The keyboard use is mostly a byproduct of use, with a little bit of http://stenoknight.com/stengrid.png for the multi-key consonants. I got started with the key drills either in qwertysteno or the typing drills at https://sites.google.com/site/ploverdoc/lesson-1-fingers-and-keys
For brief memorization, I’m using the top 100 words drill at http://qwertysteno.com/Practice/Words1.php , limiting the range to a manageable few common words to spam out quickly (I started with ten). Once there’s little thinking involved, it’s time to add more words. This is really the nuts and bolts of how to type quickly with steno: the most common words are most of what you’re going to be using, so they’re where you get the most bang for your drilling buck. That said, if you get hung up on unfamiliar words, you’re better off working on those; occasionally needing twenty seconds to get a word out is a lot of time, even if it’s one word in sixty. But the words you want to get faster on are the common ones, so it’s a good tool for practicing the most common words first and most.
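To put numbers on the cost of a twenty-second stall once every sixty words, here is a quick Python calculation. It assumes, as a simplification, that every other word is typed at a steady base speed.

```python
def effective_wpm(base_wpm, stall_seconds, words_per_stall):
    """Average speed when one word in every `words_per_stall` takes `stall_seconds`."""
    normal_words = words_per_stall - 1
    seconds = normal_words * 60 / base_wpm + stall_seconds
    return words_per_stall * 60 / seconds


for base in (90, 140):
    print(base, "WPM ->", round(effective_wpm(base, 20, 60), 1), "WPM")
# 90 WPM -> 60.7 WPM
# 140 WPM -> 79.5 WPM
```

Even at the 140 WPM target, one such stall per sixty words pulls the average below the 90 WPM QWERTY baseline, which supports working on unfamiliar words first.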
How is your stenographic typing progressing? What has the return on effort been for you, so far?
It’s been stuck, but I’ve barely been putting any effort into it. I’ve been working much more on minimizing mouse usage: Vim for text editing, Firefox with Pentadactyl for web browsing, and bash for many computing and programming tasks.
The low-hanging fruit is definitely not in getting better at stenographic typing: since I started working as a professional software developer, there’s been much more computer operation than English text entry. I’d have to figure out a really solid way of switching seamlessly between Vim’s normal mode and stenographic typing in insert mode, and adapting stenographic typing to writing code as well as English would take configuration and exploratory learning that I’m nowhere near capable of. It’s likely still my best option for getting really good at writing English text, but it’s simply a lower priority at the moment than other tools.
Early last year, I set out to write a crossover fanfiction that crossed something popular with something completely original that nobody had ever heard of, to see where my writing skills stood when it came to introducing an unfamiliar setting.
The fic is titled Forever After Earth, and as of today is over a hundred thousand words long and has gotten mostly positive reviews. I think I’ve learned a few things about how to get important details across without bogging down the narrative, from where reviewers were confused or outright wrong about something I thought I had clearly established.
It has also been a good exercise in recognizing Magic By Another Name, weeding out Separate Magisteria, and Asking the Right Question.
I’m working on a relativistic bullet hell game I’ve mentioned on here before. Now it can show sprites properly warped and colored.
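Properly warping and coloring sprites relies on two standard effects: relativistic aberration, which bunches apparent directions toward the direction of motion, and the Doppler shift. The Python sketch below implements both formulas. It assumes the angle θ is measured in the game world's rest frame between the player's velocity and the direction toward a sprite, with β = v/c. It is not the game's actual code, and mapping a shifted wavelength to an RGB color is left out.

```python
import math


def aberrate(theta, beta):
    """Apparent angle seen by a player moving at beta = v/c.

    theta: angle in the world's rest frame between the player's velocity and
    the direction toward the sprite. Objects appear shifted toward the front.
    """
    c = math.cos(theta)
    cos_seen = (c + beta) / (1 + beta * c)
    return math.acos(max(-1.0, min(1.0, cos_seen)))  # clamp rounding error


def doppler_factor(theta, beta):
    """Ratio of observed to emitted frequency (>1 means blueshift)."""
    gamma = 1 / math.sqrt(1 - beta * beta)
    return gamma * (1 + beta * math.cos(theta))


def shifted_wavelength_nm(wavelength_nm, theta, beta):
    return wavelength_nm / doppler_factor(theta, beta)


# At 0.6c, something dead ahead is blueshifted by a factor of 2 and
# something directly behind is redshifted by a factor of 2.
print(round(doppler_factor(0, 0.6), 6), round(doppler_factor(math.pi, 0.6), 6))
```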
I started a (non-rationalist) RWBY fanfic, Little Ruby Rose. It’s a series of one-shots about the stories the characters in RWBY reference, with them replacing the main character.
About a year ago I founded a startup that makes software—for doctors—that’s powered by Google Glass. We are now the largest Glass startup, with over 30 employees and a substantial amount of VC funding.
more here: www.augmedix.com
Started on some art projects recently… One is a portrait of my sister; another is some practice concept art for a game idea centered around both regular and exotic explosions. The game may never come into existence. The portrait of my sister is more about practicing drawing people. Also, I don’t know if this counts as a project, but I’m learning how to use DirectX, so there’s that. Another item I’m not sure counts as a project, because it came in a kit, is a computer interface board I made that can be used for home automation. I had to solder all the components onto the board and everything! Hehe, I had fun doing it, too. There is a switch on the board, though, that acts as if it’s always on, which suggests a short. I’ll need to study the schematic and refresh myself on what the components actually do to debug it. I will probably get on that next week when I have time.
All of these projects tie into practicing my world building skills.
Based on the suggestion of some folks here, I’m learning the Lindsey Stirling song “Shadows”. I actually plan on recording it and reinterpreting the song in the style of hard rock/metal. So not only do I have to learn the violin part of the song, but also plan on rearranging or reinterpreting some of the other sounds/instruments and drum beats. This means that the overall feel is less dubstep/violin and more of a percussive rhythm guitar/metal vibe.
I’ll post it here when I’m done, which will probably be in a month or two.
I’m currently writing a fantasy novel. I plan on turning it into a web serial once I get enough of it written that I’m confident I won’t fall behind in posting. So far I have 30,000 words written and have 90% of the book plotted out in detail. I don’t have any specific goals for this project beyond wanting to finish it. I suppose when I put it up online, I will allow people to donate, but I don’t really expect it to get that many reviews/etc. It’s for my own pleasure, more or less. This is the prologue, if anyone is interested.
I am working on two workshops for my college’s THINK chapter. One is on procrastination and the other is on learning techniques. The one on learning techniques has been completely outlined. I’m working on adding exercises that groups can do (it is a workshop, not a lecture). The other only has a loose outline of what I want to cover.
I’m determining where I want to go for graduate school. I want to help with research in immortality and life extension, so I’m looking up researchers and research done in those fields to try to see where I should go, and if I have the grades/experience to get in those universities.
I’m working on a new imperative programming language called Akasha. It’s a language where genetic programming can be done in a few lines. Users will be able to use a default function set or specify which function set to include in program generation. The space of generated programs will be Turing complete unless the user restricts it by limiting the function set. Users will be able to specify how long a generated program may run, to prevent infinite loops, and the language will also allow for a memory limit. It will also support program generation by other methods, such as hill climbing or simulated annealing.
Users will be able to specify which parts of the program they want generated. For example, they might know the code they want is inside a for loop but not know the contents of that loop. They can write something like: for(int i = 0; i < 10000; i++){
}
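Since Akasha isn't built yet, here is a hypothetical Python sketch of the core idea rather than of the language itself. Programs are expression trees over a chosen function set, evaluation is cut off after a step budget that stands in for the run-time limit, and the search is hill climbing (one of the alternatives the post mentions) rather than a full genetic algorithm. All names and parameters are illustrative.

```python
import operator
import random

# Hypothetical function and terminal sets; Akasha's real design is unspecified.
FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
TERMINALS = ["x", 1.0, 2.0]


class BudgetExceeded(Exception):
    """Raised when evaluation uses more steps than allowed (a run-time limit)."""


def random_tree(rng, depth):
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    return (name, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, x, budget):
    """Evaluate a tree; `budget` is a one-element list of remaining steps."""
    budget[0] -= 1
    if budget[0] < 0:
        raise BudgetExceeded
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    name, left, right = tree
    return FUNCTIONS[name](evaluate(left, x, budget), evaluate(right, x, budget))


def mutate(tree, rng):
    """Replace a randomly chosen subtree with a fresh random one."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(rng, 3)
    name, left, right = tree
    if rng.random() < 0.5:
        return (name, mutate(left, rng), right)
    return (name, left, mutate(right, rng))


def error(tree, cases, max_steps=200):
    total = 0.0
    for x, y in cases:
        try:
            total += (evaluate(tree, x, [max_steps]) - y) ** 2
        except BudgetExceeded:
            return float("inf")  # programs that run too long are rejected
    return total


def hill_climb(cases, iterations=2000, seed=0):
    rng = random.Random(seed)
    best = random_tree(rng, 3)
    best_err = error(best, cases)
    for _ in range(iterations):
        candidate = mutate(best, rng)
        err = error(candidate, cases)
        if err <= best_err:  # accept sideways moves to drift across plateaus
            best, best_err = candidate, err
    return best, best_err


cases = [(float(x), float(x * x + 1)) for x in range(5)]
best, err = hill_climb(cases)
print(best, err)
```

Replacing hill_climb with a population and crossover would make this genetic programming proper. The budget check keeps oversized candidates from stalling the search, and in a language with loops it would also catch infinite loops.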
This is cool. I’m a PL enthusiast myself and have occasionally been interested in code generation like this. What kind of stuff can it do?
I haven’t built it yet. I’m in the process of learning how to build languages at Udacity: https://www.udacity.com/course/cs262 . But the only thing I think it would have that differs from other languages is the code generation.