Transform it. It’s sometimes possible to transform a hard problem in one domain to an easier problem in another domain. This is similar to, but different from, going meta. Going meta is exactly what it says on the box, involving an increase in abstraction, but no change in domain, so that the problem looks the same, but with more generic features. Transforming a problem doesn’t necessarily change its abstraction level, but does change the domain so that the problem looks completely different.
My favorite example comes from the field of lossless data compression. Such algorithms try to find patterns in input data, and represent them with as few bits of output data as possible. (Lossy data compression algorithms, like MP3 and JPEG, discard data that humans aren’t expected to notice. Discarding is the most effective compression method of all, but lossless algorithms can’t get away with that.)
One way to design a lossless data compression algorithm is to directly attack the problem, building up a list of patterns, which you can refer back to as you notice them occurring again. This is called dictionary coding, and you’re already familiar with it from the many implementations (like ZIP and gzip) of the Lempel-Ziv algorithm and its many derivatives. LZ is fast, but it’s old (dating from 1977-1978) and doesn’t compress very well.
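To make the dictionary-coding idea concrete, here is a minimal LZ77-style sketch (my own toy code and names, not the actual ZIP/gzip implementation): for each position, look back into a window of recently seen bytes for the longest earlier occurrence, and emit a back-reference instead of repeating the bytes.

    #include <cstddef>
    #include <string>
    #include <vector>

    // A token means: "copy 'length' bytes starting 'offset' bytes back, then emit 'next'".
    // offset == 0 and length == 0 means there was no earlier match, so 'next' is a plain literal.
    struct Token {
        std::size_t offset;
        std::size_t length;
        char next;
    };

    std::vector<Token> lz77_compress(const std::string& input, std::size_t window = 4096) {
        std::vector<Token> out;
        std::size_t pos = 0;
        while (pos < input.size()) {
            std::size_t best_len = 0, best_off = 0;
            const std::size_t start = (pos > window) ? pos - window : 0;
            for (std::size_t cand = start; cand < pos; ++cand) {          // scan the "dictionary"
                std::size_t len = 0;
                while (pos + len + 1 < input.size() && input[cand + len] == input[pos + len]) {
                    ++len;                                                // extend the match
                }
                if (len > best_len) { best_len = len; best_off = pos - cand; }
            }
            out.push_back({best_off, best_len, input[pos + best_len]});   // back-reference plus literal
            pos += best_len + 1;
        }
        return out;
    }

Real implementations find matches with hash chains instead of this quadratic scan, and then entropy-code the tokens, but the shape of the idea is the same: notice a pattern you've seen before and refer back to it.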
There are other approaches. As of several years ago, the most powerful lossless data compression algorithm was Prediction by Partial Matching, developed in 1984 according to Wikipedia. PPM uses Fancy Math to predict future symbols (i.e. bytes) based on past symbols. It compresses data really well, but is rather slow (this is why it hasn’t taken over the world). The algorithm is different, but the approach is the same, directly attacking the problem of pattern matching.
In 1994, a new algorithm was developed. Michael Burrows and David Wheeler discovered something very interesting: if you take a string of symbols (i.e. bytes), put every rotation of it in a matrix, and sort the rows of that matrix lexicographically (all of this might sound hard, but it’s easy for a competent programmer to write, although doing so efficiently takes some work), then this matrix has a very special property. Every column is a permutation of the input string (trivial to see), the first column is the symbols of the input string in sorted order (also trivial to see), and the last column, although jumbled up, can be untangled into the original input string with only 4 bytes (or 8 bytes) of extra data. The fact that this transformation can be undone is what makes it useful (there are lots of ways to irreversibly jumble strings). (Going back to the overall point, transforming a problem isn’t useful until you can transform the solution back.)
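Here is a minimal sketch of the forward transform exactly as described (build every rotation, sort the rows, keep the last column), plus the one index needed to undo it. This is my own naive illustration; production implementations use suffix arrays rather than materializing the whole matrix.

    #include <algorithm>
    #include <cstddef>
    #include <string>
    #include <utility>
    #include <vector>

    // Returns the last column of the sorted rotation matrix, plus the index of the
    // row that holds the original string (the "4 or 8 bytes of extra data" needed
    // to untangle the last column back into the input).
    std::pair<std::string, std::size_t> bwt(const std::string& input) {
        std::vector<std::string> rows;
        for (std::size_t i = 0; i < input.size(); ++i) {
            rows.push_back(input.substr(i) + input.substr(0, i));  // every rotation of the input
        }
        std::sort(rows.begin(), rows.end());                       // sort the rows lexicographically
        std::string last_column;
        std::size_t original_row = 0;
        for (std::size_t i = 0; i < rows.size(); ++i) {
            last_column += rows[i].back();
            if (rows[i] == input) original_row = i;                // where the unrotated string landed
        }
        return {last_column, original_row};
    }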
After undergoing this (reversible) Burrows-Wheeler Transformation (BWT), your data looks totally different. It’s easier to see than to explain. Here’s some text that you might have seen before:
And here it is after undergoing the BWT:
If you follow how the BWT is performed, what it does is bring symbols with similar “contexts” together. While sorting that big matrix, all of the rows beginning with “ings” will be sorted together, and most of these rows will end with “R” (each row is a rotation of the input text, so it “wraps around”). The BWT consumes text that is full of rich patterns (i.e. context) in the original domain (here, English) and emits text that has repetitive characters, but with no other structure.
It’s hard to identify patterns (i.e. compress) in input text! It’s easier to deal with repetitive characters. Solving the problem of compression is easier in the Burrows-Wheeler domain.
If you throw another transformation at the data (the Move-To-Front transformation, which is also reversible), then these repetitive but different characters (eeeoeeeoIIiiaaaiiuoo, etc.) become numbers that are mostly 0s, some 1s, a few 2s, and a scattering of higher numbers. (Everything is bytes, but I’m trying to keep it simple for non-programmers.) It’s really really easy to compress data that looks like this. So easy, in fact, that one of the oldest algorithms ever developed is up to the job. (Not Euclid’s, though.) The Huffman algorithm, developed in 1952, efficiently compresses this kind of data. Arithmetic coding, which is newer, is optimal for it. Anyway, after applying the BWT, everything downstream is much easier.
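For the curious, a minimal move-to-front sketch (again my own toy code): recently seen bytes get small indices, so the long runs that come out of the BWT become mostly zeros.

    #include <algorithm>
    #include <cstdint>
    #include <numeric>
    #include <string>
    #include <vector>

    // Each output value is the current position of the input byte in a table of all
    // 256 byte values; that byte is then moved to the front of the table. Repeated
    // bytes therefore come out as 0, and recently seen bytes come out as small numbers.
    std::vector<std::uint8_t> move_to_front(const std::string& input) {
        std::vector<std::uint8_t> table(256);
        std::iota(table.begin(), table.end(), 0);            // 0, 1, ..., 255
        std::vector<std::uint8_t> output;
        for (unsigned char c : input) {
            const auto it = std::find(table.begin(), table.end(), c);
            output.push_back(static_cast<std::uint8_t>(it - table.begin()));
            table.erase(it);
            table.insert(table.begin(), c);                  // move this byte to the front
        }
        return output;
    }

After this step, the mostly-zero stream is exactly the kind of input a Huffman or arithmetic coder handles well.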
(I’ve glossed over Zero-Length Encoding, a transformation that can be applied after MTF but before Huffman or arithmetic coding to further improve compression. It’s yet another example of how you can transform yourself out of an inconvenient domain, though.)
(Note: I haven’t kept up with recent developments in lossless data compression. 7-Zip is powered by LZMA, which, as the acronym suggests, is Lempel-Ziv based but with special sauce (which resembles PPM as far as I can tell). However, as it was developed in 1998, I believe the history above is still a reasonably applicable example.)
(Edit: Replaced underscores with twiddles in the BWT example because apparently underscores are as good as stars in Less Wrong formatting. Second edit: Further unmangling. Less Wrong really needs pre-tags. Third edit: Trying four spaces. Random thought: My comments need changelogs.)
This is interesting, but I find it specific enough that I think I’d have trouble applying it to another domain.
Less Wrong comments use (a not completely bug-free implementation of) Markdown.
Four spaces worked! Thank you.
Effective transformations are uncommon, but when one is possible and you find it, you win. Transformation can’t really deal with inherently hard problems, but it can show you that a problem you thought was hard is actually easy if you look at it in an alien way.
I can think of another example, but unfortunately it’s even more technical. (Not all hard problems are as easy to state as the Four Color Theorem! But like the BWT, this one directly affects the real world.) C++ programmers have been dealing with a hard problem for decades—the language allows high abstraction to coexist with high performance, but it’s a little too fond of copying memory around. This takes time (i.e. decreases performance) for no useful benefit. Recently, scary wizards discovered that the problem of identifying which copies are necessary and which are not can be transformed into the problem of distinguishing lvalues from rvalues (insert animal noises here if you like—this is the technical part). Before this discovery, the language didn’t allow programmers to cleanly distinguish between meows and woofs (so knowing this wouldn’t have helped ordinary programmers). But compilers have always been able to distinguish between them. It’s like Compilers 101, and even C compilers know the difference. Exposing this bit of information to user-programmers in a certain way allows all of those nasty unnecessary copies to be avoided. Someone directly attacking the problem of unnecessary copies would never have invented this in two to the twentieth years.
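A minimal sketch of what this buys in C++0x/C++11, using my own toy type rather than anything from the standard library: the copy constructor binds to lvalues ("woofs") and must duplicate storage, while the move constructor binds to rvalues ("meows"), i.e. temporaries that are about to die anyway, and simply steals their storage.

    #include <cstddef>
    #include <utility>

    struct Buffer {
        std::size_t size = 0;
        int* data = nullptr;

        explicit Buffer(std::size_t n) : size(n), data(new int[n]) {}

        // Copy constructor: binds to lvalues ("woofs"); it must duplicate the storage.
        Buffer(const Buffer& other) : size(other.size), data(new int[other.size]) {
            for (std::size_t i = 0; i < size; ++i) data[i] = other.data[i];
        }

        // Move constructor: binds to rvalues ("meows"), i.e. temporaries that are about
        // to die anyway, so it simply steals their storage instead of copying it.
        Buffer(Buffer&& other) noexcept : size(other.size), data(other.data) {
            other.size = 0;
            other.data = nullptr;
        }

        ~Buffer() { delete[] data; }
    };

    Buffer make_buffer() { return Buffer(1 << 20); }

    int main() {
        Buffer a = make_buffer();    // rvalue source: moved (or elided), never deep-copied
        Buffer b = a;                // lvalue source: copied, because 'a' is still alive
        Buffer c = std::move(a);     // std::move says "treat this lvalue as an rvalue"
        (void)b;
        (void)c;
    }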
As a bonus, this solved many hard problems at once—it turns out that once user-programmers can tell meows and woofs apart, they can also solve the “forwarding problem” with “perfect forwarding”, making Boost’s developers (and me) cry with joy.
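And the bonus, sketched with another toy example of mine: a single template that forwards its argument onward while preserving lvalue-ness and rvalue-ness, which is the "perfect forwarding" that pre-C++0x libraries could only approximate with piles of overloads.

    #include <string>
    #include <utility>

    struct Widget {
        std::string name;
        explicit Widget(std::string n) : name(std::move(n)) {}
    };

    // T&& here deduces to an lvalue reference for lvalue arguments and stays an
    // rvalue reference for rvalue arguments, and std::forward preserves that, so
    // the argument reaches Widget's constructor exactly as it was passed in.
    template <typename T>
    Widget make_widget(T&& arg) {
        return Widget(std::forward<T>(arg));
    }

    int main() {
        std::string s = "woof";
        Widget w1 = make_widget(s);                    // lvalue: copied into the Widget
        Widget w2 = make_widget(std::string("meow"));  // rvalue: moved, no copy
        (void)w1;
        (void)w2;
    }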
To restate, finding an appropriate transformation is a trick that turns a hard problem that you can’t directly attack into an easy problem that you already know how to solve, or can figure out how to solve without very much work. It doesn’t solve the problem by itself, but it feels like it does. I would expect most if not all examples to be in mathematics and the hard sciences, where abstractions can be cleanly and rigorously transformed, but I would be pleasantly surprised by an example from biology, etc.
(Edit: Avoided unnecessary sentence.)
As a former C++ programmer who doesn’t keep track of the current events, I’d appreciate a specific link/keyword to the mechanism you were describing.
This C++0x feature is rvalue references. Wikipedia has an article section about it, although it contains inaccuracies (“The function returning a std::vector temporary need only return a std::vector&&.” is wrongity wrong.)
Pleased to make the acquaintance of your secret identity. :-) It’s a shame you are not on Facebook.
Why? (I’m genuinely curious. I haven’t yet figured out how being on Facebook could be a productive use of my free time, which is unfortunately very limited.)
The basic functionality is simply to keep track of the list of interesting people, ideally with contact info automagically updating. Following the status updates is an extra that supports the sense of being in touch without explicit effort on your part, which can be made efficient if you block enough of the more prolific/uninteresting updates, and/or restrict the connections to people you know reasonably well.
Would upvote again if I could. Please consider this as encouragement to post more.
Thanks; I’ll try to write a full post one of these days.
A perhaps similar example: sometimes I have solved geometry problems (on tests) by using analytical geometry. Transform the problem into algebra by letting point 1 be (x1,y1), point 2 be (x2,y2), etc., get equations for the lines between the points, calculate their points of intersection, and so on. Sometimes this gives the answer with just mechanical application of algebra, no real insight or pattern recognition needed.
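For instance, with the line through two points written in slope-intercept form, the whole thing is mechanical (this worked sketch of mine assumes neither line is vertical and the two lines are not parallel):

    % Line through (x_1, y_1) and (x_2, y_2), written as y = m x + c:
    m = \frac{y_2 - y_1}{x_2 - x_1}, \qquad c = y_1 - m x_1
    % Intersection of y = m_a x + c_a with y = m_b x + c_b:
    x = \frac{c_b - c_a}{m_a - m_b}, \qquad y = m_a x + c_a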
When I was a kid this was how I solved all nontrivial geometry problems, because I was much better at algebra than geometry!
> many important arbitrarily complicated functions can be represented as sums of much simpler functions

Check out Integral transforms: there’s a huge class of transforms that solve a bunch of math problems handily, and it comes with that nice explanation of why they seem to work so often.
The trick consists in discovering a twist you can apply to your complicated object which makes it easily separable into pieces that are susceptible to the function you wanted to apply to your object, such that you also have some way of untwisting your function’d twisted object to get your answer.
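In symbols (up to normalization conventions), the twist/untwist pattern for an integral transform looks like this; the Fourier transform is the classic instance, because it turns convolution in one domain into plain multiplication in the other:

    % "Twist" f from the t-domain into the u-domain using a kernel K:
    (Tf)(u) = \int K(t, u)\, f(t)\, dt
    % Classic payoff: under the Fourier transform, convolution becomes multiplication,
    % so a hard operation in one domain is an easy one in the other.
    \mathcal{F}\{f * g\} = \mathcal{F}\{f\} \cdot \mathcal{F}\{g\}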
Here’s an example of Tim Gowers trying to apply the trick to help solve the Erdos Discrepancy Problem, subject of Polymath5 (collaborative mathematics over the internet!). That’s less than 24 hours ago. Yeah, it’s that useful in mathematics. :D (Edit: looking at it again, I’d say it’s a combination of transform and meta, which is a common combo)
Let’s try applying that to a recent hard problem: The Preference Utilitarian’s Time Inconsistency. (...time passes...) Didn’t see a way. How about a reference class that says “cryonics, singularity, superhuman AI etc. are highly probable”?
Here’s my attempt.
This might be interesting—but it seems much too long for a blog comment.
Much too long? No, but it is a long comment—too short for a post, though.