My impression of compressed sensing is that you have a problem where you don’t know that the answer uses exactly 100 coefficients- you’re looking for the result that uses the minimum number of coefficients. Looking for the minimum takes more work than looking for one that uses exactly 100, and you’re right that this is binary optimization, which is a giant pain.
The beautiful compressed sensing result is that for a broad class of problems, the solution to the L1-norm minimization problem- which is much easier to solve- is also the solution to the L0-norm minimization problem. (For anyone not familiar, the L1 norm means “add up the absolute values of the coefficients,” and the L0 norm means “count the number of non-zero coefficients,” which is not actually a norm.) In this particular context, I would look for the L1-norm minimizer- and either it uses 100 or fewer coefficients, in which case I’m okay, or it uses more than 100, in which case (so long as the conditions of the compressed sensing theorem apply) I know the problem as stated is infeasible, and I’ve found the least infeasible solution.
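For anyone who wants to see the L1 step concretely, here is a minimal sketch- the measurement matrix, problem sizes, and sparsity level are made-up assumptions- of basis pursuit written as a linear program with scipy:

```python
# Minimal sketch of the L1 step (basis pursuit): minimize ||u||_1 subject to A u = b.
# The measurement matrix, problem sizes, and sparsity level are illustrative assumptions.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n, k = 60, 200, 5                      # 60 measurements, 200 coefficients, 5 nonzero
A = rng.standard_normal((m, n))           # random Gaussian measurement matrix
u_true = np.zeros(n)
u_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
b = A @ u_true                            # the measurements we actually get to see

# Split u = u_plus - u_minus with u_plus, u_minus >= 0, so that
# ||u||_1 = sum(u_plus + u_minus) and the whole problem is a linear program.
c = np.ones(2 * n)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None))
u_hat = res.x[:n] - res.x[n:]

print("nonzeros in the L1 solution:", int(np.sum(np.abs(u_hat) > 1e-6)))
print("max difference from the true sparse u:", float(np.max(np.abs(u_hat - u_true))))
```

With enough random measurements relative to the sparsity, the L1 minimizer typically coincides exactly with the sparse vector that generated the data, which is the L1 = L0 coincidence described above.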
Compressed sensing in signal processing may help give an intuitive overview of what’s going on. Consider a stream of data samples S, collected periodically at a rate of R samples per second. According to the sampling theorem, we can perfectly reconstruct a continuous signal from those samples up to a frequency of R/2. So for samples taken every 1/100th of a second (R = 100), we can perfectly reconstruct signals from 0 to 50 Hz.
Now, take two different subsets of S and compare them:
If we take a subset of S at fixed intervals, we can only reconstruct part of the original signal perfectly. The part we can reconstruct is limited to lower frequencies: for example, if we take every other sample from S, we can only reconstruct perfectly up to a limit of 25 Hz. Frequencies above that cannot be uniquely determined.
If we take a completely random subset of S, we can reconstruct the entire frequency range, but we cannot reconstruct it perfectly. This is similar to holography, where cutting the hologram in half still reproduces the whole image, but at lower resolution.
We use fixed subsampling when we know our signal is band-limited and we can safely throw away some band of frequencies.
We use random sampling (compressed sensing) when we have a signal that is not band-limited but is sparse enough that we can still describe it accurately using a smaller number of data points. The sparser the signal is in frequency, the more accurate our estimate will be. For most compressed sensing applications, the signal can be very sparse indeed, and the number of samples needed can be quite low.
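As a rough illustration of the difference- the sample rate, tone frequencies, and amplitudes below are made-up assumptions, and zeroing the dropped samples is only a crude stand-in for a real sparse-recovery solver- compare the spectra after fixed versus random subsampling of a frequency-sparse signal:

```python
# Toy comparison of fixed vs. random subsampling of a frequency-sparse signal.
# The sample rate, tone frequencies, and amplitudes are illustrative assumptions.
import numpy as np

R = 100                                        # 100 samples per second, for one second
t = np.arange(0, 1.0, 1.0 / R)
S = np.sin(2 * np.pi * 12 * t) + 0.5 * np.sin(2 * np.pi * 41 * t)   # two tones: 12 Hz and 41 Hz

# Fixed subsampling: every other sample. The effective rate is 50 samples/s, so the
# new Nyquist limit is 25 Hz and the 41 Hz tone aliases down to 9 Hz (= 50 - 41).
spec_fixed = np.abs(np.fft.rfft(S[::2]))       # one-second record -> 1 Hz per bin

# Random subsampling: keep half the samples at random positions and zero out the rest
# (a crude stand-in for a real sparse-recovery solver). Both tones stay at their true
# frequencies, but the whole spectrum picks up a noise-like floor.
keep = np.random.default_rng(0).choice(len(S), len(S) // 2, replace=False)
S_random = np.zeros_like(S)
S_random[keep] = S[keep]
spec_random = np.abs(np.fft.rfft(S_random))    # also 1 Hz per bin

print("fixed : |spectrum| at 12 Hz =", round(float(spec_fixed[12]), 1),
      " at 9 Hz (alias of 41 Hz) =", round(float(spec_fixed[9]), 1))
print("random: |spectrum| at 12 Hz =", round(float(spec_random[12]), 1),
      " at 41 Hz =", round(float(spec_random[41]), 1),
      " median bin =", round(float(np.median(spec_random)), 1))
```

Under fixed decimation the 41 Hz tone shows up at 9 Hz and is indistinguishable from a genuine 9 Hz tone; under random subsampling both tones stay at their true frequencies but sit on a noise-like floor, which is the “whole hologram at lower resolution” effect a sparse-recovery solver can then clean up.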
Why binary optimization?
I think I was thinking of it as including m binary variables, which I’ll call z_j, which indicate whether or not the u_j are nonzero; then the L0 minimization makes the cost function the sum of the z_j, or the L0 constraint requires their sum to equal 100. (Standard caveat that L0 is not actually a norm, which I should edit into the previous post.)
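As a sketch of that formulation (the big-M bound, the measurement matrix, and the problem sizes are illustrative assumptions), the binary version is a mixed-integer program along these lines:

```python
# Sketch of the binary-variable (L0) formulation: z_j in {0,1} says whether u_j is nonzero,
# enforced with a big-M bound |u_j| <= M*z_j, and we minimize sum(z_j) subject to A u = b.
# A, b, M, and the problem sizes are made-up illustrative assumptions.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

rng = np.random.default_rng(0)
m, n, k, M = 15, 30, 3, 10.0
A = rng.standard_normal((m, n))
u_true = np.zeros(n)
u_true[rng.choice(n, k, replace=False)] = rng.uniform(-2, 2, k)
b = A @ u_true

I = np.eye(n)
Z = np.zeros((m, n))
c = np.concatenate([np.zeros(n), np.ones(n)])              # minimize the sum of the z_j
constraints = [
    LinearConstraint(np.hstack([A, Z]), b, b),             #  A u = b
    LinearConstraint(np.hstack([I, -M * I]), -np.inf, 0),  #  u_j - M z_j <= 0
    LinearConstraint(np.hstack([-I, -M * I]), -np.inf, 0), # -u_j - M z_j <= 0
]
integrality = np.concatenate([np.zeros(n), np.ones(n)])    # u continuous, z binary
bounds = Bounds(np.concatenate([-M * np.ones(n), np.zeros(n)]),
                np.concatenate([M * np.ones(n), np.ones(n)]))

res = milp(c, constraints=constraints, integrality=integrality, bounds=bounds)
print("minimum number of nonzeros found by the MILP:", int(round(res.fun)))
```

This is the sort of branch-and-bound problem that the L1 relaxation lets you avoid solving directly.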
The SAT problem Eliezer mentioned is binary (well, Boolean), and it felt awkward to claim that it’s a SAT problem when that could be mistaken for a question on the other SAT, so I decided to switch names without moving too far in concept-space. Your explanation seems much neater than mine, though.
[edit] I should be clear that I mean that solving the L0 minimization directly is at least as hard as doing it the binary optimization way, but that the L1 minimization problem, which indirectly solves the L0 minimization problem only sometimes (read: almost always), is polynomial time, i.e. much easier to solve, because it can be written as a linear program.
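To spell out the “because it’s linear” step: the absolute values in the L1 objective aren’t linear on their own, but the standard auxiliary-variable reformulation is,

$$\min_{u,\,t}\ \sum_j t_j \quad \text{subject to} \quad -t_j \le u_j \le t_j \ \text{ for all } j, \qquad Au = b,$$

and at the optimum each t_j equals |u_j|, so the objective is exactly the L1 norm while the objective and every constraint are linear- hence polynomial time.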
Ok.