Toby, if you were too dumb to see the closed-form solution to problem 1, it might take an intense effort to tweak the bit on each occasion, or perhaps you might have trouble turning the global criterion of total success or failure into a local bit-fixer; now imagine that you are also a mind that finds it very easy to sing MP3s...
Does this notion of intelligence contain any factor accounting for how many resources have to be burnt to get to a good outcome? Simulated annealing, random-walk Metropolis, and gradient descent will all find you the local and/or global minima of a particular function. But depending on the landscape of that function, each method will do so in a more or less illuminating way, not to mention at wildly different computational cost.
A stochastic optimization approach like annealing doesn't require any knowledge about the problem domain beyond the utility function (objective functional). You can "set it and forget it," so to speak, which matters when you know you don't know enough to use specialized knowledge to narrow the search… or when you can bear to make millions of evaluations of that one function (the utility function itself, tried on different candidate output states) but cannot bear to compute derivatives or exploit other local geometric cues.
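To make that concrete, here's a minimal annealing sketch. The objective, cooling schedule, and parameter values are my own toy choices, not anything from the post; the point is only that the single thing it ever asks of the problem is a function value:

```python
import math
import random

# Toy multimodal objective standing in for an arbitrary cost landscape
# (hypothetical, purely for illustration).
def objective(x):
    return x**2 + 10 * math.sin(x)

def anneal(f, x0, steps=100_000, temp0=10.0):
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    for i in range(steps):
        temp = temp0 * (1 - i / steps) + 1e-9   # simple linear cooling schedule
        cand = x + random.gauss(0.0, 1.0)       # blind local proposal, no gradients
        fc = f(cand)                            # one more call to the black box
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if fc < fx or random.random() < math.exp((fx - fc) / temp):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f

print(anneal(objective, x0=5.0))
```

Every bit of progress it makes is bought with raw objective evaluations, which is exactly the resource expenditure that a measure defined only over outcomes doesn't seem to track.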
The ‘value’ of an optimization scheme, to me, does not only lie in the Shannon entropy of the subset of outcomes that it can hand to you. Maybe this is just a consequence of utility functions needing to be recursive, so that in trying to optimize my utility, I partially burn some resources on choosing the correct optimization procedures (this choosing being itself a meta-layer of the overall optimization process). But it seems to me that this definition of intelligence is too narrow to be useful, in many of the same ways that Shannon entropy is too narrow to be useful.
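For clarity, here is the measure as I understand it, rendered as a quick sketch in code. This is my own paraphrase, none of it lifted from the post: I'm assuming outcomes are enumerable and the score is the negative log2 of the fraction of outcomes at least as preferred as the one achieved.

```python
import math

# My paraphrase of the measure under discussion (assumption: outcomes are
# enumerable and scored by utility; the "power" of hitting `achieved_utility`
# is -log2 of the fraction of outcomes at least as preferred).
def optimization_power_bits(achieved_utility, all_utilities):
    at_least_as_good = sum(1 for u in all_utilities if u >= achieved_utility)
    return -math.log2(at_least_as_good / len(all_utilities))

print(optimization_power_bits(9, list(range(10))))  # ~3.32 bits
```

Two optimizers that land on the same outcome get the same score, even if one burned a million evaluations getting there and the other burned ten.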
Shannon entropy is great if you start out with some information in your hands and you want to send it to that guy over there. But it’s a terrible terrible terrible way to quantify information if instead you’re trying to pick apart what Nature hands you and decide if there is anything about it worth extracting that you can call “information.”
There's a growing body of research, which I think you would enjoy, called "actionable information theory." The idea goes back to Gibson: information is whatever-is-at-hand-that-I-can-exploit-to-achieve-my-goal. It presupposes goals and treats problems of cognition as control problems, in which I learn ways to interpret the world (what things to call information) such that my goals are achieved.
I think you touch on these things in other posts, but I can't find anything concrete that would serve as a good connection to stochastic optimization. Most of what I see here tends to assume that the output of an optimization process is the maximizing argument of some distribution, as in decision theory. But there are many cases where you don't want to make a decision that way, and genetic algorithms or stochastic optimization are good alternatives, with their own sets of convergence theorems.
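As a toy illustration of the contrast (hypothetical names and numbers, nothing from the post): a decision-theoretic argmax picks the single highest-scoring option, while a stochastic rule samples options in proportion to an exponentiated score, which can be the better move when the scores are noisy estimates or when exploration has value of its own.

```python
import math
import random

scores = {"a": 1.0, "b": 0.9, "c": 0.1}  # made-up option scores

# Decision-theoretic choice: always take the single best-scoring option.
argmax_choice = max(scores, key=scores.get)

# Stochastic choice: sample options with probability proportional to
# exp(score / temperature), so near-ties still get explored.
def softmax_sample(scores, temperature=1.0):
    weights = {k: math.exp(v / temperature) for k, v in scores.items()}
    r = random.uniform(0.0, sum(weights.values()))
    acc = 0.0
    for k, w in weights.items():
        acc += w
        if r <= acc:
            return k

print(argmax_choice, softmax_sample(scores))
```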