Toby, if you were too dumb to see the closed-form solution to problem 1, it might take an intense effort to tweak the bit on each occasion, or perhaps you might have trouble turning the global criterion of total success or failure into a local bit-fixer; now imagine that you are also a mind that finds it very easy to sing MP3s...
The reason you think one problem is simple is that you perceive a solution in closed form; you can imagine a short program, much shorter than 10 million bits, that solves it, and the work of inventing this program was done in your mind without apparent effort. So this problem is very trivial on the meta-level because the program that solves it optimally appears very quickly in the ordering of possible programs and is moreover prominent in that ordering relative to our instinctive transformations of the problem specification.
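(To make the "short program" point concrete, here is a minimal sketch of my own, assuming problem 1 is just emitting the 10-million-bit alternating string: a few lines of code stand in for an output of 10 million bits.)

```python
# My own illustrative sketch (not from the original discussion): the
# 10-million-bit alternating string is produced by a program vastly shorter
# than the 10 million bits it emits.

N_BITS = 10_000_000

def alternating_bits(n: int = N_BITS) -> str:
    """Return the n-bit string 101010... as a Python string of '1'/'0'."""
    return "10" * (n // 2) + "1" * (n % 2)

if __name__ == "__main__":
    s = alternating_bits()
    print(len(s))    # 10000000
    print(s[:16])    # 1010101010101010
```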
But if you were trying random solutions and the solution tester was a black box, then the alternating-bits problem would indeed be harder—so you can’t be measuring the raw difficulty of optimization if you say that one is easier than the other.
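(To put a rough number on the black-box case, my own back-of-the-envelope framing rather than anything from the thread: under uniform random guessing, the expected number of trials depends only on how large a fraction of outputs the tester accepts, not on how simply the target can be described.)

```latex
% Back-of-the-envelope sketch (my own framing): if a black-box tester accepts
% a fraction p of all possible N-bit outputs, uniform random guessing needs
\[
  \mathbb{E}[\text{trials}] \;=\; \frac{1}{p},
\]
% and if exactly one string out of 2^N is accepted (say the alternating
% string with N = 10^7), that is
\[
  \mathbb{E}[\text{trials}] \;=\; 2^{N} \;=\; 2^{10^{7}},
\]
% which is completely insensitive to the target having a short closed-form
% description.
```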
This is why I say that the human notion of “impressiveness” is best constructed out of a more primitive notion of “optimization”.
We also do, legitimately, find it more natural to talk about “optimized” performance on multiple problems than on a single problem—if we’re talking about just a single problem, then it may not compress the message much to say “This is the goal” rather than just “This is the output.”
Does this notion of intelligence contain any factor accounting for how many resources have to be burnt to get to a good outcome? Simulated annealing, random-walk Metropolis, and gradient descent will all find you local and/or global minima of a particular function. But depending on the landscape of that function, each method will do so in a more or less illuminating way, not to mention carry an extremely different computational burden.
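(As a rough, self-contained illustration of that point about burdens, my own sketch rather than anything from the post: minimize the same multimodal one-dimensional function with gradient descent and with a random-walk Metropolis sampler, and compare what each needs and what each spends.)

```python
# My own rough comparison (hypothetical function and settings): gradient
# descent needs the derivative and converges to the nearest basin; random-walk
# Metropolis needs only f(x) but burns far more evaluations.
import math
import random

def f(x):
    # Multimodal landscape; the global minimum sits near x ~ -0.31.
    return x * x + 3.0 * math.sin(5.0 * x)

def grad_f(x):
    # Analytic derivative -- extra structure that gradient descent requires.
    return 2.0 * x + 15.0 * math.cos(5.0 * x)

def gradient_descent(x0, lr=0.01, steps=200):
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x, steps                      # 'steps' gradient evaluations, one basin

def metropolis(x0, beta=2.0, steps=5000, scale=0.5):
    x, fx = x0, f(x0)
    best, best_f = x, fx
    for _ in range(steps):
        y = x + random.gauss(0.0, scale)
        fy = f(y)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if fy <= fx or random.random() < math.exp(-beta * (fy - fx)):
            x, fx = y, fy
            if fx < best_f:
                best, best_f = x, fx
    return best, steps                   # many more f-calls, no derivative, no basin lock-in

if __name__ == "__main__":
    random.seed(0)
    print("gradient descent from x0=2:", gradient_descent(2.0))
    print("metropolis from x0=2:      ", metropolis(2.0))
```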
A stochastic optimization approach, like annealing, doesn’t require any knowledge of the problem domain beyond the utility function (the objective functional). You can “set it and forget it,” so to speak, which is important when you are aware that you don’t know enough to use specialized knowledge to narrow the search… or when you can bear to make millions of evaluations of a single function (the utility function itself, tried on different candidate output states) but cannot bear to compute derivatives or exploit other local geometric cues.
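(Here is a minimal “set it and forget it” sketch of my own, with a hypothetical interface: the annealer touches the problem only through a black-box utility callable on candidate output states, with no gradients and no domain knowledge.)

```python
# Minimal sketch of a domain-agnostic annealer (my own, hypothetical names):
# the only handle on the problem is `utility(state)` plus a proposal move.
import math
import random

def anneal(utility, initial_state, neighbor, steps=20000, t0=1.0, t_end=1e-3):
    """Maximize utility(state) using only black-box calls to `utility` and `neighbor`."""
    state, u = initial_state, utility(initial_state)
    best, best_u = state, u
    for i in range(steps):
        t = t0 * (t_end / t0) ** (i / steps)     # geometric cooling schedule
        cand = neighbor(state)
        cu = utility(cand)                       # another black-box evaluation
        if cu >= u or random.random() < math.exp((cu - u) / t):
            state, u = cand, cu
            if u > best_u:
                best, best_u = state, u
    return best, best_u

if __name__ == "__main__":
    # Hypothetical toy use: maximize the number of adjacent unequal bits in a
    # 64-bit string, given only the score the black-box "tester" hands back.
    random.seed(0)
    n = 64
    utility = lambda bits: sum(bits[i] != bits[i + 1] for i in range(len(bits) - 1))

    def neighbor(bits):
        j = random.randrange(len(bits))
        return bits[:j] + (1 - bits[j],) + bits[j + 1:]    # flip one random bit

    start = tuple(random.randint(0, 1) for _ in range(n))
    best, score = anneal(utility, start, neighbor)
    print(score, "out of", n - 1)
```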
The ‘value’ of an optimization scheme, to me, does not only lie in the Shannon entropy of the subset of outcomes that it can hand to you. Maybe this is just a consequence of utility functions needing to be recursive, so that in trying to optimize my utility, I partially burn some resources on choosing the correct optimization procedures (this choosing being itself a meta-layer of the overall optimization process). But it seems to me that this definition of intelligence is too narrow to be useful, in many of the same ways that Shannon entropy is too narrow to be useful.
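(For reference, my paraphrase of the measure I take to be under discussion, stated as an assumption rather than a quote: optimization power as the negative log of the fraction of possible outcomes ranked at least as high as the one achieved. Nothing in that quantity charges for the resources spent reaching the top slice.)

```latex
% My paraphrase (an assumption about what "Shannon entropy of the subset of
% outcomes" refers to): with \Omega the space of possible outcomes and U the
% preference ordering, the optimization power credited for achieving o is
\[
  \mathrm{OP}(o) \;=\; -\log_2 \frac{\bigl|\{\, o' \in \Omega : U(o') \ge U(o) \,\}\bigr|}{\lvert \Omega \rvert},
\]
% a pure count of how small a target was hit, with no term for the
% computation burnt to hit it.
```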
Shannon entropy is great if you start out with some information in your hands and you want to send it to that guy over there. But it’s a terrible terrible terrible way to quantify information if instead you’re trying to pick apart what Nature hands you and decide if there is anything about it worth extracting that you can call “information.”
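(The standard definitions behind that distinction, for reference rather than anything new:)

```latex
% Shannon entropy of a source X with distribution p,
\[
  H(X) \;=\; -\sum_{x} p(x)\,\log_2 p(x),
\]
% is precisely the quantity that governs sending information "to that guy over
% there": by the source coding theorem it is the per-symbol limit of lossless
% compression. It says nothing about which features of a raw signal are worth
% extracting in the first place.
```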
There’s a growing body of research which I think you would enjoy called “actionable information theory.” It’s an idea going back to Gibson that proceeds along the lines that information is just whatever-is-at-hand-that-I-can-exploit-to-achieve-my-goal. It presupposes goals and treats problems of cognition as control theory problems where I learn ways to interpret the world (what things to call information) such that goals are achieved.
I think you touch on these things in other posts, but I can’t find anything concrete that would serve as a good connection to stochastic optimization. Most of what I see here tends to assume that the output of an optimization process is the maximizing argument of some distribution, as in decision theory. But there are many cases where you don’t want to make a decision that way, and genetic algorithms or stochastic optimization are good alternatives, with their own sets of theorems about convergence.
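(A small sketch of that distinction, with made-up scores of my own: returning the maximizing argument versus letting the procedure hand back a draw from a distribution over candidates.)

```python
# My own illustration (hypothetical candidates and scores): argmax decision
# versus sampling candidates in proportion to exp(score / T).
import math
import random

candidates = ["a", "b", "c", "d"]
score = {"a": 1.0, "b": 0.9, "c": 0.2, "d": 0.1}   # made-up utilities

# Decision-theoretic choice: the maximizing argument.
argmax_choice = max(candidates, key=score.get)

# Stochastic alternative: near-optimal candidates like "b" still get real probability.
def boltzmann_sample(cands, scores, temperature=0.5):
    weights = [math.exp(scores[c] / temperature) for c in cands]
    return random.choices(cands, weights=weights, k=1)[0]

if __name__ == "__main__":
    random.seed(0)
    print("argmax: ", argmax_choice)
    print("samples:", [boltzmann_sample(candidates, score) for _ in range(10)])
```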