Let’s say we want to create a program that performs a particular task T. Imagine you construct a program A by just throwing random things at the wall. The output of A is itself a program: we run A and get a program B, and then we measure how well B performs on T. That score tells us how well a given change to A sticks.
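For concreteness, here is a minimal sketch of the loop I have in mind; the representation of A, the mutate step, the run step, and the scoring function are all placeholders I am leaving abstract:

```python
def spaghetti_search(initial_A, mutate, run, score_on_T, steps=10_000):
    """Randomly perturb the generator A and keep a perturbation only if the
    program B it produces does not score worse on task T.  All four
    arguments are placeholders for whatever representation you pick."""
    A = initial_A
    best = score_on_T(run(A))          # run A to get B, then score B on T
    for _ in range(steps):
        candidate = mutate(A)          # throw something random at the wall...
        B = run(candidate)
        score = score_on_T(B)
        if score >= best:              # ...and keep it only if it sticks
            A, best = candidate, score
    return A
```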
Now assume that T is quite complicated and that A is random, i.e. it has no structure to it. Would you expect, in that circumstance, that B has no structure either? To me it seems quite likely that B will have a lot of regularity. It will not be good code from a human perspective, but I think there will be a lot of structure, simply because that structure is in T and the environment. If there are two obvious actions X and Y in the environment, and a simple solution to T is to perform a sequence of these two actions depending on the context, then it seems quite likely that partial solutions reflecting this factorization would stick to A better than arbitrary changes would.
If you try random things, you are not actually accepting all random changes; you only accept changes that match the structure of the problem you are solving. You might often find a matching structure that no human designer would choose, because it is not the most easily understandable one. But to me it seems that the program produced by this spaghetti process will be closer to a program a human designer would come up with than to a program sampled at random. Even if we only consider programs that solve T, we will still be closer to a human-interpretable program than to the “average program” that solves T.
That is somewhat vague, but I have a semi-strong intuition that something like this holds, even in some non-trivial cases. In the case where we compare against all possible programs that solve T, it holds almost trivially: there are programs of arbitrary size, without dead code (in a weak sense), that solve T, so relative to that set the spaghetti program would obviously be very close to the human-designed one.
Now, I assumed that A has no structure to it, but I am not sure that this actually holds in general. Throwing random things at the wall and seeing what sticks, in order to generate a program of size n, does not seem equivalent to sampling uniformly from all programs of size n.
That’s true, but program B will still be worse than a human-written program, so we aim to avoid spaghetti towers.
Spaghetti towers work especially poorly in changing environments: if evolution were reasonable, it would make us try to maximize the number of our genes in the next generation. Instead, it created a handful of heuristics like hunger and the desire for groin friction. So when humans came up with civilization, we started eating fast food and having sex with condoms.
First, thank you for writing this. I would ask that you continue to think & refine and share back what you discover, prove, or disprove.
“To me it seems quite likely that B will have a lot of regularity. It will not be good code from a human perspective, but I think there will be a lot of structure, simply because that structure is in T and the environment.”
I’m interested to see if we can (i) do more than claim this is likely and (ii) unpack reasons that might require that it be the case.
One argument for (ii) would go like this. Assume the generating process for A has a preference for shorter programs. Then we can think of A as tending to find shorter descriptions that match task T.
Claim: shorter (and correct) descriptions reflect some combination of environmental structure and compression.
By ‘environmental structure’ I mean the laws underlying the task.
By ‘compression’ I mean using information-theoretic techniques, embodied in algorithms, to make the program smaller.
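To make the claim slightly more concrete, here is one way to write it down as a two-part description length; this framing is my own sketch, not something established above:

```latex
% My own two-part sketch of the claim (requires amsmath for \underbrace).
% L(B) is the description length of program B; M stands for whatever model
% of the environment's regularities B implicitly relies on.
\[
  L(B) \;\approx\; \underbrace{L(M)}_{\text{environmental structure}}
  \;+\; \underbrace{L(B \mid M)}_{\text{compression / everything else}}
\]
% If B is short and correct, the savings must come from some mix of the two
% terms: genuine structure captured in M, or clever encoding in the remainder.
```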
I think this claim is true, but let’s not answer that too quickly. I’d like to probe this question more deeply.
Are there more than two factors (environmental structure & compression)?
Is it possible that the description gets the structure wrong but makes up for it with great compression? I think so. One can imagine a clever trick by which a small program expands itself into something like a big ball of mud that solves the task well.
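As a toy illustration of that trick (my own construction, not anything from the post): a tiny seed program whose only run-time “knowledge” of the task is a huge lookup table it expands, with no visible structure.

```python
def expand_ball_of_mud(n=1_000_000):
    """Expand a short seed into a big, structureless artifact: one tabulated
    answer per input instead of an explicit rule.  (In a toy like this the
    seed inevitably contains the rule; the point is only that the running
    artifact B is the table, which exposes no structure at all.)"""
    return {x: (x * x) % 97 for x in range(n)}   # the task's "law", buried in a table

TABLE = expand_ball_of_mud()

def solve(x):
    # The deployed program just looks answers up; nothing in it reveals
    # the simple law (x^2 mod 97) that generated the table.
    return TABLE[x]
```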
Any expansion process takes time and space. This makes me wonder if we should care not only about description length but also run time and space. If we pay attention to both, it might be possible to penalize programs that expand into a big ball of mud.
However, penalizing run time and space might be unwise, depending on what we care about. One could imagine a program that starts from first principles and derives higher-level approximations that are good enough to model the domain. It might be worth paying the cost of setting up the approximations because they are used frequently. (In other words, the amortized cost of the expansion is low.)
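A sketch of what such a penalty could look like, with the weights and the amortization scheme as placeholders of my own:

```python
import time
import tracemalloc

def amortized_score(source_code, setup, query, inputs,
                    w_len=1.0, w_time=1.0, w_mem=1.0):
    """Score a candidate by description length plus the amortized (per-query)
    time and memory cost of its expansion/setup phase.  The weights are
    arbitrary placeholders; lower scores are better."""
    tracemalloc.start()
    start = time.perf_counter()
    state = setup()                      # e.g. deriving the approximations once
    for x in inputs:
        query(state, x)                  # many cheap queries amortize the setup
    elapsed = time.perf_counter() - start
    _, peak_mem = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    n = len(inputs)
    return (w_len * len(source_code)
            + w_time * elapsed / n       # amortized run time per query
            + w_mem * peak_mem / n)      # amortized peak memory per query
```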
Broadly, what mathematical tools can we use on this problem?