On the literature that addresses your question: here is a classic LW post on the topic.
You point out that the length of a description in English and its length in code don’t necessarily correlate. I think that for English sentences which actually constrain expectations, there is a fairly good correlation between length in English and length in code.
There’s the issue that the high-level concepts we use in English can be short, but if we were writing a program from scratch using those concepts, their expansion would be large. When I appeal to the concept of a buffer overflow to explain how someone learned secrets from my email, the invocatory phrase “buffer overflow” is short, but its expansion in terms of computers, transistors, semiconductors, and solid-state physics is rather long.
But I’m in the game of trying to explain all of my observations. I get to have a dictionary of concepts that I pay the cost for, and then reuse the words and phrases in the dictionary in all my explanations nice and cheaply. Similarly, the computer program that I use to explain the world can have definitions or a library of code, as long as I pay the cost for those definitions once.
So, I’m already paying the cost of the expansion of “buffer overflow” in my attempt to come up with simple explanations for the world. When new data has to be explained, I can happily count explanations that reuse concepts I’ve already paid for as rather simple.
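To make the “pay for the dictionary once” point concrete, here is a minimal sketch in Python that uses zlib’s preset-dictionary feature as a stand-in for a shared concept library. The particular strings and the choice of zlib are my own illustrative assumptions, not anything from the thread; the point is only that once sender and receiver have both paid for the dictionary, a new message that reuses its phrases costs far fewer additional bytes.

```python
import zlib

# A "concept library" paid for once: phrases we expect to reuse in many
# explanations (an illustrative stand-in for shared background knowledge).
CONCEPT_LIBRARY = (
    b"a buffer overflow in the mail server let an attacker read memory; "
    b"computers, transistors, semiconductors, solid state physics"
)

def compressed_size(message: bytes, shared_dict: bytes = b"") -> int:
    """Size of `message` after DEFLATE compression, optionally using a
    preset dictionary the receiver is assumed to already have."""
    if shared_dict:
        comp = zlib.compressobj(9, zdict=shared_dict)
    else:
        comp = zlib.compressobj(9)
    return len(comp.compress(message) + comp.flush())

new_observation = b"a buffer overflow in the mail server let an attacker read memory"

print("without the library:", compressed_size(new_observation), "bytes")
print("with the library:   ", compressed_size(new_observation, CONCEPT_LIBRARY), "bytes")
# The library itself is not free, but its cost is paid once and then
# amortized over every later explanation that reuses its phrases.
print("one-time library cost:", compressed_size(CONCEPT_LIBRARY), "bytes")
```

The preset dictionary here plays the role of the concept library: its cost shows up once, and every later explanation that reuses its phrases is charged only for what is genuinely new.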
The linked post doesn’t seem to answer it, e.g. in the 4th paragraph EY says:
Why, exactly, is the length of an English sentence a poor measure of complexity? Because when you speak a sentence aloud, you are using labels for concepts that the listener shares—the receiver has already stored the complexity in them.
I also don’t think it fully addresses the question, or even partially addresses it in a useful way. E.g., EY says:
It’s enormously easier (as it turns out) to write a computer program that simulates Maxwell’s equations, compared to a computer program that simulates an intelligent emotional mind like Thor.
The formalism of Solomonoff induction measures the “complexity of a description” by the length of the shortest computer program which produces that description as an output.
But this bakes in knowledge about measuring stuff. Maxwell’s equations are, in part, easier to code because we have a way to describe measurements that’s easy to compute. That representation is itself an abstraction layer! It uses labels for concepts too.
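For concreteness, here is a toy sketch (my own construction, not from the post) of the “length of the shortest program that outputs the data” measure, and of how that number depends on which primitives the reference machine already has built in. The two hypothetical machines below differ only in whether a squaring concept comes for free.

```python
from itertools import product

def run(program, primitives):
    """Interpret a toy program: a sequence of tokens acting on an
    accumulator (starting at 0) and an output list."""
    x, out = 0, []
    for token in program:
        x, out = primitives[token](x, out)
    return out

# Base machine (made up for illustration): only "increment" and "emit".
BASE = {
    "inc":  lambda x, out: (x + 1, out),
    "emit": lambda x, out: (x, out + [x]),
}

# Richer machine: the same primitives plus a built-in squaring concept.
RICH = dict(BASE, sq=lambda x, out: (x * x, out))

def shortest_program_length(target, primitives, max_len=12):
    """Length of the shortest program (in tokens) whose output equals
    `target`, found by brute-force search in order of increasing length."""
    tokens = list(primitives)
    for length in range(1, max_len + 1):
        for program in product(tokens, repeat=length):
            if run(program, primitives) == target:
                return length
    return None

data = [3, 9]  # the "observations" to be explained (illustrative)
print("shortest program, base machine:", shortest_program_length(data, BASE))  # 11 tokens
print("shortest program, rich machine:", shortest_program_length(data, RICH))  # 6 tokens
```

Both machines account for the same observations; the measured complexity differs because one of them has already paid for the squaring concept, which is essentially the abstraction-layer point above.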