Here is an idea I just thought of in an Uber ride for how to narrow down the space of languages it would be reasonable to use for universal induction. To express the K-complexity of an object $O$ relative to a programming language $L$, I will write:
$$K_L(O)$$
Suppose we have two programming languages. The first is Python. The second is Qython, which is a lot like Python, except that it interprets the string “A” as a program that outputs some particular random-looking character string $S$ of enormous algorithmic complexity, with $K_{\text{Python}}(S) \approx 10^{15}$. I claim that, intuitively, Python is a better language than Qython to use for measuring the complexity of a hypothesis. That’s the notion that I just thought of a way to formally express.
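To make the example concrete, here is a minimal sketch of what a Qython interpreter could look like, written in Python. The name `run_qython` and the string `S` are my inventions for illustration, and `S` is shortened to a placeholder so the sketch actually runs:

```python
import io
import contextlib

# Placeholder for the enormous incompressible string from the example;
# the real S would take ~10^15 characters to spell out in Python.
S = "q8#kz...(imagine ~10^15 random characters here)..."

def run_qython(program: str) -> str:
    """Toy Qython semantics: 'A' is a program that outputs S;
    every other program means whatever it means in Python."""
    if program == "A":
        return S
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(program, {})  # otherwise, defer to ordinary Python
    return buf.getvalue()
```

So `run_qython("A")` returns $S$, while `run_qython("print(2 + 2)")` behaves exactly like Python.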
There is a well-known theorem that if you are using $L_1$ to measure the complexity of objects, and I am using $L_2$ to measure the complexity of objects, then there is a constant $c_2$ such that for any object $O$:
$$K_{L_1}(O) \le K_{L_2}(O) + c_2$$
In words, this means that you might think that some objects are less complicated than I do, and you might think that some objects are more complicated than I do, but you won’t think that any object is more than $c_2$ complexity units more complicated than I do. Intuitively, $c_2$ is just the length of the shortest program in $L_1$ that is a compiler for $L_2$. So in the worst case, the shortest program in $L_1$ that outputs $O$ is a compiler for $L_2$ written in $L_1$ (which is $c_2$ characters long), fed the program in $L_2$ that outputs $O$ (which is $K_{L_2}(O)$ characters long).
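Here is a minimal sketch of that worst-case construction, assuming we already have the compiler text in hand (the function and argument names are mine, and how the two parts are delimited is glossed over):

```python
def witness_program(compiler_L2_in_L1: str, program_L2: str) -> str:
    """Build an L1 program that outputs O by prefixing a compiler for L2
    (written in L1) to an L2 program that outputs O. Its length,
    c_2 + K_L2(O), is therefore an upper bound on K_L1(O)."""
    # A real construction needs an unambiguous delimiter between the
    # compiler and its payload; that detail is ignored here.
    return compiler_L2_in_L1 + program_L2
```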
I am going to define the K-complexity of a function $f: X \to Y$ relative to a programming language as the length of the shortest program in that language that, when given $x$ as an input, returns $f(x)$. This is probably already defined that way, but just in case. So say we have the function that maps programs in $L_2$ to their outputs, and we call that function $C_2$; then:
$$K_{L_1}(C_2) = c_2$$
There is also another constant:
$$K_{L_2}(C_1) = c_1$$
The first is the length of the shortest compiler for $L_2$ written in $L_1$, and the second is the length of the shortest compiler for $L_1$ written in $L_2$. Notice that these do not need to be equal. For instance, I claim that the shortest compiler for Qython written in Python is roughly $10^{15}$ characters long: we have to write the program that outputs $S$ in Python, which by hypothesis takes about $10^{15}$ characters, and then a bit more to get it to run that program when it reads “A”, and to get that functionality to play nicely with the rest of Qython, however that works out. By contrast, writing a compiler for Python in Qython shouldn’t take very long. Since Qython basically is Python, it might not take any characters, but even if there are weird rules in Qython for how the string “A” is interpreted when it appears in an otherwise Python-like program, it still shouldn’t take any more characters than it takes to write a Python interpreter in regular Python.
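For intuition about what “the length of the shortest program” means operationally, here is a brute-force sketch. The alphabet and helper name are my inventions, and for a real language the search does not terminate in general (some candidate programs never halt), so treat it as the definition made executable only for toy, total languages:

```python
from itertools import product

ALPHABET = "A01+()ab"  # toy character set; real languages draw from far more

def shortest_program(language, target, max_len=6):
    """Search programs in length order and return the first one whose
    output under `language` equals `target` -- a witness for
    K_language(target). Assumes `language` is a total function from
    program strings to outputs; for real languages this is uncomputable."""
    for n in range(1, max_len + 1):
        for chars in product(ALPHABET, repeat=n):
            program = "".join(chars)
            try:
                if language(program) == target:
                    return program
            except Exception:
                continue  # invalid program; keep searching
    return None  # nothing of length <= max_len produces target
```

On the toy interpreter above, `shortest_program(run_qython, S)` returns `"A"` immediately, which is exactly the sense in which Qython hides roughly $10^{15}$ characters of complexity behind a single character.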
So this is my proposed method for determining which of two programming languages is better to use for universal induction. Say again that we are choosing between $L_1$ and $L_2$. We find the pair of constants such that $K_{L_1}(C_2) = c_2$ and $K_{L_2}(C_1) = c_1$, and then compare their sizes. If $c_1$ is less than $c_2$, this means that it is easier to write a compiler for $L_1$ in $L_2$ than vice versa, so there is more hidden complexity in $L_2$’s encodings than in $L_1$’s, and so we should use $L_1$ rather than $L_2$ for assessing the complexity of hypotheses.
Let’s say that if $K_{L_2}(C_1) < K_{L_1}(C_2)$, then $L_2$ hides more complexity than $L_1$.
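Putting the rule in code form, as a sketch: `K` and `semantics` below are hypothetical uncomputable oracles (the K-complexity of a function relative to a language, and the function $C$ mapping a language’s programs to their outputs), so this pins down the decision rule without pretending it can be run on real languages:

```python
def preferred_language(L1, L2, K, semantics):
    """Return whichever of L1 and L2 hides less complexity, or None on a tie.
    K(L, f) and semantics(L) are assumed oracles, not computable functions."""
    c2 = K(L1, semantics(L2))  # shortest compiler for L2, written in L1
    c1 = K(L2, semantics(L1))  # shortest compiler for L1, written in L2
    if c1 < c2:
        return L1  # L2 hides more complexity than L1
    if c2 < c1:
        return L2  # L1 hides more complexity than L2
    return None    # tie: the criterion alone does not decide (see the complications below)
```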
A few complications:
It is probably not always decidable whether the smallest compiler for $L_1$ written in $L_2$ is smaller than the smallest compiler for $L_2$ written in $L_1$. But this at least in principle gives us some way to specify what we mean by one language hiding more complexity than another, and it seems like, at least in the case of Python vs. Qython, we can make a pretty good argument that the smallest compiler for Python written in Qython is smaller than the smallest compiler for Qython written in Python.
It is possible (I’d say probable) that if we started with some group of candidate languages and looked for languages that hide less complexity, we might run into a circle. For example, the smallest compiler for $L_1$ in $L_2$ might be the same size as the smallest compiler for $L_2$ in $L_1$, but there might still be an infinite set of objects $O_i$ such that:
$$\forall i \;\; K_{L_1}(O_i) \neq K_{L_2}(O_i)$$
In this case, the two languages would disagree about the complexity of an infinite set of objects, but at least they would disagree by no more than the same fixed constant in both directions: applying the theorem above in each direction, with $c_1 = c_2 = c$, gives $|K_{L_1}(O) - K_{L_2}(O)| \le c$ for every object $O$. It seems like we could probably do something clever there, like take the average. Alternatively, if we introduce an $L_3$, and the smallest compiler for $L_3$ written in $L_1$ is larger than the smallest compiler for $L_3$ written in $L_2$, then it seems like we should pick $L_1$.
If there is an infinite set of languages that all stand in this relationship to each other (i.e., all of the languages in the set disagree about the complexity of an infinite set of objects, yet hide less complexity than any language outside the set), then that seems pretty damning for this approach, but at least we narrowed down the search space a bit?
Even if it turns out that we end up with an infinite set of languages that disagree about an infinite set of objects by exactly the same constant, it might be nice to have some upper bound on what that constant is.
In any case, this seems like something somebody would have thought of already, and then proved the relevant theorems addressing all of the complications I raised. Has anyone seen something like this before? I think a friend might have suggested a paper that tried a similar method and concluded that it wasn’t a feasible strategy, but I don’t remember exactly, and it might have been a totally different thing.
Watcha think?