So, I want to note a few things. The original Eliezer post was intended to argue against this line of reasoning:
I occasionally run into people who say something like, “There’s a theoretical limit on how much you can deduce about the outside world, given a finite amount of sensory data.”
He didn’t worry about compute, because that’s not a barrier on the theoretical limit. And in his story, the entire human civilization had decades to work on this problem.
But you’re right that in practice, compute is important.
I feel like you’re trying to make this take as much compute as possible.
Since you brought up headers, I feel like I need to reiterate that when we are talking to a neural network, we do not add that extra data. The goal is to communicate with the neural network, so we intentionally put the input in easier-to-understand formats.
In the practical cases where this would come up (e.g. a nascent superintelligence figuring out physics faster than we expect), we will probably also be inputting data in an easy-to-understand format.
Similarly, I expect you don’t need to check every possible esoteric format. The likelihood of the image using an encoding like 61 bits per pixel, with 2 for red, 54 for green and 5 for blue, is just very low a priori. I do admit I’m not sure whether restricting to “reasonable” formats would cut the possibilities down into the computable realm (that obviously depends on the definition of reasonable, though part of me feels like you could, with a lot of work, actually build an objective likelihood score for various encodings). But it’s certainly a lot harder to show that it wouldn’t than to just say “f(x) = (63 pick x), grows very fast.”
Though, since I don’t have a good sense of whether the “reasonable” formats would add up to a more computable number, I should update in your direction. (I tried to look into something sort of analogous: the most common 200 passwords cover a little over 4% of all used passwords, which isn’t large enough for me to feel comfortable expecting that the most “likely” 1,000 formats would cover a significant share of the probability space.)
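To put rough numbers on the format-counting point, here is a minimal sketch in Python. It is my own stand-in, not your f(x): it only counts ways to split a pixel's bits among three channels, and the function names and the "every channel gets the same number of bits" notion of reasonable are assumptions I made up for illustration.

```python
from math import comb


def channel_splits(bits_per_pixel: int, channels: int = 3) -> int:
    """Count ordered ways to give every channel at least one bit.

    This is the standard stars-and-bars count: C(n - 1, k - 1).
    """
    if bits_per_pixel < channels:
        return 0
    return comb(bits_per_pixel - 1, channels - 1)


def all_splits(max_bits: int = 64, channels: int = 3) -> int:
    """Total splits over every pixel size from `channels` up to `max_bits`."""
    return sum(channel_splits(n, channels) for n in range(channels, max_bits + 1))


def equal_splits(max_bits: int = 64, channels: int = 3) -> int:
    """Hypothetical 'reasonable' filter: every channel gets the same width."""
    return sum(1 for n in range(channels, max_bits + 1) if n % channels == 0)


print(channel_splits(61))  # splits of a 61-bit pixel, one of which is 2/54/5
print(all_splits())        # every split for pixels of up to 64 bits
print(equal_splits())      # only the equal-width layouts
```

This leaves out everything else a format has to pin down (image width, channel order, padding and so on), so it says nothing about the full search space. It only shows how much a single prior over layouts can shrink one factor of it.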
(Also potentially important: modern neural nets don’t really receive things as a string of bits, but instead as a string of numbers, nicely split up into separate nodes. Yes, those numbers are made of bits, but they’re floating point numbers, and the way neural nets interact with them is through floating point operations, so I don’t think the neural net actually touches the bit representation of the number in any meaningful way.)
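A small sketch of what I mean by values versus bits, using only Python's standard `struct` module and assuming 32-bit IEEE 754 floats. The helper names are mine. Arithmetic on a float never needs to look at individual bits, and a single flipped bit can mean anything from a negligible change to an astronomically large one.

```python
import struct


def float_to_bits(x: float) -> str:
    """Return the 32-bit IEEE 754 pattern of x as a string of 0s and 1s."""
    (as_int,) = struct.unpack(">I", struct.pack(">f", x))
    return format(as_int, "032b")


def bits_to_float(bits: str) -> float:
    """Inverse of float_to_bits."""
    (x,) = struct.unpack(">f", struct.pack(">I", int(bits, 2)))
    return x


def flip_bit(x: float, index: int) -> float:
    """Flip one bit of x; index 0 is the sign bit, index 31 the last mantissa bit."""
    bits = list(float_to_bits(x))
    bits[index] = "1" if bits[index] == "0" else "0"
    return bits_to_float("".join(bits))


pixel = 0.5
print(float_to_bits(pixel))  # sign bit, then 8 exponent bits, then 23 mantissa bits
print(flip_bit(pixel, 31))   # lowest mantissa bit: a change of about 6e-8
print(flip_bit(pixel, 1))    # highest exponent bit: the value jumps to 2**127
print(2.0 * pixel + 1.0)     # what a unit actually does: arithmetic on the value
```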