Yeah, my guess is that you almost certainly fail on step 4: an example of a really compact ray tracer looks like it fits in 64 bytes, and you will not do search over all 64-byte programs. Even if you could evaluate one of them per atom per nanosecond, using every atom in the universe for 100 billion years, you’d only get 44.6 bytes of search.
Let’s go with something more modest and say you get to use every atom in the Milky Way for 100 years, and it takes about 1 million atom-seconds to check a single program. This gets you about 30 bytes of search.
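(For anyone who wants to sanity-check those numbers, here is a quick back-of-the-envelope script. The atom counts, roughly 1e80 for the observable universe and 1e68 for the Milky Way, are assumed order-of-magnitude figures, not anything stated above.)

```python
import math

# Rough check of the "bytes of search" figures above. Assumed inputs:
# ~1e80 atoms in the observable universe, ~1e68 atoms in the Milky Way,
# ~3.15e7 seconds per year; all order-of-magnitude estimates.
def bytes_of_search(evaluations: float) -> float:
    return math.log2(evaluations) / 8

# One program per atom per nanosecond, every atom in the universe, 100 billion years.
universe = 1e80 * (100e9 * 3.15e7) * 1e9

# Every atom in the Milky Way for 100 years, 1 million atom-seconds per program.
galaxy = 1e68 * (100 * 3.15e7) / 1e6

print(f"universe scenario: {bytes_of_search(universe):.1f} bytes of search")  # ~44.6
print(f"galaxy scenario:   {bytes_of_search(galaxy):.1f} bytes of search")    # ~29.7
```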
Priors over programs will get you some of the way there, but usually the structure of those priors will also lead to much much longer encodings of a ray tracer. You would also need a much more general / higher quality ray tracer (and thus more bits!) as well as an actually quite detailed specification of the “scene” you input to that ray tracer (which is probably way more bits than the original png).
The reason humans invented ray tracers with so much less compute is that we got ray tracers from physics and way way way more bits of evidence, not the other way around.
I was not imagining you would, no. I was imagining something more along the lines of “come up with a million different hypotheses for what a 2-d grid could encode”, which would be tedious for a human but would not really require extreme intelligence so much as extreme patience, and then, for each of those million hypotheses, try to build a model and iteratively optimize programs within that model for closeness to the output.
I expect, though I cannot prove, that “a 2d projection of shapes in a 3d space” is a pretty significant chunk of the hypothesis space, and that all of the following hypotheses would make it into the top million:
1. The 2-d points represent a rectangular grid oriented on a plane within that 3-d space. The values at each point are determined by what the plane intersects.
2. The 2-d points represent a grid oriented on a plane within that 3-d space. The values at each point are determined by drawing lines orthogonal to the plane and seeing what they intersect and where. The values represent distance.
3. The 2-d points represent a grid oriented on a plane within that 3-d space. The values at each point are determined by drawing lines orthogonal to the plane and seeing what they intersect and where. The values represent something else about what the lines intersect.
4-6. Same as 1-3, but with a cylinder.
7-9. Same as 1-3, but with a sphere.
10-12. Same as 1-3, but with a torus.
13-21: Same as 4-12 but the rectangular grid is projected onto only part of the cylinder/sphere/torus instead of onto the whole thing.
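A minimal sketch of the “build a model for a hypothesis, then optimize for closeness to the output” loop, using hypothesis 2 from the list above with a single sphere standing in for the scene. The “observed” grid is synthetic, the random search is a placeholder for whatever optimizer would actually be used, and all names and numbers are illustrative:

```python
import numpy as np

# Sketch of the "pick a hypothesis, fit a model, score against the output" loop,
# using hypothesis 2 above (orthogonal rays from a plane, values = distance)
# with a single sphere as the scene. Everything here is illustrative.

def render_depth(params, h=32, w=32):
    """Depth map of one sphere as seen by rays orthogonal to the z=0 plane."""
    cx, cy, cz, r = params
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = (xs - cx) ** 2 + (ys - cy) ** 2      # squared distance from each ray to the sphere axis
    hit = d2 <= r ** 2
    depth = np.full((h, w), np.inf)
    depth[hit] = cz - np.sqrt(r ** 2 - d2[hit])  # first intersection along +z
    return depth

def score(params, observed):
    """Mean squared depth error plus a penalty for hit/miss mismatches."""
    pred = render_depth(params, *observed.shape)
    both = np.isfinite(pred) & np.isfinite(observed)
    mismatch = np.mean(np.isfinite(pred) != np.isfinite(observed))
    if not both.any():
        return 1e9
    return np.mean((pred[both] - observed[both]) ** 2) + 100.0 * mismatch

# Stand-in for the mystery 2-d grid: secretly generated from the same family.
observed = render_depth((16.0, 14.0, 40.0, 9.0))

# Crude random search over scene parameters; the shape of the loop is the point,
# not the optimizer.
rng = np.random.default_rng(0)
best, best_score = None, np.inf
for _ in range(20000):
    candidate = rng.uniform([0.0, 0.0, 10.0, 1.0], [32.0, 32.0, 80.0, 16.0])
    s = score(candidate, observed)
    if s < best_score:
        best, best_score = candidate, s

print("best-fit sphere (cx, cy, cz, r):", np.round(best, 1), "score:", round(best_score, 3))
```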
I am not a superintelligence, though, nor do I have any special insight into what the universal prior looks like, so I don’t know if that’s actually a reasonable assumption, or whether it’s just an entity embedded in a space, which detects other things within that space using signals, privileging the hypothesis that “a space where it’s possible to detect other stuff in that space using signals” is a simple construct.
> as well as an actually quite detailed specification of the “scene” you input to that ray tracer (which is probably way more bits than the original png).
If the size-optimal scene and correction table, such that “ray-trace this scene and then apply corrections from this table” reproduces the image, are together larger than a naively compressed version of the png, my whole line of thought does indeed fall apart. I don’t expect that to be the case, because ray tracers are small and pngs are large, but this is a testable hypothesis (and not just “in principle it could be tested with a sufficiently powerful AI” but rather “an actual human could test it with a reasonable amount of time and effort using known techniques”). I don’t at this moment have time to test it, but maybe it’s worth testing at some point?
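A rough sketch of the kind of test that could be run, on synthetic data and with stand-ins: zlib on raw pixel bytes plays the role of “naively compressed png”, and the “photo” is literally a rendered disc plus noise, so the deck is stacked in favor of the scene-plus-corrections side. A real test would need an actual photograph and an actual ray tracer.

```python
import zlib
import numpy as np

# Toy version of the size comparison above. Stand-ins: zlib on raw pixel
# bytes plays the role of "naively compressed png"; the "scene" is a handful
# of floats; the "corrections table" is the residual between the rendered
# scene and the photo. The photo here is synthetic, so the result is only
# suggestive of how the real test would go.

rng = np.random.default_rng(1)

# Fake "photo": a shaded disc plus sensor noise, 8-bit grayscale.
h, w = 256, 256
ys, xs = np.mgrid[0:h, 0:w]
d2 = (xs - 130) ** 2 + (ys - 120) ** 2
render = np.where(d2 < 90 ** 2, 200 - d2 / 120.0, 30.0)  # the "ray traced" scene
photo = np.clip(render + rng.normal(0, 2, size=render.shape), 0, 255).astype(np.uint8)

# Scene description: centre, radius, and shading parameters.
scene_bytes = np.array([130, 120, 90, 200, 120, 30], dtype=np.float32).tobytes()

# Corrections table: residual between the rendered scene and the photo.
residual = photo.astype(np.int16) - np.clip(render, 0, 255).astype(np.uint8).astype(np.int16)
residual_bytes = residual.astype(np.int8).tobytes()

naive = len(zlib.compress(photo.tobytes(), 9))
scene_plus_corrections = len(scene_bytes) + len(zlib.compress(residual_bytes, 9))

print("naively compressed photo:       ", naive, "bytes")
print("scene + compressed corrections: ", scene_plus_corrections, "bytes")
```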
I wrote a reply to a separate comment you made in this thread here, but it’s relevant for this comment too. The idea that the data looks like “a 2-d grid” is an assumption that holds only for uncompressed bitmaps, not for JPGs, PNGs, RAW, or any video codec. The statement that the limiting factor is “extreme patience” hints that the real question is “what is the computational complexity[1] of an algorithm that can supposedly decode arbitrary data?”
I don’t think this algorithm could decode arbitrary data in a reasonable amount of time. I think it could decode some particularly structured types of data, and I think “fairly unprocessed sensor data from a very large number of nearly identical sensors” is one of those types of data.
I actually don’t know if my proposed method would work with jpgs: the quantization step after the discrete cosine transform destroys data in a way that might not leave you with enough information to make much progress. In general, if there’s lossy compression I expect that to make things much harder, and lossless compression I’d expect either makes it impossible (if you don’t figure out the encoding) or not meaningfully different from uncompressed (if you do).
RAW I’d expect is better than a bitmap, assuming no encryption step, on the hypothesis that more data is better.
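To make the lossless-compression point concrete, here is a tiny illustration (an assumed setup, not anything from the thread): a first-order regularity that is obvious in raw bitmap bytes, exactly the kind of structure the hypothesis search would lean on, essentially vanishes once the same bytes go through zlib, and only comes back if the encoding is undone.

```python
import zlib
import numpy as np

# Adjacent-byte correlation in a raw bitmap vs. its zlib-compressed bytes.
# The raw grid is smooth, so neighbouring bytes are highly correlated;
# after compression the byte stream looks close to random until decoded.

h, w = 128, 128
ys, xs = np.mgrid[0:h, 0:w]
image = (128 + 100 * np.sin(xs / 9.0) * np.cos(ys / 13.0)).astype(np.uint8)

def adjacent_correlation(buf: bytes) -> float:
    a = np.frombuffer(buf, dtype=np.uint8).astype(np.float64)
    return float(np.corrcoef(a[:-1], a[1:])[0, 1])

raw = image.tobytes()
packed = zlib.compress(raw, 9)

print("adjacent-byte correlation, raw bitmap:  ", round(adjacent_correlation(raw), 3))
print("adjacent-byte correlation, zlib output: ", round(adjacent_correlation(packed), 3))
```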
Also, I kind of doubt that the situation where some AI with no priors encounters a single frame of video, but somehow has access to 1000 GPU-years to analyze that frame, would come up IRL. The point as I see it is more about whether it’s possible with a halfway reasonable amount of compute, or whether That Alien Message was completely off-base.
> Priors over programs will get you some of the way there, but usually the structure of those priors will also lead to much much longer encodings of a ray tracer.
If you grant the image being reconstructed, then 2-dimensional space is already in the cards. It’s not remotely 64 bits to make the leap from there to 3-d space projected onto 2-d space. The search doesn’t have to be “search all programs in some low-level encoding”; it can be weighted toward things that are mathematically interesting / elegant (which is somewhat of an a priori feature).
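One way to make “weighted toward mathematically interesting / elegant” concrete: enumerate combinations of a few high-level components, like the hypothesis families listed earlier in the thread, rather than raw 64-byte programs. A toy count, with purely illustrative component names:

```python
import math
from itertools import product

# Toy structured prior: hypotheses are combinations of a few mathematically
# natural components rather than arbitrary byte strings. Component names are
# illustrative, loosely following the hypothesis list earlier in the thread.
surfaces    = ["plane", "cylinder", "sphere", "torus", "partial surface"]
projections = ["intersection with the surface", "orthogonal ray-cast"]
values      = ["distance", "what was hit", "some other property of the hit"]

hypotheses = list(product(surfaces, projections, values))
bits = math.log2(len(hypotheses))

print(f"{len(hypotheses)} structured hypotheses ~ {bits:.1f} bits of search")
print("vs. 512 bits to brute-force every 64-byte program")
```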
[1] https://en.wikipedia.org/wiki/Computational_complexity