I was not imagining you would, no. I was imagining something more along the lines of “come up with a million different hypotheses for what a 2-d grid could encode”, which would be tedious for a human but would not really require extreme intelligence so much as extreme patience, and then for each of those million hypotheses try to build a model, and iteratively optimize for programs within that model for closeness to the output.
I expect, though I cannot prove, that “a 2-d projection of shapes in a 3-d space” is a pretty significant chunk of the hypothesis space, and that all of the following hypotheses would make it into the top million:
1. The 2-d points represent a rectangular grid oriented on a plane within that 3-d space. The values at each point are determined by what the plane intersects.
2. The 2-d points represent a grid oriented on a plane within that 3-d space. The values at each point are determined by drawing lines orthogonal to the plane and seeing what they intersect and where. The values represent distance.
3. The 2-d points represent a grid oriented on a plane within that 3-d space. The values at each point are determined by drawing lines orthogonal to the plane and seeing what they intersect and where. The values represent something else about what the lines intersect.
4-6. Same as 1-3, but with a cylinder.
7-9. Same as 1-3, but with a sphere.
10-12. Same as 1-3, but with a torus.
13-21. Same as 4-12, but the rectangular grid is projected onto only part of the cylinder/sphere/torus instead of onto the whole thing.
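As a hedged illustration of the loop described above (everything here is a placeholder I made up, not part of the original discussion: the surface classes, the stand-in renderer, and the scoring function are all assumptions), the brute-force search might look like:

```python
import itertools
import random

# Placeholder hypothesis classes: a surface type paired with a guess
# about what the grid values mean.
SURFACES = ["plane", "cylinder", "sphere", "torus"]
SEMANTICS = ["intersection", "ray_distance", "ray_property"]

def render(surface, semantics, param, shape):
    """Stand-in renderer: a real version would ray-trace a candidate
    3-d scene and project it onto the hypothesized surface."""
    h, w = shape
    return [[(param * (i + j)) % 256 for j in range(w)] for i in range(h)]

def closeness(rendered, observed):
    """Total absolute difference between two grids (lower is closer)."""
    return sum(abs(r - o)
               for row_r, row_o in zip(rendered, observed)
               for r, o in zip(row_r, row_o))

def search(observed, candidates_per_hypothesis=50):
    """For each hypothesis, crudely 'optimize' by sampling a single
    scene parameter at random, and keep the best fit overall."""
    h, w = len(observed), len(observed[0])
    best = (float("inf"), None)
    for surface, semantics in itertools.product(SURFACES, SEMANTICS):
        for _ in range(candidates_per_hypothesis):
            param = random.randint(1, 16)
            score = closeness(render(surface, semantics, param, (h, w)), observed)
            best = min(best, (score, (surface, semantics, param)))
    return best
```

A real version would replace the random sampling with genuine iterative optimization over full scene descriptions; the point is only that the outer loop over hypotheses demands patience rather than intelligence.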
I am not a superintelligence, though, nor do I have any special insight into what the universal prior looks like, so I don’t know whether that’s actually a reasonable assumption, or whether it’s just that an entity embedded in a space, which detects other things within that space using signals, privileges the hypothesis that “a space where it’s possible to detect other stuff in that space using signals” is a simple construct.
as well as an actually quite detailed specification of the “scene” you input to that ray tracer (which is probably way more bits than the original PNG).
If the size-optimal scene such that “ray-trace this scene and then apply corrections from this table” is larger than a naively compressed version of the PNG, my whole line of thought does indeed fall apart. I don’t expect that to be the case, because ray tracers are small and PNGs are large, but this is a testable hypothesis (and not just “in principle it could be tested with a sufficiently powerful AI” but rather “an actual human could test it in a reasonable amount of time and effort using known techniques”). I don’t at this moment have time to test it, but maybe it’s worth testing at some point?
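A hedged sketch of what that test could look like. The “ray tracer” here is a stand-in gradient and the scene description is a made-up placeholder; a real test needs an actual renderer and a real photograph. The structure of the comparison is the point:

```python
import zlib

def ray_trace(width, height):
    """Stand-in for 'render the scene': a smooth horizontal gradient."""
    return bytes(min(255, x) for _ in range(height) for x in range(width))

def correction_table(observed, rendered):
    """Byte-wise corrections turning the rendered image into the observed one."""
    return bytes((o - r) % 256 for o, r in zip(observed, rendered))

def apply_corrections(rendered, corrections):
    """Lossless reconstruction: rendered + corrections == observed."""
    return bytes((r + c) % 256 for r, c in zip(rendered, corrections))

def compare_sizes(observed, width, height, scene_description=b"gradient-scene"):
    """Compressed size of the raw pixels vs. 'scene + correction table'."""
    rendered = ray_trace(width, height)
    corrections = correction_table(observed, rendered)
    naive = len(zlib.compress(observed, 9))
    structured = len(zlib.compress(scene_description + corrections, 9))
    return naive, structured
```

The claim in the comment is that, for real photographs, `structured` ends up smaller than `naive`; this sketch only sets up the comparison, it does not settle it.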
I wrote a reply to a separate comment you made in this thread here, but it’s relevant for this comment too. The idea that the data looks like “a 2-d grid” is an assumption that holds only for uncompressed bitmaps, not for JPGs, PNGs, RAW, or any video codec. The statement that the limiting factor is “extreme patience” suggests the real question is “what is the computational complexity[1] of an algorithm that can supposedly decode arbitrary data?”
I don’t think this algorithm could decode arbitrary data in a reasonable amount of time. I think it could decode some particularly structured types of data, and I think “fairly unprocessed sensor data from a very large number of nearly identical sensors” is one of those types of data.
I actually don’t know whether my proposed method would work with JPGs—the discrete cosine transform step destroys data in a way that might not leave you with enough information to make much progress. In general, I expect lossy compression to make things much harder, and lossless compression to either make it impossible (if you don’t figure out the encoding) or not meaningfully different from uncompressed (if you do).
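To make the “DCT destroys data” point concrete, here is a minimal, self-contained illustration. This is not actual JPEG—just an 8-point DCT-II followed by crude coefficient quantization, which is the lossy step:

```python
import math

N = 8  # JPEG operates on 8-sample blocks in each dimension

def dct(xs):
    """Unnormalized DCT-II of an 8-sample block."""
    return [sum(x * math.cos(math.pi * (n + 0.5) * k / N) for n, x in enumerate(xs))
            for k in range(N)]

def idct(cs):
    """Inverse of the DCT-II above (a scaled DCT-III)."""
    return [(cs[0] / 2 + sum(cs[k] * math.cos(math.pi * (n + 0.5) * k / N)
                             for k in range(1, N))) * 2 / N
            for n in range(N)]

def quantize(cs, step=20):
    """The lossy step: round each coefficient to a multiple of `step`."""
    return [round(c / step) * step for c in cs]

samples = [52, 55, 61, 66, 70, 61, 64, 73]
exact = idct(dct(samples))              # round-trips (up to float error)
lossy = idct(quantize(dct(samples)))    # does not: information is gone
```

Once the coefficients are quantized, nothing can recover the original samples exactly, which is why recovering fine sensor-level structure from a JPG may be hopeless.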
RAW I’d expect to be better than a bitmap, assuming no encryption step, on the hypothesis that more data is better.
Also I kind of doubt that the situation where some AI with no priors encounters a single frame of video but somehow has access to 1000 GPU years to analyze that frame would come up IRL. The point as I see it is more about whether it’s possible with a halfway reasonable amount of compute or whether That Alien Message was completely off-base.
[1] https://en.wikipedia.org/wiki/Computational_complexity