Interpreting each 8-byte block as an IEEE double-precision float (endianness whatever’s default on x86-64, which is indeed little-endian) yields all values in the range −4 .. 4.2. The FFT of the resulting 2095 values (ignoring the nonzero DC term, which arises because the average isn’t quite zero) is distinctly not flat, indicating that although the numbers look pretty random they are definitely not: they have more high-frequency and less low-frequency content than one would get from truly random numbers.
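A minimal sketch of that decode-and-FFT check (the byte-string variable here is mine; assume the raw message has already been read from the file):

```python
import numpy as np

def decode_doubles(raw: bytes) -> np.ndarray:
    """Interpret the raw message as consecutive little-endian IEEE754 float64s."""
    return np.frombuffer(raw, dtype="<f8")

def spectrum(values: np.ndarray) -> np.ndarray:
    """Magnitude spectrum with the DC term dropped, for eyeballing flatness."""
    return np.abs(np.fft.rfft(values))[1:]

# Round-trip demo on a few of the reported leading values:
raw = np.array([1.3, 2.1, -0.5], dtype="<f8").tobytes()
assert decode_doubles(raw).tolist() == [1.3, 2.1, -0.5]
```

For truly i.i.d. random input the spectrum should look flat on average; any consistent tilt towards high frequencies is the kind of non-flatness described above.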
Here’s what you get when you interpret them as (y,x) positions of points.
For what it’s worth, colored by how soon in the sequence they appear (blue is early, red is late) (Also note I interpreted it as 2094 points, with each number first used in the x-dimension and then in the y-dimension):
Note that one line near the top appears to be drawn twice, confirming, if nothing else, that the succession rule does not depend only on the previous value, since the paths diverge afterwards.
Still, comparing those two sections could be interesting.
The double line I was talking about is actually a triple line, at indices 366, 677, and 1244. The lines before come from fairly different places, and they diverge pretty quickly afterwards:
However, just above it, there’s another duplicate line, at indices 1038 and 1901:
These start out closer together and also take a little bit longer to diverge.
This might be indicative of a larger pattern: points that are close together and have similar histories tend to have their next steps close to each other as well.
Every number in the sequence which equals 1-(y/2)^2 for “nice” y to “good” accuracy is in fact on the quadratic; i.e., when a number is “approximately” 1-(y/2)^2 for a y that “could” come next, that y does in fact come next. BUT to make this so I am having to define “nice” and “good” fairly tightly; this in fact occurs for only 14 values early in the sequence, perhaps before roundoff error starts being a nuisance.
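A sketch of how one might hunt for those hits (the tolerance here is my own guess at “good” accuracy, not the original poster’s):

```python
def quadratic_matches(xs, rel_tol=1e-12):
    """Indices i where xs[i] = 1 - (xs[i+1]/2)**2 to within rel_tol,
    i.e. places where the next value y satisfies x = 1 - (y/2)^2."""
    hits = []
    for i in range(len(xs) - 1):
        pred = 1 - (xs[i + 1] / 2) ** 2
        if abs(pred - xs[i]) <= rel_tol * max(1.0, abs(xs[i])):
            hits.append(i)
    return hits

# Synthetic demo: 0.5775 = 1 - (1.3/2)^2, so index 0 qualifies.
assert quadratic_matches([1 - (1.3 / 2) ** 2, 1.3, 0.5]) == [0]
```

Tightening or loosening `rel_tol` is exactly the “good accuracy” knob mentioned above.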
(I am conjecturing that the way this thing is constructed might be that we start with a few initial values and then iterate some process like: “if (condition) then next number is −(previous number); else if (condition) then next number is 2·sqrt(1 − previous); else if …”. If so, then we might start with “nice” numbers—in fact the first two are 1.3 and 2.1 to machine precision—but that “niceness” might be eroded as we calculate. This is all very handwavy and there is at least one kinda-obvious objection to it.)
A few reasons for suspecting that this sort of iterative process is in play:
It would fit with the relationship to “That Alien Message” where the idea is that each frame (here, each number) is the “next” state of affairs as something evolves according to simple rules.
There are many cases in which it sure looks as if one number is derived from its predecessor. (Or maybe from its successor.)
It’s an obvious thing to do, if you want to set a challenge like this.
Note that I am not necessarily conjecturing that each number is a function of its predecessor. There might be other internal state that we don’t get to see.
Do you see how such an iteration can produce the long-distance correlations I mention in a message below, between floats at positions that differ by a factor of 2/e? It seems that this would require some explicit dependence on the index.
It’s not exactly 2/e. Here is a plot of the “error” of those points. The x-axis is the larger point; the y-axis is the smaller point minus 2/e times the larger.
So it’s within about 1% of 2/e; it might be a real thing, or it might just be a coincidence.
Have you tried writing programs that create sequences like the above to see how close they get? (Asking to avoid redoing work if the answer is yes.)
Making explicit something I said in passing elsewhere: it seems not-impossible that the sequence might be best looked at in reverse order, since mapping x to 1-(x/2)^2 or (1-x^2)/2 is somewhat more natural than going the other way. However, it does look rather as if there might be some accumulation of roundoff error in the forward direction, which if true would argue for reading it forwards rather than backwards.
More structure emerges! Here’s a plot of consecutive pairs of values (data[i], data[i+1]) such that data[i+1] = -data[i+2]. ![Consecutive values before a negation](https://i.imgur.com/2FRBhAz.png)
Has anyone tried a five-dimensional representation instead of a two-dimensional one? 2095 isn’t divisible by 2 or by 3, but it is divisible by 5. Maybe our “aliens” have four spatial dimensions and one temporal.
I’ve tried highlighting every k-th point out of n with the same color for a bunch of n, but it all looks random. Right now, I’ve also tried using only 2 of 5 float values. It looks like a dead end, even though the idea is good.
I think the data is 1-dimensional, the interesting part is how each number is transformed into the next, and the 2d representation just happens to catch that (e.g., if x is transformed into −x, it lands on the diagonal).
Second try: When looking at scatterplots of any 3 out of 5 of those dimensions and interpreting each 5-tuple of numbers as one point, you can see the same structures that are visible in the 2d plot, the parabola and a line—though the line becomes a plane if viewed from a different angle, and the parabola disappears if viewed from a different angle.
Looking at scatterplots of any 3 out of 5 of those dimensions, it looks pretty random, with much less structure than in the 2d plot.
Edit: Oh wait, I’ve been using chunks of 419 numbers as the dimensions, but I should be interleaving them.
(This has y increasing downwards.)
The straight line comes from the (x,-x) pairs I remarked on, which make up ~2/3 of the dataset. The parabolic thing comes from values …,a,b,… where a = 1-(b/2)^2 to something like full precision.
Ah, it’s the Elden Ring
There are 1047 points. The points on the diagonal are distributed all across the list—here are the first indices
[3,6,9,10,13,16,18,21,25,27,28,31,34,38,41,43,46,49,52,55]...
They’re usually 3 apart, but sometimes 1, 2, 4, 5, or 6. The last one has index 1045.
By the way, it’s definitely not aliens. Aliens would not be sending messages encoded using IEEE754 double-precision floats.
Here is a rather clear sign that it is IEEE754 64 bit floats indeed. (Up to correctly setting the endianness of 8-byte chunks,) if we remove the first n bits from each chunk and count how many distinct values that takes, we find a clear phase transition at n=12, which corresponds to removing the sign bit and the 11 exponent bits.
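A sketch of that distinct-count probe (the byte-order handling and the demo values are my assumptions):

```python
import struct

def to_bits(x: float) -> int:
    """64-bit integer whose bits are, in order: sign | 11 exponent bits | 52 fraction bits."""
    return int.from_bytes(struct.pack(">d", x), "big")

def distinct_after_dropping(values, n: int) -> int:
    """How many distinct values remain after removing the first n bits of each chunk."""
    mask = (1 << (64 - n)) - 1
    return len({to_bits(v) & mask for v in values})

# 1.5 and 3.0 have identical fraction bits (both are binary 1.1 times a power of 2),
# so they collapse together once the sign and exponent bits are dropped:
assert distinct_after_dropping([1.5, 3.0], 0) == 2
assert distinct_after_dropping([1.5, 3.0], 12) == 1
```

Sweeping n from 0 to 63 and plotting the count is what exposes the phase transition at n=12.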
These first 12 bits take 22 different values, which (in binary) clearly cluster around 1024 and 3072, suggesting that the first bit is special. So without knowing about IEEE754 we could have in principle figured out the splitting into 1+11+52 bits. The few quadratic patterns we found have enough examples with each exponent to help understand the transitions between exponents and completely fix the format (including the implicit 1 in the significand?).
Alternative hypothesis: The first several bits (of each 64-bit chunk) are less chaotic than the middle bits due to repetitive border behavior of a 1-D cellular automaton. This hypothesis also accounts for the observation that the final seven bits of each chunk are always either 1000000 or 0111111.
If you were instead removing the last n bits from each chunk, you’d find another clear phase transition at n=7, as the last seven bits only have two observed configurations.
If you calculate the entropy −p0·log2(p0) − p1·log2(p1) of each of the 64 bit positions (where p0 and p1 are the proportions of 0 and 1 bits among the 2095 values at that position), then you’ll see that the entropy depends much more smoothly on position if we convert from little-endian to big-endian, namely if we sort the bits as 57,58,...,64, then 49,50,...,56, then 41,42,...,48, and so on until 1,...,8. That doesn’t sound like a very natural boundary behaviour of an automaton, unless it is then encoded as little-endian for some reason.
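Here’s a sketch of that per-position entropy computation (position 0 here is the sign bit of the big-endian layout):

```python
import math
import struct

def bit_column_entropy(values):
    """Shannon entropy -p0*log2(p0) - p1*log2(p1) for each of the 64 bit columns."""
    ints = [int.from_bytes(struct.pack(">d", v), "big") for v in values]
    n = len(ints)
    entropies = []
    for pos in range(64):  # pos 0 = sign bit, 1..11 = exponent, 12..63 = fraction
        p1 = sum((x >> (63 - pos)) & 1 for x in ints) / n
        h = 0.0
        for p in (p1, 1 - p1):
            if p > 0:
                h -= p * math.log2(p)
        entropies.append(h)
    return entropies

# 1.5 and -1.5 differ only in the sign bit: full entropy there, zero elsewhere.
ent = bit_column_entropy([1.5, -1.5])
assert ent[0] == 1.0 and max(ent[1:]) == 0.0
```

Re-running this with the bytes of each chunk reversed is what shows the smoother big-endian profile described above.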
Now that I know that, I’ve updated towards the “float64” area of hypothesis space. But in defense of the “cellular automaton” hypotheses, just look at the bitmap! Ordered initial conditions evolving into (spatially-clumped) chaos, with at least one lateral border exhibiting repetitive behavior:
The first few numbers are mostly “nice”. 1.3, 2.1, −0.5, 0.65, 1.05, 0.675, −1.15, 1.15, 1.70875, 0.375, −0.375, −0.3225, −2.3, 2.3. Next after that is 1.64078125, which is not so nice but still pretty nice.
These are 13⁄10, 21⁄10, −1/2, 13⁄20, 21⁄20, 27⁄40, −23/20, 23⁄20, 1367⁄800, 3⁄8, −3/8, −129/400, −23/10, 23⁄10, 10501⁄6400, …
A few obvious remarks about these: denominators are all powers of 2 times powers of 5; 13 and 21 are Fibonacci numbers; shortly after 13⁄10 and 21⁄10 we have 13⁄20 and 21⁄20; we have three cases of x followed by -x.
Slightly less than 1⁄3 of all numbers are followed by their negatives. Usually it goes a, -a, b, c, -c, d, etc., but (1) the first x,-x pair isn’t until the 7th/8th numbers, and (2) sometimes there is a longer gap and (3) there are a few places where there’s a shorter gap and we go x, -x, y, -y.
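Counting those is straightforward; a sketch (exact equality is deliberate, since the negations appear to be bit-exact):

```python
def negation_pairs(xs):
    """Indices i where xs[i+1] is exactly -xs[i]."""
    return [i for i in range(len(xs) - 1) if xs[i + 1] == -xs[i]]

# Synthetic demo following the a, -a, b, c, -c, d shape:
demo = [1.3, -1.3, 0.5, 0.65, -0.65, 2.0]
assert negation_pairs(demo) == [0, 3]
assert len(negation_pairs(demo)) / len(demo) == 1 / 3
```

The gaps between successive indices in that list are what reveal the skipped-packet breaks in the pattern.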
Most of the breaks in the (a, -a, b, c, -c, d) pattern look like either the +/- pair was skipped entirely or the unpaired value was skipped. My guess is the complete message consists of alternating “packets” of either a single value or a +/- pair, and each packet has some chance to be omitted (or they are deterministically omitted according to some pattern).
The number 23⁄20 appears three times. Each time it appears it is followed by a different number, and that different number is generally “unusually nasty compared with others so far”.
First time, at (0-based) index 7: followed by 1367⁄800, first with a denominator > 40.
Second time, at (0-based) index 21: followed by 3.1883919921875 ~= 1632.4567/2^9, first with a denominator > 6400.
Third time, at (0-based) index 48: followed by 2.4439140274654036, dunno what this “really” is, first with no obvious powers-of-2-and-5 denominator.
[EDITED to fix an error.]
2.4439140274654036 might be (3³×19×3671×10631)/(2¹⁹×5⁶) with some incorrect rounding (2.4439140274658203125).
Value[71] is exactly half of value[49]. (and this again follows a 23)
The “.4567” seems just slightly suspicious.
That third number is quite close to 5005/2^11.
If we multiply it by 2^11/1001, the number we actually get is 5.00013579245669; that decimal tail also seems just slightly suspicious. 1, 3, 5, 7, 9, 2, 4, 6, approximately-8.
This could all be mere pareidolia, obviously.
Early in the sequence (i.e., before roundoff error has much effect, if there’s something iterative going on) it seems like a surprising number of our numbers have denominators that are (power of 2) x 10000. As if there’s something happening to 4 decimal places, alongside something happening that involves exact powers of 2. (Cf. Measure’s conjecture that something is keeping track of numerators and denominators separately.) This is all super-handwavy and I don’t have a really concrete hypothesis to offer.
[EDITED to add:] The apparent accumulating roundoff is itself evidence against this, because it means that after a while our numbers are not powers of 2 over 10000. So I’ll be quite surprised if this turns out to be anything other than delusion. I’m leaving it here just in case it gives anyone useful ideas.
These aren’t all quite correctly rounded. E.g., the 12th number is about −0.3225, but it isn’t the nearest IEEE754 double to −0.3225. I suspect these are “honest” rounding errors (i.e., the values were produced by some sort of computation with roundoff error) rather than there being extra hidden information lurking in the low bits.
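One way to check that is to count ulps between a stored value and the correctly-rounded double for a decimal literal (the function name and approach are mine; it walks one ulp at a time, so it’s only suitable when the two values are close):

```python
import math

def ulps_from_decimal(x: float, literal: str) -> int:
    """Number of representable doubles between x and the nearest double to `literal`.
    0 means x IS the correctly-rounded value."""
    target = float(literal)  # float() rounds the decimal string correctly
    lo, hi = sorted((x, target))
    steps = 0
    while lo < hi:
        lo = math.nextafter(lo, math.inf)  # step up one ulp at a time
        steps += 1
    return steps

assert ulps_from_decimal(float("-0.3225"), "-0.3225") == 0
assert ulps_from_decimal(math.nextafter(0.3225, 1.0), "0.3225") == 1
```

A nonzero ulp count for a value like the 12th number is what distinguishes “computed with roundoff” from “correctly rounded constant”.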
x followed by −x is common. Cases where 1 − xₙ = (xₙ₊₁/2)² are also common.
Hypothesis: the generator separately tracks the numerator and denominator and uses the xₙ₊₁ = 2*sqrt(1 - xₙ) rule exactly when this will result in both the numerator and denominator remaining integers.
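That hypothesis is directly testable with exact rationals; a sketch (assuming the rule fires exactly when 1 − xₙ, in lowest terms, is a ratio of perfect squares):

```python
from fractions import Fraction
import math

def sqrt_step_exact(x: Fraction):
    """Return 2*sqrt(1 - x) as an exact Fraction when 1 - x (in lowest terms)
    has a perfect-square numerator and denominator; otherwise None."""
    r = 1 - x
    if r < 0:
        return None  # square root would be imaginary
    ns, ds = math.isqrt(r.numerator), math.isqrt(r.denominator)
    if ns * ns == r.numerator and ds * ds == r.denominator:
        return 2 * Fraction(ns, ds)
    return None

assert sqrt_step_exact(Fraction(3, 4)) == 1      # 2*sqrt(1/4) = 1
assert sqrt_step_exact(Fraction(1, 3)) is None   # 2/3 is not a square
```

Running this along the sequence (reconstructed as fractions) would show whether the sqrt rule fires exactly at the predicted places.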
Here is a plot of the denominator (of the closest fraction with denominator < 1,000,000,000):
This looks exactly what you would expect if you started with a number that happened to be a fraction, and applied a process like squaring or adding that made the denominator bigger and bigger. This also indicates the sequence was computed from start to end, not the other way around.
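The standard library can recover those denominators directly; a sketch of how such a plot’s data could be computed:

```python
from fractions import Fraction

def best_denominator(x: float, limit: int = 10**9) -> int:
    """Denominator of the closest fraction to x with denominator <= limit."""
    return Fraction(x).limit_denominator(limit).denominator

# Checks against the "nice" values quoted earlier in the thread:
assert best_denominator(1.3) == 10
assert best_denominator(-129 / 400) == 400
assert best_denominator(1.70875) == 800   # 1367/800
```

Plotting `best_denominator` against index is what produces the growing-denominator picture described above.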
Well, xₙ₊₁ = −2·√(1 − xₙ) appears around as often.
If this were true, there must be something aiming towards simplicity. (A huge numerator and denominator are unlikely to both be perfect squares.)
The second relation never occurs when xₙ is the negation of the previous xₙ₋₁.
Furthermore, the second relation is always followed by xₙ₊₁ = -xₙ (i.e. there is never a “skipped pair” pattern break immediately following). This means that the skips are unlikely to be random.
Whenever 1 − xₙ = (xₙ₊₁/2)², this quantity is at most 4.
I’m also finding around 50 instances of 1 − 2xₙ = (xₙ₊₁)² ∈ [1, 4] (namely 1 ≤ |xₙ₊₁| ≤ 2), again with xₙ₊₂ = −xₙ₊₁.
It doesn’t look as if there are a lot of other such relationships to be found. There are a few probably-coincidental simple linear relationships between consecutive numbers very near the start. There are lots of y = −x, quite a lot of 4x = 4 − y², one maybe-coincidental 14 − x = 2 − y², one maybe-coincidental x = −4(1 − y)², some 2x = 1 − y², and I’m not seeing anything else among the first 400 pairs.
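A generic scanner for relations of that shape (the tolerance is my choice, and would need tightening for the near-full-precision matches):

```python
def find_relation(xs, residual, tol=1e-9):
    """Indices i where residual(xs[i], xs[i+1]) is ~0 for consecutive pairs."""
    return [i for i in range(len(xs) - 1) if abs(residual(xs[i], xs[i + 1])) < tol]

# The 4x = 4 - y^2 family, on a synthetic pair (x, y) = (0.9775, 0.3):
demo = [0.9775, 0.3, 5.0]
assert find_relation(demo, lambda x, y: 4 * x - (4 - y * y)) == [0]
```

Sweeping a family of candidate residuals (with small power-of-2 and power-of-5 coefficients) over the first 400 pairs would automate the search described above.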
[EDITED to add:] But I have only looked at relationships where all the key numbers are powers of 2; maybe I should be considering things with 5s in as well. (Or maybe it’s 2s and 10s rather than 2s and 5s.)
There is one place where −xₙ = 4(1 − xₙ₊₁)². I wonder whether there are many other relationships of broadly similar shape.
The distribution of the numbers is distinctly not-uniform. Might be normal. Mean is about 0.157, standard deviation about 1.615. (I doubt these are really pi/20 and the golden ratio, but those would be consistent with the data :-).)
Again, clearly not independent given how the FFT looks.
Pretty sure it’s not normal. The middle isn’t bulgy enough. It’s hard to be very confident with relatively few numbers, but I eyeballed the histogram against several batches of normals with the same mean and variance, and it’s well outside what looks to me like the space of plausible histograms. I haven’t bothered attempting any actual proper statistical tests.
(Also, of course, if the numbers are generated by the sort of iterative process I imagine, it seems like it would be quite difficult to arrange for them to be much like normally distributed.)