Hex dump of the first chunk of the file:
I just picked out enough lines to make some obvious observations. There are multiple sequences of ffffff, which I assume to be some kind of record separator. The beginning of the file is highly structured; the latter part of the file much less so. The obvious conclusion is that the first part of the file is some kind of metadata or header, while the main body begins further down.
I believe all this bit-level structure is a consequence of the values being IEEE 754 double-precision values, many of them for fairly “simple” numbers, often with simple arithmetical relationships between consecutive numbers.
The nature of binary representations of floating-point is that nice bit-patterns make for round numbers and vice versa, so I’m not sure that we can conclude a lot from that. The fact that the floating-point interpretation of the data results in numbers that cluster around certain values is telling, but could still be a red herring. Part of my reluctance to endorse that theory is narrative: we were told that this is a simulated alien message, and what are the odds that aliens have independently invented double-precision floating point?
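To make the point about nice bit patterns concrete, here is a minimal Python sketch that prints the IEEE 754 double-precision encoding of a few values. The helper names and the big-endian byte order are assumptions for illustration only; nothing here establishes the byte order, or even the encoding, of the actual file.

```python
import struct


def double_to_hex(x: float) -> str:
    """Return the big-endian IEEE 754 double encoding of x as 16 hex digits."""
    return struct.pack(">d", x).hex()


def hex_to_double(h: str) -> float:
    """Inverse of double_to_hex: decode 16 hex digits as a big-endian double."""
    return struct.unpack(">d", bytes.fromhex(h))[0]


# Round values have mostly-zero mantissas; 0.1 does not.
for value in (0.5, 1.0, 2.0, 3.0, 0.1):
    print(f"{value!r:>4} -> {double_to_hex(value)}")
# 0.5 -> 3fe0000000000000
# 1.0 -> 3ff0000000000000
# 2.0 -> 4000000000000000
# 3.0 -> 4008000000000000
# 0.1 -> 3fb999999999999a
```

The same relationship runs in both directions: almost any tidy-looking 8-byte pattern decodes to a fairly round number, which is why tidy bits on their own do not settle whether the data was written as floating point.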
In any case, I’m reading those threads attentively, but in the meantime I’m going to pursue some hunches of my own.
Actually, the opener is quite a bit more structured than that, even: it’s three 4-byte sequences where the bytes are all identical or differ in only one bit, followed by a different 4-byte sequence. There is probably something really obvious going on here, but I need to stare at it a bit before it jumps out at me.
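One way to check that kind of observation mechanically is sketched below in Python. It assumes one particular reading of "identical or differ in only one bit", namely that every byte in a 4-byte group is within one bit of the group's first byte; other readings are possible. The function names are hypothetical and the sample bytes are made up, not taken from the file.

```python
def bit_distance(a: int, b: int) -> int:
    """Number of bit positions in which two byte values differ."""
    return bin(a ^ b).count("1")


def is_near_uniform(group: bytes) -> bool:
    """True if every byte is within one bit of the group's first byte."""
    first = group[0]
    return all(bit_distance(first, b) <= 1 for b in group)


def classify_groups(data: bytes, width: int = 4) -> list[bool]:
    """Apply is_near_uniform to each complete width-byte group of data."""
    return [
        is_near_uniform(data[i:i + width])
        for i in range(0, len(data) - width + 1, width)
    ]


# Made-up example: three near-uniform groups followed by a different one.
sample = bytes.fromhex("ffffffff" "fffffffe" "7f7f7f7f" "12345678")
print(classify_groups(sample))  # [True, True, True, False]
```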
ETA: Switching to binary since there’s no reason to assume that the hexadecimal representation is particularly useful here.
Okay, here’s something interesting. Showing binary representation, in blocks of 8 bytes:
The obvious pattern now is that every 8 bytes there is a repeated sequence of 6 bits which are all the same. Despite my initial protestations that the latter part of the file is less regular, this pattern holds throughout the entire file. The majority of the time the pattern is 111111, but there are a decent number of instances of 000000 as well.
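A check like this can be automated. The Python sketch below renders data as 64-bit strings, one per 8-byte block, and looks for bit offsets at which every block has 6 identical bits. The function names and the sample bytes are hypothetical; the offset the sample produces says nothing about where the run sits in the real file.

```python
from collections import Counter

BLOCK_BYTES = 8  # block size used for the binary view
RUN_BITS = 6     # length of the run of identical bits


def to_blocks(data: bytes) -> list[str]:
    """Render each complete 8-byte block as a 64-character bit string."""
    usable = len(data) - len(data) % BLOCK_BYTES
    return [
        "".join(f"{byte:08b}" for byte in data[i:i + BLOCK_BYTES])
        for i in range(0, usable, BLOCK_BYTES)
    ]


def uniform_run_offsets(blocks: list[str]) -> list[int]:
    """Bit offsets where every block has RUN_BITS identical bits in a row."""
    return [
        off
        for off in range(BLOCK_BYTES * 8 - RUN_BITS + 1)
        if all(len(set(block[off:off + RUN_BITS])) == 1 for block in blocks)
    ]


def run_values(blocks: list[str], offset: int) -> Counter:
    """Count which run value (e.g. 111111 or 000000) appears at the offset."""
    return Counter(block[offset:offset + RUN_BITS] for block in blocks)


# Made-up blocks, not taken from the actual file.
sample = bytes.fromhex(
    "3f255a934e65d246" "401be658c3299d74" "bf4a17c2905be3a8"
)
blocks = to_blocks(sample)
offsets = uniform_run_offsets(blocks)
print(offsets)                          # [2]
print(run_values(blocks, offsets[0]))   # 111111 twice, 000000 once
```

Run against the real file contents instead of the sample, an empty result from uniform_run_offsets would mean the pattern does not hold at a fixed position in every block, and the counts from run_values would give the split between 111111 and 000000.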