Nested layers of “options”
Here I use “option” in the sense of C++ std::optional<> / Rust Option / Haskell Maybe.
It feels to me like “real-world data” often ends up nested in thick layers of “optionality”, with the number of layers limited mostly by how precisely we want to represent the state of our “un-knowledge” about it. When we get data from some source, which potentially got the data in turn from another source, and so on, there is some kind of fuzziness or uncertainty added at each step, which we may or may not need to represent.
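To make the nesting concrete, here is a minimal Rust sketch with just two layers. The names Reading, describe and collapse are made up for illustration, and which layer goes on the outside is my own assumption. The point is only that flattening the layers throws away the reason a value is missing.

```rust
/// Outer layer: did we manage to poll the sensor at all?
/// Inner layer: did the sensor know the value it was asked for?
/// (Hypothetical type, for illustration only.)
type Reading = Option<Option<f64>>;

/// Describe a reading while keeping the two kinds of "no value" apart.
fn describe(reading: Reading) -> String {
    match reading {
        None => "poll failed".to_string(),
        Some(None) => "sensor reports unknown".to_string(),
        Some(Some(v)) => format!("value {}", v),
    }
}

/// Collapse both layers into one; the cause of a missing value is lost.
fn collapse(reading: Reading) -> Option<f64> {
    reading.flatten()
}
```

After collapse, “poll failed” and “sensor reports unknown” are both just None, which is exactly the kind of merging we may or may not want.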
I’m thinking about this because I’m feeding data from Home Assistant into Prometheus/Grafana to log and graph it, and I ran across some weirdness in how Home Assistant handles exporting datapoints from currently-unavailable devices.
Layers of optionality that seem to exist here:
The sensor itself can tell us that the thing it’s sensing is in an unknown state. (For example, the Home Assistant Android app has a “sensor” tracking the current exercise activity of the person using the phone, which is often “unknown”.)
The sensor itself could in theory tell us that it has had some kind of internal malfunction and is not sensing anything (but is still connected to the network). I don’t have examples of this here.
The system only samples the sensor at certain times; this may not be one of those times, so the current state may be inferred from the previous state. (This is sort of a weird one, because it involves time-dependence, which is a whole other kettle of fish.)
The system’s most recent attempt to sample the sensor could have failed, so that the latest value is unknown. (This is the case that gives the weirdness I ran into above: the Home Assistant UI, and apparently also the Prometheus exporter, will repeat the last-known value for some time when this happens, which I think is ugly and undesirable.)
The Prometheus scraper could receive some kind of error state from Home Assistant. (In practice this would be merged with the following item.)
The Prometheus scraper could fail to reach the Home Assistant instance, and so not know anything about its state for the current instant.
(From this point on, I’m talking about things you wouldn’t normally think of in this framework at all, but I think it fits:) Prometheus could display an error in the UI, because it can’t reach its own database to get the values of the datapoint.
My browser could get an HTTP error from Prometheus indicating a failure to even produce a webpage.
My browser could give an error indicating that it couldn’t reach Prometheus at all.
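The first several layers of that list could be sketched as nested Rust enums, one per component. Everything here is hypothetical: the type names, the variants and the age_secs field are mine, and none of this reflects how Home Assistant or Prometheus actually model their data. The browser-side layers are left out to keep it short.

```rust
/// What the sensor itself says.
#[derive(Debug, Clone, PartialEq)]
enum SensorState {
    Value(f64),
    Unknown,     // sensor works, but the sensed thing is in an unknown state
    Malfunction, // sensor is reachable but reports an internal fault
}

/// What the hub (Home Assistant, in the text) knows about the sensor.
#[derive(Debug, Clone, PartialEq)]
enum HubView {
    Fresh(SensorState),
    /// Not a sampling instant; the state is carried over from an earlier sample.
    Inferred { last: SensorState, age_secs: u64 },
    /// The most recent sampling attempt failed.
    SampleFailed,
}

/// What the scraper (Prometheus, in the text) knows about the hub.
#[derive(Debug, Clone, PartialEq)]
enum ScrapeView {
    Scraped(HubView),
    HubError,    // the hub answered, but with an error state
    Unreachable, // the hub could not be reached at all
}

/// Return a value only if every layer is sure about it.
fn trusted_value(view: &ScrapeView) -> Option<f64> {
    match view {
        ScrapeView::Scraped(HubView::Fresh(SensorState::Value(v))) => Some(*v),
        _ => None,
    }
}

/// Name the layer that introduced the un-knowledge, if any did.
fn missing_because(view: &ScrapeView) -> Option<&'static str> {
    match view {
        ScrapeView::Unreachable => Some("scraper could not reach the hub"),
        ScrapeView::HubError => Some("hub returned an error"),
        ScrapeView::Scraped(HubView::SampleFailed) => Some("hub failed to sample the sensor"),
        ScrapeView::Scraped(HubView::Inferred { .. }) => {
            Some("value is inferred from an earlier sample")
        }
        ScrapeView::Scraped(HubView::Fresh(SensorState::Malfunction)) => Some("sensor malfunction"),
        ScrapeView::Scraped(HubView::Fresh(SensorState::Unknown)) => Some("sensor reports unknown"),
        ScrapeView::Scraped(HubView::Fresh(SensorState::Value(_))) => None,
    }
}
```

In this sketch a repeated last-known value would show up as Inferred rather than Fresh, so a graph could choose to draw it differently or not at all.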
I have obviously added every conceivable layer I could think of here, including some that don’t usually get thought about in a uniform way, and some that we would in practice never bother to distinguish. But I’m a data packrat, and also an amateur type theorist, and so I think a lot about data representations.
This whole thing shades into another space I think a lot about, which is error handling in programming languages and systems.
Some parts of the stack I described above really seem to fall under “error handling”: what do you do if you can’t reach component A from component B? Others seem to fall under “data representation”: if you poll someone about who they’re voting for, and they say “I’m not voting”, or “I don’t know”, or “fuck you”, or “je ne parle pas anglais”, what do you write down on the form (and which of those cases do you want to distinguish versus merge)? But the two are closely related.
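One way to see how close the two are is to write the poll down as a Rust type. This is only a sketch: the split between “answers” and “errors” below is a judgment call, the mapping of each reply to a variant is my reading of it, and NoResponse is an extra case I added that is not in the list above.

```rust
/// Things a respondent might say: the "data representation" side.
#[derive(Debug, Clone, PartialEq)]
enum Answer {
    Candidate(String),
    NotVoting, // "I'm not voting"
    Undecided, // "I don't know"
    Refused,   // the rude reply
}

/// Reasons we got no usable answer at all: closer to "error handling".
#[derive(Debug, Clone, PartialEq)]
enum ContactError {
    LanguageBarrier, // the respondent does not speak the pollster's language
    NoResponse,      // hypothetical extra case, not from the text
}

type PollResult = Result<Answer, ContactError>;

/// Merge every case into the single question "did they name a candidate?"
fn named_candidate(result: &PollResult) -> Option<&str> {
    match result {
        Ok(Answer::Candidate(name)) => Some(name.as_str()),
        _ => None,
    }
}
```

Ok(Answer::Undecided) and Err(ContactError::LanguageBarrier) stay distinct in a PollResult, but named_candidate merges them into None. Moving a variant from one enum to the other changes nothing about the information stored, only which side of the “error” line it sits on.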