When I put it that way, another problem with going off-distribution is apparent: even if we do find a way to get better scores according to every plausible hypothesis by going off-distribution, we trust those scores less because they’re off-distribution.
I realize I’m playing fast and loose with realizability again, but it seems to me that a system which is capable of being “calibrated”, in the sense I defined calibration above, should be able to reason for itself that it is less knowledgeable about off-distribution points. It should have some kind of prior belief that the score for any particular off-distribution point is equal to the mean score for the entire (off-distribution?) space, and it should need a fair amount of evidence to shift this prior. I’m not necessarily specifying how concretely to achieve this, just saying that it seems like a desideratum for a “calibrated” ML system in the sense that I’m using the term.
Maybe effects like this could be achieved partially through e.g. having different hypotheses be defined on different subsets of the input space, and always including a baseline hypothesis which is just equal to the mean of the entire space.
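To make the "prior pinned to the mean score" idea a bit more concrete, here is a minimal sketch of one way it could look. This is only an illustration under my own assumptions, not a claim about how a calibrated system would actually be built: the function name `shrunk_score`, the `n_nearby` evidence count, and the `prior_strength` pseudo-count are all hypothetical. It blends a local score estimate with the global mean, so a point with no nearby training data falls back to the baseline.

```python
def shrunk_score(local_estimate, n_nearby, global_mean, prior_strength=10.0):
    """Blend a local score estimate with the mean score of the whole space.

    local_estimate: score predicted by some hypothesis that covers this point.
    n_nearby: how many training points support that hypothesis near the point
              (hypothetical stand-in for "how on-distribution is this?").
    global_mean: the baseline hypothesis, i.e. the mean score over the space.
    prior_strength: pseudo-count; how much evidence is needed to move away
                    from the baseline (an assumption, not from the discussion).
    """
    # With zero nearby evidence the weight is 0 and we return the baseline.
    weight = n_nearby / (n_nearby + prior_strength)
    return weight * local_estimate + (1.0 - weight) * global_mean


# An off-distribution point (no nearby data) gets the mean score, however
# optimistic the local hypothesis is.
print(shrunk_score(local_estimate=1.0, n_nearby=0, global_mean=0.2))
# A well-supported point is allowed to move most of the way to the estimate.
print(shrunk_score(local_estimate=1.0, n_nearby=1000, global_mean=0.2))
```

The point of the pseudo-count is just that a high claimed score alone never overrides the baseline; only accumulated evidence does.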
If you want a backup system that also attempts to flag & veto any action that looks off-distribution for the sake of redundancy, that’s fine by me too. I think some safety-critical software systems for e.g. space shuttles have been known to do this (do a computation in multiple different ways & aggregate them somehow to mitigate errors in any particular subsystem).
Quantilization follows fairly directly from that :)
My current understanding of quantilization is “choose randomly from the top X% of actions”. I don’t see how this helps very much with staying on-distribution… as you say, the off-distribution space is larger, so the majority of actions in the top X% of actions could still be off-distribution.
In any case, quantilization seems like it shouldn’t work due to the fragility of value thesis. If we were to order all of the possible configurations of Earth’s atoms from best to worst according to our values, the top 1% of those configurations is still mostly configurations which aren’t very valuable.
My current understanding of quantilization is “choose randomly from the top X% of actions”. I don’t see how this helps very much with staying on-distribution… as you say, the off-distribution space is larger, so the majority of actions in the top X% of actions could still be off-distribution.
The base distribution you take the top X% of is supposed to be related to the “on-distribution” distribution, such that sampling from the base distribution is very likely to keep things on-distribution, at least if the quantilizer’s own actions are the main potential source of distributional shift. This could be the case if the quantilizer is the only powerful AGI in existence, and the actions of a powerful AGI are the only thing which would push things into sufficiently “off-distribution” possibilities for there to be a concern. (I’m not saying these are entirely reasonable assumptions; I’m just saying that this is one way of thinking about quantilization.)
In any case, quantilization seems like it shouldn’t work due to the fragility of value thesis. If we were to order all of the possible configurations of Earth’s atoms from best to worst according to our values, the top 1% of those configurations is still mostly configurations which aren’t very valuable.
The base distribution quantilization samples from is about actions, or plans, or policies, or things like that—not about configurations of atoms.
So, you should imagine a robot sending random motor commands to its actuators, not highly intelligently steering the planet into a random configuration.
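For concreteness, here is a rough sketch of a sampling-based quantilizer over actions, matching the "top X% of a base distribution" description. It is a simplified illustration, not a reference implementation: `quantilize`, `base_sampler`, `utility`, and the Monte Carlo approximation with `n_samples` draws are all my own assumptions.

```python
import random


def quantilize(base_sampler, utility, q=0.1, n_samples=1000, rng=None):
    """Pick uniformly at random from the top q fraction of sampled actions.

    base_sampler: function rng -> action, drawing from the base distribution
                  (e.g. random motor commands), NOT from all world-states.
    utility: function action -> float, the (possibly flawed) score to optimize.
    q: fraction of the base distribution to keep; q=1.0 is pure base sampling.
    """
    rng = rng or random.Random()
    # Draw candidate actions from the base distribution only.
    candidates = [base_sampler(rng) for _ in range(n_samples)]
    # Rank by utility, best first, and keep the top q fraction.
    candidates.sort(key=utility, reverse=True)
    top_k = max(1, int(q * n_samples))
    return rng.choice(candidates[:top_k])


# Toy usage: actions are single motor commands in [-1, 1], and the utility
# simply prefers larger commands.
rng = random.Random(0)
action = quantilize(lambda r: r.uniform(-1.0, 1.0), lambda a: a, q=0.1, rng=rng)
print(action)
```

Note that every candidate comes from `base_sampler`, so the output can never be an action the base distribution would not produce; the utility only reweights among those.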