cfoster0 comments on Don’t design agents which exploit adversarial inputs

cfoster0 21 Nov 2022 21:52 UTC
3 points
0
The two of us went back and forth in DMs on this for a bit. Based on that conversation, I think a mutually-agreeable translation of the above argument would be “sampling from [the conditional distribution of X-es given the Y label] is the same as sampling from [the distribution that has maximum joint [[closeness to the distribution of X-es] and [prevalence of Y-labeled X-es]]]”. Even if this isn’t exact, I buy that as at least morally true.
However, I don’t think this establishes the claim I’d been struggling with, which was that there’s some near equivalence between drawing a conditional sample and argmax searching over samples (possibly w/ some epsilon tolerance). The above argument establishes that we can view conditioning itself as the solution to a maximization problem over distributions, but not that we can view conditional sampling as the solution to any kind of maximization problem over samples.
- tailcalled 21 Nov 2022 22:39 UTC
  3 points
  0
  Parent
  I would also add that the key exciting things happen when you condition on an event with extremely low probability / have a utility function with an extremely wide range of available utilities. cfoster0′s view is that this will mostly just cause it to fail/output nonsense, because of standard arguments along the lines of the Optimizer’s Curse. I agree that this could happen, but I think it depends on the intelligence of the argmaxer/conditioner, and that another possibility (if we had more capable AI) is that this sort of optimization/conditioning could have a lot of robust effects on reality.