If q is not too low, then you can do this by taking a bunch of samples and evaluating them by expected utility. Of course, it might be expensive to evaluate this many samples.
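The sampling approach above can be sketched as follows. This is a minimal illustration, not anyone's actual implementation; `base_sample` and `utility` are hypothetical stand-ins for the base distribution and the expected-utility evaluator.

```python
import random

def quantilize_by_sampling(base_sample, utility, q):
    """Approximate a q-quantilizer by rejection sampling:
    draw about 1/q samples from the base distribution and
    return the best one by expected utility, which lands
    (in expectation) in the top q fraction of the base."""
    n = max(1, round(1 / q))  # cost grows as O(1/q)
    samples = [base_sample() for _ in range(n)]
    return max(samples, key=utility)
```

The `max(1, round(1 / q))` line is where the cost shows up: as q shrinks, the number of samples you must draw and evaluate grows as 1/q.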
I think that you can also do this with an adversarial game, as in your post on mimicry. You can have one system that takes some action, and another system that bets at some odds that the action was produced by the AI rather than the base distribution. This seems to work without learning the cost function.
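A toy version of that adversarial game might look like the sketch below. Everything here is a hypothetical setup for illustration: the "bettor" estimates the log-odds that an action came from the AI rather than the base distribution from observed counts, and the AI picks the action maximizing utility minus the bettor's bet. Note that no cost function over actions is ever specified or learned explicitly.

```python
import math
from collections import Counter

def adversarial_round(actions, base_counts, ai_counts, utility, penalty=1.0):
    """One round of a toy adversarial game: the bettor bets at odds
    derived from how often each action has come from the AI versus
    the base distribution; the AI maximizes utility minus the bet."""
    n = len(actions)

    def log_odds(a):
        # Laplace-smoothed empirical log-odds of "AI produced a"
        p_ai = (ai_counts[a] + 1) / (sum(ai_counts.values()) + n)
        p_base = (base_counts[a] + 1) / (sum(base_counts.values()) + n)
        return math.log(p_ai / p_base)

    return max(actions, key=lambda a: utility(a) - penalty * log_odds(a))
```

In a real instantiation both players would be learned systems updated over many rounds; the point of the sketch is only that the penalty term depends on distinguishability from the base distribution, not on any hand-specified cost function.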
I was imagining the case where O(1/q) samples is too slow, i.e. where we want the AI to actually perform a search.
The second paragraph is what I had in mind. Note that in this case you are maximizing over learnable cost functions rather than all cost functions.