Fix some distribution $D$ on $\{0,1\}^n$, and some function $R:\{0,1\}^n\times\{0,1\}^m\to[-1,1]$. Then consider the set of circuits $C:\{0,1\}^n\to\{0,1\}^m$ for which the expectation of $R(x,C(x))$, for $x$ sampled from $D$, is $\geq 0$.
I think you need to uncheck “Markdown Comment Editor” under “Edit Account”. Your comment with latex follows:
Here is one definition of a “problem”:
Fix some distribution $D$ on $\{0,1\}^n$, and some function $R:\{0,1\}^n\times\{0,1\}^m\to[-1,1]$. Then consider the set of circuits $C:\{0,1\}^n\to\{0,1\}^m$ for which the expectation of $R(x,C(x))$, for $x$ sampled from $D$, is $\geq 0$.
Can we assume that R itself is aligned in the sense that it doesn’t assign non-negative values to outputs that are catastrophic to us?
Yeah, if we want C to not be evil we need some very hard-to-state assumption on R and D.
(markdown comment editor is unchecked, will take it up with admins)
Perhaps it’ll be useful to think about the question for specific D and R.
Here are the simplest D and R I can think of that might serve this purpose:
D - uniform over the integers in the range $[1, 10^{10^{10}}]$.
R - for each input x, R assigns a reward of 1 to the smallest prime number that is larger than x, and −1 to everything else.
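To make this concrete, here is a minimal sketch of the D and R above in Python, with a "circuit" modeled as an ordinary function from integers to integers. The names `N`, `reward`, `expected_reward`, and `solves` are made up for illustration, and the range is shrunk to a toy bound of 100 so the expectation can be computed exactly rather than estimated by sampling.

```python
from fractions import Fraction

# Toy upper bound (an assumption for illustration); the range in the
# comment above is far too large to enumerate.
N = 100


def is_prime(k: int) -> bool:
    """Trial-division primality test, fine for small k."""
    if k < 2:
        return False
    i = 2
    while i * i <= k:
        if k % i == 0:
            return False
        i += 1
    return True


def next_prime(x: int) -> int:
    """Smallest prime strictly larger than x."""
    k = x + 1
    while not is_prime(k):
        k += 1
    return k


def reward(x: int, y: int) -> int:
    """R(x, y): 1 if y is the smallest prime larger than x, else -1."""
    return 1 if y == next_prime(x) else -1


def expected_reward(c, n: int = N) -> Fraction:
    """Exact expectation of R(x, c(x)) for x uniform over 1..n."""
    return Fraction(sum(reward(x, c(x)) for x in range(1, n + 1)), n)


def solves(c, n: int = N) -> bool:
    """True if c is in the solution set, i.e. its expected reward is >= 0."""
    return expected_reward(c, n) >= 0


if __name__ == "__main__":
    print(expected_reward(next_prime), solves(next_prime))
    print(expected_reward(lambda x: x + 1), solves(lambda x: x + 1))
```

With this R, a candidate is in the solution set exactly when it returns the right prime on at least half of the inputs, so the guess `x + 1` (correct only when `x + 1` happens to be prime) falls outside it.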