Vanessa Kosoy comments on Formal Solution to the Inner Alignment Problem

Vanessa Kosoy 9 Mar 2021 20:13 UTC
LW: 4 AF: 3
When talking about uniform (worst-case) bounds, realizability just means the true environment is in the hypothesis class, but in a Bayesian setting (like in the OP) it means that our bounds scale with the probability of the true environment in the prior. Essentially, it means we can pretend the true environment was sampled from the prior. So, if (by design) training works by sampling environments from the prior, and (by realizability) deployment also consists of sampling an environment from the same prior, training and deployment are indistinguishable.
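One way to make "bounds scale with the probability of the true environment in the prior" concrete is the standard dominance argument. This is a sketch under assumptions that are not in the comment itself: a countable hypothesis class $\mathcal{H}$, a prior $\zeta$ over it, a true environment $\mu \in \mathcal{H}$ with $\zeta(\mu) > 0$, and a nonnegative loss $L$ (all of these symbols are introduced here for illustration only).

```latex
% Bayes mixture of the hypothesis class under the prior \zeta
\xi \;:=\; \sum_{\nu \in \mathcal{H}} \zeta(\nu)\,\nu \;\;\ge\;\; \zeta(\mu)\,\mu

% Since L \ge 0, taking expectations preserves the inequality:
\mathbb{E}_{\xi}[L] \;\ge\; \zeta(\mu)\,\mathbb{E}_{\mu}[L]

% Rearranging: a bound on the prior-averaged loss transfers to the
% true environment at a cost of a factor 1/\zeta(\mu)
\mathbb{E}_{\mu}[L] \;\le\; \frac{1}{\zeta(\mu)}\,\mathbb{E}_{\xi}[L]
\;=\; \frac{1}{\zeta(\mu)}\,\mathbb{E}_{\nu \sim \zeta}\bigl[\mathbb{E}_{\nu}[L]\bigr]
```

Read this way, any guarantee proven "as if the environment were sampled from the prior" also holds in the true environment, weakened by the factor $1/\zeta(\mu)$, which is the sense in which one can pretend the true environment was sampled from the prior.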
- evhub 9 Mar 2021 21:42 UTC
  LW: 2 AF: 2
  Sure—by that definition of realizability, I agree that’s where the difficulty is. Though I would seriously question the practical applicability of such an assumption.