I sort of object to titling this post “Value Learning is only Asymptotically Safe” when the actual point you make is that we don’t yet have concrete optimality results for value learning other than asymptotic safety.
Doesn’t the cosmic ray example point to a strictly positive probability of dangerous behavior?
EDIT: Nvm, I see what you’re saying. If I’m understanding correctly, you’d prefer something like “Value Learning is not [Safe with Probability 1]”.
Thanks for the pointer to PAC-type bounds.