I’ve seen that post and discussed it on my shortform. I’m not really sure how effective something like Eliezer’s idea of “surrogate” goals there would actually be. Sure, it’d help with some sign-flip errors, but it seems like it’d fail on others (e.g. if U = V + W, a sign error could occur in V instead of U, in which case that idea might not work). I’m also unsure whether the probability is truly “very tiny”, as Eliezer describes it; human errors seem much more worrying than cosmic rays.
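To make the U = V + W worry concrete, here is a toy sketch with made-up numbers. It assumes, purely for illustration, that the surrogate idea works by making some harmless "decoy" outcome the minimum of U, so that an agent maximizing -U lands there. The outcome names and scores are hypothetical and not taken from the post.

```python
# Toy outcomes with made-up (V, W) scores; all names and numbers are hypothetical.
OUTCOMES = {
    "intended": (10, 5),    # what we actually want: high V, decent W
    "decoy":    (-5, -20),  # harmless outcome built to be the minimum of U
    "bad_v":    (-10, 5),   # outcome that is as bad as possible on V alone
}


def best(utility):
    """Return the outcome name that maximizes utility(v, w)."""
    return max(OUTCOMES, key=lambda name: utility(*OUTCOMES[name]))


correct = best(lambda v, w: v + w)       # U = V + W as intended
flipped_u = best(lambda v, w: -(v + w))  # sign error on the whole of U
flipped_v = best(lambda v, w: -v + w)    # sign error on V only

print(correct, flipped_u, flipped_v)
```

In this toy setup the whole-U flip picks the decoy, but the V-only flip picks `bad_v`, since the decoy is only the worst outcome under U as a whole and not under V taken separately. Whether a real surrogate scheme could be built to cover component-level flips too is exactly what I'm unsure about.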