To be (hopefully) clear: I’m applying this normative claim to argue that proof is needed to establish the desired level of confidence.
Under my model, it’s overwhelmingly likely that, regardless of what we do, AGI will be deployed with less than the desired level of confidence in its alignment. If I personally controlled whether or not AGI was deployed, then I’d be extremely interested in the normative claim. If I then agreed with the normative claim, I’d agree with:
proof is needed to establish the desired level of confidence. That doesn’t mean direct proof of the claim “the AI will do good”, but rather of supporting claims, perhaps involving the learning-theoretic properties of the system (putting bounds on errors of certain kinds) and such.
I don’t see how you can be confident enough of that view for it to be how you really want to check.
If I want >99% confidence, I agree that I couldn’t be confident enough in that argument.
A system can be optimizing a fairly good proxy, so that at low levels of capability it is highly aligned, but this falls apart as the system becomes highly capable and figures out “hacks” around the “usual interpretation” of the proxy.
Yeah, the hope here would be that the relevant decision-makers are aware of this dynamic (due to previous situations in which e.g. a recommender system optimized the fairly good proxy of clickthrough rate, but this led to “hacks” around the “usual interpretation”), and have some good reason to think that it won’t happen with the highly capable system they are planning to deploy.
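(To make that dynamic concrete, here is a minimal, purely hypothetical sketch in Python: a proxy that agrees with the true objective over the range of actions a weak optimizer can reach, but keeps rewarding “more” past the point where the true objective falls off. The specific functions, numbers, and the “capability = search range” framing are illustrative assumptions, not a claim about any real system.)

```python
import numpy as np

def true_objective(x):
    # Peaks around x = 5, then falls off: pushing further actually hurts.
    return x - 0.2 * np.maximum(x - 5.0, 0.0) ** 2

def proxy(x):
    # The proxy (e.g. "clickthrough rate") keeps rewarding larger x indefinitely.
    return x

for capability in [1.0, 5.0, 20.0, 100.0]:
    # A more capable optimizer can search over a wider range of actions.
    actions = np.linspace(0.0, capability, 1000)
    best = actions[np.argmax(proxy(actions))]
    print(f"capability={capability:6.1f}  chosen x={best:6.1f}  "
          f"true value={true_objective(best):8.1f}")
```

At low capability the proxy-optimal action is also near-optimal on the true objective; at high capability the optimizer exploits the region where the proxy and the true objective come apart, and the true value goes sharply negative.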
I also note that it seems like we disagree both about how useful proofs will be and about how useful empirical investigations will be.
Agreed. It might also be that we disagree about the tractability of proofs, in addition to (or instead of) their utility.