Thinking about politics may not be a failure mode; my question was whether it feels “extreme and somewhat strange” (sorry for not clarifying). Like, suppose for some reason “doesn’t think about politics” was on your list of desiderata for the extremely powerful AI you are building. In that case, thinking about politics would be a failure mode. Would it be an extreme and somewhat strange one?
I’d be interested to hear more about the law-breaking stuff—what is it about some laws that makes AI breaking them unsurprising/normal/hard-to-avoid, whereas for others AI breaking them is perverse/strange/avoidable?
I wasn’t constructing a reductio, just explaining why the phrase didn’t help me understand your view/intuition. When I hear that phrase, it seems to me to apply equally to the grenade case, the lion-bites-head-off case, the AI-is-egregiously-misaligned case, etc. All of those cases feel the same to me.
(I do notice a difference between these cases and the bridge case. With the bridge, there’s some sense in which no way you could have made the bridge would be good enough to prevent a certain sufficiently heavy load. By contrast, with AI, lions, and rocket-armchairs, there’s at least some possible way to handle it well besides “just don’t do it in the first place.” Is this the distinction you are talking about?)
Is your claim just that the solubility of the alignment problem is not empirically contingent, i.e., that there is no possible world (no set of laws of physics and initial conditions) in which someone like us builds some sort of super-smart AI, it becomes egregiously misaligned, and there was no way for them to have built the AI without it becoming egregiously misaligned?