In addition to Daniel’s point, I think an important piece is probabilistic thinking—the AGI will act not based on what will happen but on what it expects to happen. What probability of disaster is acceptable? If none is, it should do nothing.
I don’t think this is an important obstacle — you could use something like “and act such that your P(your actions over the next year lead to a massive disaster) < 10^-10.” I think Daniel’s point is the heart of the issue.
I think that incentivizes self-deception about probabilities. Also, probabilities below 10^-10 are pretty unusual, so I’d expect such a constraint to cause very little to happen.