I’ve since edited my comment to agree with you. That said...
and the possible actions are ordered in terms of how extreme they are
That’s the friendly AI problem. Maybe it can be solved by defining a metric on the solution space and making the AI stay close to a safe point, but I don’t know how to define such a metric. Clicking a link seems like a non-extreme action. It might have extreme consequences, but that’s true of all actions. Hitler’s genetic code was affected by the flapping of a butterfly’s wings on the other side of the world.
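For concreteness, the "stay close to a safe point" idea could be sketched like this. Everything here is hypothetical: the action feature vectors, the safe baseline, the Euclidean metric, and the threshold are all illustrative stand-ins, and the hard part — choosing a metric under which distance actually tracks how extreme the consequences are — is exactly what I don't know how to do:

```python
import math

def euclidean(a, b):
    # Distance between two action feature vectors (an assumed encoding).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def admissible_actions(actions, safe_point, epsilon, metric=euclidean):
    # Keep only actions within epsilon of the safe point under the metric.
    return [a for a in actions if metric(a, safe_point) <= epsilon]

safe = (0.0, 0.0)  # hypothetical "do nothing" baseline
candidates = [(0.1, 0.2), (3.0, 4.0), (0.5, 0.0)]
print(admissible_actions(candidates, safe, epsilon=1.0))
# → [(0.1, 0.2), (0.5, 0.0)]
```

The catch the comment points at: under any naive feature encoding, "click a link" would sit near the safe point even when its downstream consequences are extreme, so the filter gives no real safety guarantee.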