In theory, yes, that could work.
In practice, once it’s known how to do that, people will do it over and over again with successively more ambitious “limited” goals until it doesn’t work.
Another possibility is a “pivotal act”, but in practice I think a pivotal act has the following possible outcomes:
- The intended shutdown of the AI after the pivotal act fails, and humanity is destroyed.
- The pivotal act doesn’t work.
- The pivotal act succeeds temporarily, and then:
  - the people responsible are vilified, caught, and prevented from repeating it, and the pivotal act is undone over time;
  - the people responsible can repeat what they did, becoming de facto rulers of Earth, but due to their lack of legitimacy they face a game-theoretic landscape that forces them to act as dictators even if they have good intentions to start with;
  - the people responsible and their sympathizers convince the general public of the legitimacy of the act (this can potentially lead to a good outcome, but beware of corrupted hardware making you overestimate the actually-low probability of it happening).
- The pivotal act somehow has a permanent effect, but the potential of humanity is permanently reduced.
> I don’t dispute that at some point in time we want to solve alignment (to come out of the precipice period), but I disputed it’s more dangerous to know how to point AI before having solved what perfect goal to give it.
> In fact, I think it’s less dangerous because we at minimum gain more time to work on and solve alignment, and at best can use existing near human-level AGI to help us solve alignment too. The main reason to believe this is that near human-level AGI is a particular zone where we can detect deception, where it can’t easily unbox itself and take over, yet is still useful. The longer we stay in this zone, the more relatively safe progress we can make (including on alignment).
As I argued above, we have less time, not more, if we know how to point AI.
An AI aimed at something in particular would be much more dangerous for its level than one not aimed at any particular real-world goal, and so “near-human-level AGI” would be much safer (and we could stay in the near-human-level zone you mention for longer) if we can’t point it.