> I don’t dispute that at some point we want to solve alignment (to come out of the precipice period), but I disputed that it’s more dangerous to know how to point AI before having solved what the perfect goal to give it would be. In fact, I think it’s less dangerous, because we at minimum gain more time to work on and solve alignment, and at best can use existing near-human-level AGI to help us solve alignment too. The main reason to believe this is that near-human-level AGI occupies a particular zone where we can detect deception and where it can’t easily unbox itself and take over, yet it is still useful. The longer we stay in this zone, the more relatively safe progress we can make (including on alignment).
As I argued above, we have less time, not more, if we know how to point AI.

An AI aimed at something in particular would be much more dangerous for its capability level than one not aimed at any particular real-world goal, so “near-human-level AGI” would be much safer (and we could stay in the near-human-level zone you mention for longer) if we can’t point it.