Once it’s out of the box, no? It doesn’t care what we’re trying to make it do if we aren’t succeeding, and we clearly aren’t once it’s escaped the box.
Your hypothetical might work in the (pretty convoluted) case where we have a superintelligence that isn't actually aligned, but is aligned well enough that it wants to do whatever we ask it to. Then it might try to steer what we ask of it toward tasks that are more likely to be completed.