I think there are less cautious plans for containment that are more likely to be enacted, e.g., the whole “control” line of work or related network security approaches. The slow substrate plan seems to have far too high an alignment tax to be a realistic option.
Yes, I am inclined to agree with that take; at least, I think that is how things will go at first. Once a capability level is reached where there is clear empirical evidence of substantial, immediate danger, I expect people will be willing to accept a higher alignment tax for the purpose of carefully studying the dangerous AI in a controlled lab. Start with high levels of noise injection and slowdown, then gradually relax these as continual testing proceeds. Find the sweet spot where you can be confident you are fully in control while paying only the minimum necessary alignment tax.
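To make the "start strict, then relax" schedule concrete, here is a minimal sketch, assuming containment could be parameterized by a slowdown factor and a noise level, and that some battery of continual-testing evals tells you whether you remain confidently in control. Every name here (ControlConfig, passes_evals, the 0.8 relaxation factor) is an illustrative assumption of mine, not something from the linked posts.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ControlConfig:
    slowdown_factor: float  # how many times slower than native speed the AI runs
    noise_level: float      # fraction of internal state perturbed per step


def find_minimum_alignment_tax(
    initial: ControlConfig,
    passes_evals: Callable[[ControlConfig], bool],
    relax_step: float = 0.8,
    slowdown_floor: float = 1.0,
) -> ControlConfig:
    """Start with heavy slowdown and noise injection, relax both in small
    multiplicative steps, and return the last configuration that still
    passes the continual-testing evals ("are we confidently in control?")."""
    current = initial
    while True:
        candidate = ControlConfig(
            slowdown_factor=max(current.slowdown_factor * relax_step, slowdown_floor),
            noise_level=current.noise_level * relax_step,
        )
        # Stop relaxing once nothing changes or the evals no longer pass.
        if candidate == current or not passes_evals(candidate):
            return current  # the "sweet spot": minimum tax that still keeps control
        current = candidate
```

In practice, passes_evals would stand in for red-teaming and monitoring results, and any real relaxation schedule would presumably be far more cautious than a simple multiplicative step.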
The question in my mind, then, is how large the gap will be between our levels of control and the level of AI capability. Will we sanely keep ahead of the curve, starting with high levels of control in initial testing and then backing off gradually to a safe point? That would be the wise thing to do. And will we be correct in our judgements of what a safe level is?
Or will we act too late, deciding to increase the level of control only once an incident has occurred? The first incident could well be the last if it is the escape of a rogue AI capable of strategic planning and self-improvement.
For some examples of why it makes sense to think that potent AI could be safely studied in the lab, see this comment and the post it responds to: https://www.lesswrong.com/posts/qhhRwxsef7P2yC2Do/ai-alignment-via-slow-substrates-early-empirical-results?commentId=eM7b9QxJSsFn28opC
See also my related comment here: https://www.lesswrong.com/posts/Kobbt3nQgv3yn29pr/my-theory-of-change-for-working-in-ai-healthtech?commentId=u6W2tjuhKyJ8nCwQG