Yes, I am inclined to agree with that take. At least, I think that’s how things will go at first. Once AI reaches a level where there is clear empirical evidence of substantial immediate danger, I think people will be willing to accept a higher alignment tax in order to carefully research the dangerous AI in a controlled lab. Start with high levels of noise injection and slowdown, then gradually relax them as you do continual testing. Find the sweet spot where you can be confident you are fully in control while paying only the minimum necessary alignment tax.
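To make the "start strict, relax gradually, keep testing" loop concrete, here is a minimal Python sketch. Everything in it is a hypothetical illustration of mine: the integer "control level" scale, the `find_minimum_safe_control` function, and the `toy_safety_tests` stand-in are assumptions, not a real evaluation procedure. In practice the hard part is the test itself, which this sketch simply takes as given.

```python
from typing import Callable


def find_minimum_safe_control(
    passes_safety_tests: Callable[[int], bool],
    max_level: int = 10,
    min_level: int = 0,
) -> int:
    """Return the lowest control level that still passed testing.

    Hypothetical scale: max_level is the strictest setting (most noise
    injection and slowdown), min_level is no added control at all.
    """
    # Always begin at the strictest setting; if even that fails, stop.
    if not passes_safety_tests(max_level):
        raise RuntimeError("Even the strictest control level failed testing")

    safe_level = max_level
    # Relax one step at a time, re-testing at every step.
    for level in range(max_level - 1, min_level - 1, -1):
        if not passes_safety_tests(level):
            break  # First failure: keep the last level that passed.
        safe_level = level
    return safe_level


def toy_safety_tests(level: int) -> bool:
    """Toy stand-in (assumption): pretend control holds at level 4 and above."""
    return level >= 4


print(find_minimum_safe_control(toy_safety_tests))  # prints 4
```

The sketch only ever moves from stricter to looser settings and stops at the first failure, which matches the idea of paying the minimum necessary alignment tax. It also assumes a failed test is observable and recoverable, which is exactly the assumption that might not hold.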
The question then, in my mind, is how much of a gap there will be between the level of control and the level of AI development. Will we sanely keep ahead of the curve, starting with high levels of control in initial testing and then backing off gradually to a safe point? That would be the wise thing to do. And will we be correct in our judgements of what a safe level is?
Or will we act too late, deciding to increase the level of control only once an incident has occurred? The first incident could well be the last, if it is an escape of a rogue AI capable of strategic planning and self-improvement.
See my related comment here: https://www.lesswrong.com/posts/Kobbt3nQgv3yn29pr/my-theory-of-change-for-working-in-ai-healthtech?commentId=u6W2tjuhKyJ8nCwQG