For the first point, if “people can in fact recognize some types of unsafety,” then it’s not the case that “you don’t even have a clear idea of what would constitute unsafe.” And as I said in another comment, I think this is really an argument about standards, which are a practical necessity for companies that want to release systems, but which aren’t what makes the central point (the title of the post) true.
Maybe I am misunderstanding what you mean by “have a clear idea of what would constitute unsafe”?
Taking rods as an example, my understanding is that rods might be used to support some massive objects, and if the rods bend under the load then they might release the objects and cause harm. So the rods need to be strong enough to support the objects, and usually rods are sold with strength guarantees to achieve this.
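To make that concrete, the criterion I have in mind is roughly the following (the notation is just illustrative, not any particular standard): a rod carrying a load $F_{\text{applied}}$ counts as unsafe if

$$F_{\text{applied}} > F_{\text{guaranteed}},$$

where $F_{\text{guaranteed}}$ is the load the manufacturer guarantees the rod can support (in practice chosen with some margin below the load at which it would actually bend or fail).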
“If it would fail under this specific load, then it is unsafe” is a clear idea of what would constitute unsafe. I don’t think we have this clear of an idea for AI. We have some vague ideas of things that would be undesirable, but there tends to be a wide range of potential triggers and a wide range of potential outcomes, which seem more easily handled by some sort of adversarial setup than by writing down a clean logical description. But maybe when you say “clear idea”, you don’t necessarily mean a clean logical description, and also consider more vague descriptions to be relevant?
And I agree that rods are often simple; the reason I chose rods as an example is that people have an intuitive understanding of some of the characteristics you care about. But the same conceptual model applies to cars, where there is tons of specific safety testing with clearly defined standards, despite the fact that their behavior can be very, very complex.
I already addressed cars and you said we should talk about rods. Then I addressed rods and you want to switch back to cars. Can you make up your mind?
“If it would fail under this specific load, then it is unsafe” is a clear idea of what would constitute unsafe. I don’t think we have this clear of an idea for AI.
Agreed. And so until we do, we can’t claim they are safe.
But maybe when you say “clear idea”, you don’t necessarily mean a clean logical description, and also consider more vague descriptions to be relevant?
A vague description allows for a vague idea of safety. That’s still far better than what we have now, so I’d be happier with that than the status quo. But in fact, what people outside of AI safety seem to mean by “safe” is even less specific than having an idea about what could go wrong; it’s more often “I haven’t been convinced that it’s going to fail and hurt anyone.”
I already addressed cars and you said we should talk about rods. Then I addressed rods and you want to switch back to cars. Can you make up your mind?
Both are examples, but useful for illustrating different things. Cars are far more complex and less intuitive, but they still have clear safety standards for design.