I don’t think it’s true that the safety of a thing depends on an explicit standard. There’s no explicit standard for whether a grizzly bear is safe. There are only guidelines about how best to interact with them, and information about how grizzly bears typically act. I don’t think this implies that it’s incoherent to talk about the situations in which a grizzly bear is safe.
Similarly, if I make a simple html web site “without a clear indication about what the system can safely be used for… verification that it passed a relevant standard, and clear instruction that it cannot be used elsewhere”, I don’t think that’s sufficient for it to be considered unsafe.
Sometimes a thing will reliably cause serious harm to people who interact with it. It seems to me that this is sufficient for it to be called unsafe. Sometimes a thing will reliably cause no harm, and that seems sufficient for it to be considered safe. Knowledge of whether a thing is safe or not is a different question, and there are edge cases where a thing might occasionally cause minor harm. But I think the requirement you lay out is too stringent.
I think you’re focusing on the idea of a standard. A standard is necessary for a production system and for reliability in many senses, and should be demanded of AI companies, but its absence is not the fundamental issue. The fundamental point here, which you seem not to disagree with, is that no one can say in any sense what makes the system safe or unsafe.
I’m not laying out a requirement; I’m pointing out a logical necessity: if you can’t say what would make something safe or unsafe, you can’t determine which it is. But if something “will reliably cause serious harm to people who interact with it,” it sounds like you have a very clear understanding of how it would be unsafe, and a way to check whether that occurs.
Part of my point is that there is a difference between the fact of the matter and what we know. Some things are safe despite our ignorance, and some are unsafe despite our ignorance.
Sure, I agree with that, and so perhaps the title should have been “Systems that cannot be reasonably claimed to be unsafe in specific ways cannot be claimed to be safe in those ways, because what does that even mean?”
If you say something is “qwrgz,” I can’t agree or disagree; I can only ask what you mean. If you say something is “safe,” I generally assume you are making a claim about something you know. My problem is that people claim that something is safe despite not having stated any idea about what they would call unsafe. But again, that seems fundamentally confused about what safety means for such systems.
I would agree more with your rephrased title.
People do actually have a somewhat-shared set of criteria in mind when they talk about whether a thing is safe, though, in a way that they (or at least I) don’t when talking about its qwrgzness. For example, if it kills 99% of life on earth over a ten-year period, I’m pretty sure almost everyone would agree that it’s unsafe. No further specification work is required. It doesn’t seem fundamentally confused to refer to a thing as “unsafe” if you think it might do that.
I do think that some people are clearly talking about meanings of the word “safe” that aren’t so clear-cut (e.g. Sam Altman saying GPT-4 is the safest model yet™️), and in those cases I agree that these statements are much closer to “meaningless”.
The people in the world who actually build these models are doing the thing that I pointed out. That’s the issue I was addressing.
I don’t understand this distinction. If “I’m pretty sure almost everyone would agree that it’s unsafe,” that’s an informal but concrete sense in which the system could be unsafe, and it would not be confused to say something is unsafe if you think it could do that, nor to claim that it is safe if you have clear reason to believe it will not.
My problem is, as you mentioned, that people in the world of ML are not making that class of claim. They don’t seem to ground their claims about safety in any conceptual model whatsoever of what the risks or possible failures are, and that does seem fundamentally confused.
That’s true informally, and maybe it is what some consumers have in mind, but it is not what the people who are responsible for actual, load-bearing safety mean.
The issue is that the standards are meant to help achieve systems that are safe in the informal sense. If they don’t, they’re bad standards. How can you talk about whether a standard is sufficient, if it’s incoherent to discuss whether layperson-unsafe systems can pass it?
True, but the informal safety standard is “what doesn’t harm humans.” For construction, it amounts to “doesn’t collapse,” which you can break down into things like “strength of beam.” But with AI you are talking to the full generality of language and communication and that effectively means: “All types of harm.” Which is exactly the very difficult thing to get right here.
For construction, it amounts to “doesn’t collapse,”

No, the risk and safety models for construction go far, far beyond that, from radon and air quality to size and accessibility of fire exits.
with AI you are talking to the full generality of language and communication and that effectively means: “All types of harm.”

Yes, so it’s a harder problem to claim that it’s safe. But doing nothing, having no risk model at all, and then claiming that it’s safe because there’s no reason to think it’s unsafe is, as I said, “fundamentally confused about what safety means for such systems.”
I get that, but I tried to phrase that in terms that connected to benwr’s request.