If you have feedback for me, you can fill out the form at https://forms.gle/kVk74rqzfMh4Q2SM6 .
Or you can email me, at [the second letter of the alphabet]@[my username].net
For contingent evolutionary-psychological reasons, humans are innately biased to prefer “their own” ideas, and in that context, a “principle of charity” can be useful as a corrective heuristic.
I claim that the reasons for this bias are, in an important sense, not contingent: an alien race would almost certainly have similar biases, and the forces in favor of this bias won’t entirely disappear in a world with magically different discourse norms (at least as long as speakers’ identities are attached to their statements).
As soon as I’ve said “P”, it is the case that my epistemic reputation is bound up with the group’s belief in the truth of P. If people later come to believe P, it means that (a) whatever scoring rule we’re using to incentivize good predictions in the first place will reward me, and (b) people will update more on things I say in the future.
If you wanted to find convincing evidence for P, I’m now a much better candidate to find that evidence than someone who has instead said “eh; maybe P?” And someone who has said “~P” is similarly well-incentivized to find evidence for ~P.
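To make the incentive gradient concrete, here’s a toy sketch (my own illustration, not anything from the original discussion), assuming the “scoring rule” is something Brier-like: if the group later comes to believe P, the confident “P”-sayer is penalized far less than the hedger, and far less still than the confident “~P”-sayer.

```python
def brier_penalty(stated_probability: float, outcome: int) -> float:
    """Brier-style penalty: squared error between the stated probability
    of P and the eventual outcome (1 if P, 0 if ~P). Lower is better."""
    return (stated_probability - outcome) ** 2

# Suppose the group eventually settles on P being true (outcome = 1):
print(brier_penalty(0.95, 1))  # said "P" confidently:             0.0025
print(brier_penalty(0.55, 1))  # said "eh; maybe P?" (toy ~55%):   0.2025
print(brier_penalty(0.05, 1))  # said "~P" confidently:            0.9025
```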
I would agree more with your rephrased title.
People do actually have a somewhat-shared set of criteria in mind when they talk about whether a thing is safe, though, in a way that they (or at least I) don’t when talking about its qwrgzness. For example, if it kills 99% of life on earth over a ten year period, I’m pretty sure almost everyone would agree that it’s unsafe. No further specification work is required. It doesn’t seem fundamentally confused to refer to a thing as “unsafe” if you think it might do that.
I do think that some people are clearly talking about meanings of the word “safe” that aren’t so clear-cut (e.g. Sam Altman saying GPT-4 is the safest model yet™️), and in those cases I agree that these statements are much closer to “meaningless”.
Part of my point is that there is a difference between the fact of the matter and what we know. Some things are safe despite our ignorance, and some are unsafe despite our ignorance.
The issue is that the standards are meant to help achieve systems that are safe in the informal sense. If they don’t, they’re bad standards. How can you talk about whether a standard is sufficient, if it’s incoherent to discuss whether layperson-unsafe systems can pass it?
I don’t think it’s true that the safety of a thing depends on an explicit standard. There’s no explicit standard for whether a grizzly bear is safe. There are only guidelines about how best to interact with them, and information about how grizzly bears typically act. I don’t think this implies that it’s incoherent to talk about the situations in which a grizzly bear is safe.
Similarly, if I make a simple html web site “without a clear indication about what the system can safely be used for… verification that it passed a relevant standard, and clear instruction that it cannot be used elsewhere”, I don’t think that’s sufficient for it to be considered unsafe.
Sometimes a thing will reliably cause serious harm to people who interact with it. It seems to me that this is sufficient for it to be called unsafe. Sometimes a thing will reliably cause no harm, and that seems sufficient for it to be considered safe. Knowledge of whether a thing is safe or not is a different question, and there are edge cases where a thing might occasionally cause minor harm. But I think the requirement you lay out is too stringent.
I think I agree that this isn’t a good explicit rule of thumb, and I somewhat regret how I put this.
But it’s also true that a belief in someone’s good-faith engagement (including an onlooker’s), and in particular their openness to honest reconsideration, is an important factor in the motivational calculus, and for good reasons.
I think it’s pretty rough for me to engage with you here, because you seem to be consistently failing to read the things I’ve written. I did not say it was low-effort. I said that it was possible. Separately, you seem to think that I owe you something that I just definitely do not owe you. For the moment, I don’t care whether you think I’m arguing in bad faith; at least I’m reading what you’ve written.
Nor should I, unless I believe that someone somewhere might honestly reconsider their position based on such an attempt. So far my guess is that you’re not saying that you expect to honestly reconsider your position, and Said certainly isn’t. If that’s wrong then let me know! I don’t make a habit of starting doomed projects.
I’m not sure what you mean—as far as I can tell, I’m the one who suggested trying to rephrase the insulting comment, and in my world Said roughly agreed with me about its infeasibility in his response, since it’s not going to be possible for me to prove either point: Any rephrasing I give will elicit objections on both semantics-relative-to-Said and Said-generatability grounds, and readers who believe Said will go on believing him, while readers who disbelieve will go on disbelieving.
By that measure, my comment does not qualify as an insult. (And indeed, as it happens, I wouldn’t call it “an insult”; but “insulting” is slightly different in connotation, I think. Either way, I don’t think that my comment may fairly be said to have these qualities which you list.)
I disagree: I think your comment does have these qualities in some measure, and they are roughly what I’m objecting to when I ask that people not be insulting. I don’t mean that you should never say anything with an unflattering implication, though I do think this is usually best avoided as well. I’m hopeful that this is a crux, as it might explain some of the other conversation I’ve seen about the extent to which you can predict people’s perception of rudeness.
There are of course more insulting ways you could have conveyed the same meaning. But there are also less insulting ways (when considering the extent to which the comment emphasizes the unflatteringness and the call to action that I’m suggesting readers will infer).
Certainly there’s no “call to non-belief-based action”…!)
I believe that none was intended, but I also expect that people (mostly subconsciously!) read (a very small) one into the particular choice of words and phrasing, where the action is something like “you should scorn this person” rather than just “this person has unflattering quality X”. The latter does not imply the former.
For what it’s worth, I don’t think that one should never say insulting things. I think that people should avoid saying insulting things in certain contexts, and that LessWrong comments are one such context.
I find it hard to square your claim that insultingness was not the comment’s purpose with the claim that it cannot be rewritten to elide the insult.
An insult is not simply a statement with a meaning that is unflattering to its target—it involves using words in a way that aggressively emphasizes the unflatteringness and suggests, to some extent, a call to non-belief-based action on the part of the reader.
If I write a comment entirely in bold, in some sense I cannot un-bold it without changing its effect on the reader. But I think it would be pretty frustrating to most people if I then claimed that I could not un-bold it without changing its meaning.
My guess is that you believe it’s impossible because the content of your comment implies a negative fact about the person you’re responding to. But insofar as you communicated a thing to me, it was in fact a thing about your own failure to comprehend, and your own experience of bizarreness. These are not unflattering facts about Duncan, except insofar as I already believe your ability to comprehend is vast enough to contain all “reasonable” thought processes.
But, of course, I recognize that my comment is insulting. That is not its purpose, and if I could write it non-insultingly, I would do so. But I cannot.
I want to register that I don’t believe you that you cannot, if we’re using the ordinary meaning of “cannot”. I believe that it would be more costly for you, but it seems to me that people are very often able to express content like that in your comment, without being insulting.
I’m tempted to try to rephrase your comment in a non-insulting way, but I would only be able to convey its meaning-to-me, and I predict that this is different enough from its meaning-to-you that you would object on those grounds. However, insofar as you communicated a thing to me, you could have said that thing in a non-insulting way.
Other facts about how I experience this:
* It’s often opposed to internal forces like “social pressure to believe the thing”, or “bucket errors I don’t feel ready to stop making yet”
* Noticing it doesn’t usually result in immediate enlightenment / immediately knowing the answer, but it does result in some kind of mini-catharsis, which is great because it helps me actually want to notice it more.
* It’s not always the case that an opposing loud voice was wrong, but I think it is always the case that the loud voice wasn’t really justified in its loudness.
A thing I sort-of hoped to see in the “a few caveats” section:
* People’s boundaries do not emanate purely from their platonic selves, irrespective of the culture they’re in and the boundaries set by that culture. Related to the point about grooming/testing-the-waters, if the cultural boundary is set at a given place, people’s personal boundaries will often expand or retract somewhat, to be nearer to the cultural boundary.
Perhaps controversially, I think this is a bad selection scheme even if you replace “password” with any other string.
any password generation scheme where this is relevant is a bad idea
I disagree; as the post mentions, sometimes considerations such as memorability come into play. One example might be choosing random English sentences as passwords. You might do that by choosing a random parse tree of a certain size. But some English sentences have ambiguous parses, i.e. they’ll have multiple ways to generate them. You *could* try to sample to avoid this problem, but it becomes pretty tricky to do that carefully. If you instead find the “most ambiguous sentence” in your set, you can get a lower bound on the safety of your scheme.
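To make the “most ambiguous sentence” point concrete, here is a rough sketch (the toy data and function name are mine, not from the OP): if every parse tree is equally likely but several trees can yield the same sentence, the most-repeated sentence sets the scheme’s min-entropy, which is a conservative lower bound on its Shannon entropy and hence on its strength.

```python
import math
from collections import Counter

def entropies(generations):
    """Given one entry per equally likely generation path (so duplicates
    mark ambiguous outputs), return (Shannon entropy, min-entropy) in bits."""
    total = len(generations)
    probs = [count / total for count in Counter(generations).values()]
    shannon = -sum(p * math.log2(p) for p in probs)
    min_entropy = -math.log2(max(probs))  # set by the most ambiguous output
    return shannon, min_entropy

# Toy example: 8 equally likely parse trees, two of which yield the same sentence.
outputs = ["s1", "s2", "s3", "s4", "s5", "s6",
           "time flies like an arrow",   # parse A
           "time flies like an arrow"]   # parse B of the same sentence
print(entropies(outputs))  # (2.75, 2.0) -- versus 3.0 bits if all 8 were distinct
```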
Um, huh? There are 2^1000 1000-character passwords, not 2^4700. Where is the 4700 coming from?
(added after realizing the above was super wrong): Whoops, that’s what I get for looking at comments first thing in the morning. log2(26^1000) ≈ 4700 (see the quick check at the end of this comment). Still, the following bit stands:
I’d also like to register that, in my opinion, if it turns out that your comment is wrong and not my original statement, it’s really bad manners to have said it so confidently.
(I’m now not sure if you made an error or if I did, though)
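Quick check of the corrected arithmetic, assuming a 26-letter lowercase alphabet:

```python
import math

# 26 choices per character, 1000 characters:
# log2(26 ** 1000) = 1000 * log2(26) ≈ 4700 bits, not 1000 bits.
print(1000 * math.log2(26))   # ≈ 4700.44
print(math.log2(26))          # ≈ 4.70 bits per character
```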
Update: I think you’re actually totally right. The entropy gives a lower bound for the average, not the average itself. I’ll update the post shortly.
To clarify a point in my sibling comment, the concept of “password strength” doesn’t cleanly apply to an individual password. It’s too contingent on factors that aren’t within the password itself. Say I had some way of scoring passwords on their strength, and that this scoring method tells me that “correct horse battery staple” is a great password. But then some guy puts that password in a webcomic read by millions of people—now my password is going to be a lot worse, even though the content of the password didn’t change.
Password selection schemes aren’t susceptible to this kind of problem, and you can consistently compare the strength of one with the strength of another, using methods like the ones I’m talking about in the OP.
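As a rough illustration of comparing schemes rather than individual passwords (the particular schemes and word-list size here are just assumptions for the example): a uniform scheme’s strength is simply the log of how many outputs it can produce, and that number doesn’t change when one particular output becomes famous.

```python
import math

def scheme_bits(choices_per_slot: int, slots: int) -> float:
    """Entropy in bits of a scheme that fills each slot uniformly and
    independently from `choices_per_slot` options."""
    return slots * math.log2(choices_per_slot)

# Hypothetical comparison of two selection schemes:
print(scheme_bits(26, 10))    # 10 random lowercase letters: ≈ 47.0 bits
print(scheme_bits(7776, 4))   # 4 words from a 7776-word list: ≈ 51.7 bits
```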
(I’ve added my $50 to RatsWrong’s side of this bet)