Meta-Honesty: Firming Up Honesty Around Its Edge-Cases
(Cross-posted from Facebook.)
0: Tl;dr.
A problem with the obvious-seeming “wizard’s code of honesty” aka “never say things that are false” is that it draws on high verbal intelligence and unusually permissive social embeddings. I.e., you can’t always say “Fine” to “How are you?” This has always made me feel very uncomfortable about the privilege implicit in recommending that anyone else be more honest.
Genuinely consistent Glomarization (i.e., consistently saying “I cannot confirm or deny” whether or not there’s anything to conceal) does not work in principle because there are too many counterfactual selves who might want to conceal something.
Glomarization also doesn’t work in practice if the Nazis show up at your door asking if you have fugitive Jews in your attic.
If you would lie to Nazis about fugitive Jews, then absolute truthsaying can’t be the whole story, which makes “never say things that are false” feel to me like a shaky foundation in that it is literally false, and something less shaky would be nice.
Robin Hanson’s “automatic norms” problem suggests different people might have very different ideas about what constitutes a good person’s normal honesty, without realizing that they have very different ideas. Perceived violations of an honesty norm can blow up and cause interpersonal conflict. It seems to me that this is something that doesn’t always work well when people leave it alone.
A rule which seems to me more “normal” than the wizard’s literal-truth rule, more like a version of standard human honesty reinforced around the edges, would be as follows:
“Don’t lie when a normal highly honest person wouldn’t, and furthermore, be honest when somebody asks you which hypothetical circumstances would cause you to lie or mislead—absolutely honest, if they ask under this code. However, questions about meta-honesty should be careful not to probe object-level information.”
I’ve been tentatively calling this “meta-honesty”, but better terminology is solicited.
1: Glomarization can’t practically cover many cases.
Suppose that last night I helped hide a fugitive marijuana seller from the Feds. You ask me what I was doing last night, and I, preferring not to emit false statements, reply, “I can’t confirm or deny what I was doing last night.”
We now have two major problems here:
Even on an ordinary day, if you casually ask me what I was doing last night, I theoretically ought to answer “I can’t confirm or deny what I was doing last night” because some of my counterfactual selves were hiding fugitive marijuana sellers from the Feds. If I don’t do this consistently, and I actually was hiding fugitives last night, I can’t Glomarize without revealing information. But then the number of counterfactuals I have to worry about is too large for me to ever answer anything.
If the Feds actually ask you this question, they will not be familiar with your previous practice of Glomarization and will probably not be very impressed with your answer.
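The first problem can be made concrete with a small Bayes calculation. What follows is only a minimal sketch: the 1% prior and the function name `posterior_secret_given_glomar` are made up for illustration and appear nowhere in the argument above. It compares what a listener learns from hearing "I can't confirm or deny" under two policies: Glomarizing only when there is something to hide, and Glomarizing every single time.

```python
from fractions import Fraction

def posterior_secret_given_glomar(prior, p_glomar_if_secret, p_glomar_if_no_secret):
    """Return P(secret | speaker Glomarized), computed with Bayes' rule."""
    joint_secret = prior * p_glomar_if_secret
    joint_no_secret = (1 - prior) * p_glomar_if_no_secret
    return joint_secret / (joint_secret + joint_no_secret)

# Hypothetical prior: on 1 night in 100 there is a fugitive to hide.
prior = Fraction(1, 100)

# Selective policy: Glomarize only when there is something to hide.
selective = posterior_secret_given_glomar(prior, Fraction(1), Fraction(0))

# Consistent policy: Glomarize every time the question is asked.
consistent = posterior_secret_given_glomar(prior, Fraction(1), Fraction(1))

print(selective)   # 1
print(consistent)  # 1/100
```

Under the selective policy the Glomar response gives the secret away completely. Under the consistent policy the listener's posterior equals their prior, which is the whole point of Glomarizing, but it requires giving the same non-answer on every ordinary night as well.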
This doesn’t mean that Glomarization is never helpful. If you ask me whether my submarine is carrying nuclear weapons, or whether I’m secretly the author of “The Waves Arisen”, I think most listeners would understand if I replied, “I have a consistent policy of not saying which submarines are carrying nuclear weapons, nor whether I wrote or helped write a document that doesn’t have my name on it.” An ordinary honest person does not need to lie on these occasions because Glomarization is both theoretically possible and pragmatically practical, so one should adopt a consistent Glomarization rather than lie.
But that doesn’t work for hiding fugitives. Or any other occasion where an ordinary high-honesty person would consider it obligatory to lie, in answer to a question where the asker is not expecting evasion or Glomarization.
(I’m sure some people reading this think it’s all very cute for me to be worried about the fact that I wouldn’t tell the truth all the time. Feel free to state this in the comments so that we aren’t confused about who’s using which norms. Smirking about it, or laughing, especially conveys important info about you.)
2: The law of no literal falsehood.
One formulation of my automatic norm for honesty, the one that feels like the obvious default from which any departure requires a crushingly heavy justification, was given by Ursula K. Le Guin in A Wizard of Earthsea:
He told his tale, and one man said, “But who saw this wonder of dragons slain and dragons baffled? What if he—”
“Be still!” the Head Isle-Man said roughly, for he knew, as did most of them, that a wizard may have subtle ways of telling the truth, and may keep the truth to himself, but that if he says a thing the thing is as he says. For that is his mastery.
Or in simpler summary, this policy says:
Don’t say things that are literally false.
Or with some of the unspoken finicky details added back in: “Don’t say things that you believe to be literally false in a context where people will (with reasonably high probability) persistently believe that you believe them to be true.” Jokes are still allowed, even jokes that only get revealed as jokes ten seconds later. Or quotations, etcetera ad obviousum.
The no-literal-falsehood code of honesty has three huge advantages:
To the extent people observe you to consistently practice it, it is easier for you to communicate believably when you want to say a thing. They may still not be able to trust you perfectly, but the hypothetical is “Did this person break their big-deal code of honesty?” rather than “Did this person tell an ordinary lie?” One would hope this would be good for coordination and other interpersonal issues, though this might only be a fond wish on my part.
Most people, even most unusually honest people, wander about their lives in a fog of internal distortions of reality. Repeatedly asking yourself of every sentence you say aloud to another person, “Is this statement actually and literally true?”, helps you build a skill for navigating out of your internal smog of not-quite-truths. For that is our mastery.
It’s good for your soul. At least, it’s good for my soul for reasons I’d expect to generalize if I’m not just committing the typical-mind fallacy.
From Frank Herbert’s Dune Messiah, writing about Truthsayers, people who had trained to extreme heights the ability to tell when others were lying and who also never lied themselves:
“It requires that you have an inner agreement with truth which allows ready recognition.”
This is probably not true in normal human practice for detecting other people’s lies. I’d expect a lot of con artists are better than a lot of honest people at that.
But the phrase “It requires that you have an inner agreement with truth which allows ready recognition” is something that resonates strongly with me. It feels like it points to the part that’s good for your soul. Saying only true things is a kind of respect for the truth, a pact that you forge with it.
3: The privilege of truthtelling.
I’ve never suggested to anyone else that they adopt the wizard’s code of honesty.
The code of literal truth only lets people navigate anything like ordinary social reality to the extent that they are very fast on their verbal feet, and can respond to the question “How are you?” by saying “Getting along” instead of “Horribly” or with an awkward silence while they try to think of something technically true. (Because often “I’m fine” is false, you see. If this has never bothered you then you are perhaps not in the target audience for this essay.)
So I haven’t advocated any particular code of honesty before now. I was aware of the fact that I had an unusually high verbal SAT score, and also, that I spend little time interfacing with mundanes and am not dependent on them for my daily bread. I thought it wasn’t my place for me to suggest to anyone else that they try their hand at saying only true things all the time, or for me to act like this conveys moral virtue. I’m only even describing the wizard’s code publicly now that I can think of at least one alternative.
I once heard somebody claim that rationalists ought to practice lying, so that they could separate their internal honesty from any fears of needing to say what they believed. That is, if they became good at lying, they’d feel freer to consider geocentrism without worrying what the Church would think about it. I do not in fact think this would be good for the soul, or for a cooperative spirit between people. This is the sort of proposed solution of which I say, “That is a terrible solution and there has to be a better way.”
But I do see the problem that person was trying to solve. One can also be privileged in stubbornness when it comes to overriding the fear of other people finding out what you believe. I can see how telling fewer routine lies than usual would make that fear even worse, exacerbating the pressure it can place on what you believe you believe; especially if you didn’t have a lot of confidence in your verbal agility. It’s one more reason not to pressure people (even a little) into adopting the wizard’s code, but then it would be nice to have some other code instead.
4: Literal-truth as my automatic norm, maybe not shared.
This set of thoughts started, as so many things do, with a post by Robin Hanson.
In particular Robin tweeted the paper: “The surprising costs of silence: Asymmetric preferences for prosocial lies of commission and omission.”
Abstract: Across 7 experiments (N = 3883), we demonstrate that communicators and targets make egocentric moral judgments of deception. Specifically, communicators focus more on the costs of deception to them—for example, the guilt they feel when they break a moral rule—whereas targets focus more on whether deception helps or harms them. As a result, communicators and targets make asymmetric judgments of prosocial lies of commission and omission: Communicators often believe that omitting information is more ethical than telling a prosocial lie, whereas targets often believe the opposite.
This got me wondering whether my default norm of the wizard’s code is something other people will even perceive as prosocial. Yes, indeed, I feel like not saying things is much more law-abiding than telling literal falsehoods. But if people feel just as wounded, or more wounded, then that policy isn’t really benefiting anyone else. It’s just letting me feel ethical and maybe being good for my own personal soul.
Robin commented, “Mention all relevant issues, even if you have to lie about them.”
I don’t think this is a bullet I can bite in daily practice. I think I still want to emit literal truths for most dilemmas short of hiding fugitives. But it’s one more argument worth mentioning against trying to make an absolute wizard’s code into a bedrock solution for interpersonal reliability.
Robin also published a blog post about “automatic norms” in general:
We are to just know easily and surely which actions violate norms, without needing to reflect on or discuss the matter. We are to presume that framing effects are unimportant, and that everyone agrees on the relevant norms and how they are to be applied.
In a relatively simple world with limited sets of actions and norms, and a small set of people who grew up together and later often enough observe and gossip about possible norm violations of others, such people might in fact learn from enough examples to mostly apply the same norms the same way. This was plausibly the case for most of our distant ancestors. They could in fact mostly be sure that, if they judged themselves as innocent, most everyone else would agree. And if they judged someone else as guilty, others should agree with that as well. Norm application could in fact usually be obvious and automatic.
Today however, there are far more people, and more intermixed, who grow up in widely varying contexts and now face far larger spaces of possible actions and action contexts. Relative to this huge space, gossip about particular norm violations is small and fragmented...
We must see ourselves as tolerating a lot of norm violation. We actually tell others about and attempt to punish socially only a tiny fraction of the violations that we could know of. When we look most anywhere at behavior details, it must seem to us like we are living in a Sodom and Gomorrah of sin. Compared to the ancient world, it must seem a lot easier to get away for a long time with a lot of norm violations...
We must also see ourselves as tolerating a lot of overeager busybodies applying what they see as norms to what we see as our own private business where their social norms shouldn’t apply.
This made me realize that the wizard’s code of honesty I grew up with is, indeed, an automatic norm for me. Which meant I was probably overestimating and eliezeromorphizing the degree to which other people even cared at all, or would think I was keeping any promises by doing it. Again, I don’t see this as a good reason to give up on emitting literally true sentences almost all of the time, but it’s one more reason I feel more open to alternatives than I would’ve ten years ago. That said, I do expect a lot of people reading this also have something like that same automatic norm, and I still feel like that makes us more like part of the same tribe.
5: Counterargument: The problem of non-absolute rules.
A proposal like this one ought to come with a lot of warning signs attached. Here’s one of them:
There’s a passage in John M. Ford’s Web of Angels, when the protagonist has finally killed someone even after all the times his mentor taught him to never ever kill. His mentor says:
“No words can prevent all killing. Words are not iron bands. But I taught you to hesitate, to stay your hands until the weight of duty crushed them down.”
Surprise! Really the mentor just meant to try to get him to wait before killing people instead of jumping to that right away.
Humans are kind of insane, and there are all sorts of insane institutions that have evolved among us. A fairly large number of those institutions are twisted up in such a way that something explodes if people try to talk openly about how they work.
It’s a human kind of thinking to verbally insist that “Don’t kill” is an absolute rule, why, it’s right up there in the Ten Commandments. Except that what soldiers do doesn’t count, at least if they’re on the right side of the war. And sure, it’s also okay to kill a crazy person with a gun who’s in the middle of shooting up a school, because that’s just not what the absolute law “Don’t kill” means, you know!
Why? Because any rule that’s not labeled “absolute, no exceptions” lacks weight in people’s minds. So you have to perform that the “Don’t kill” commandment is absolute and exceptionless (even though it totally isn’t), because that’s what it takes to get people to even hesitate. To stay their hands at least until the weight of duty is crushing them down. A rule that isn’t even absolute? People just disregard that whenever.
(I speculate this may have to do with how the human mind reuses physical ontology for moral ontology. I speculate that brains started with an ontology for material possibility and impossibility, and reused that ontology for morality; and it internally feels like only the moral reuse of “impossible” is a rigid moral law, while anything short of “moral-impossible” is more like a guideline. Kind of like how, if something isn’t absolutely certain, people think that means it’s okay to make up their own opinion about it, because if it’s not absolutely certain it must not be the domain of Authority. But I digress, and it’s just a hypothesis. We don’t need to know exactly what is the buried cause of the surface craziness to observe that the craziness is in fact there.)
So you have to perform that the Law is absolute in order to make the actual flexible Law exist. That doesn’t mean people lie about how the Law applies to the edge cases—that’s not what I mean to convey by the notion of “performing” a statement. More like, proclaim the Law is absolute and then just not talk about anything that contradicts the absoluteness.
And when that happens, it’s one more little chunk of insanity that nobody can talk about on the meta-level without it exploding.
Now, you will note that I am going ahead and writing this all down explicitly, because… well, because I expect that in the long run we have to find a way that doesn’t require a little knot of madness that nobody is allowed to describe faithfully on the meta-level. So we might as well start today.
I trust that you, the reader, will be able to understand that “Don’t kill” is the kind of rule where you give it enough force-as-though-of-absoluteness that it actually takes a deontology-breaking weight of duty to crush down your hands, as opposed to you cheerfully going “oh well I guess there’s a crushing weight now! let’s go!” at the first sign of inconvenience.
Actually, I don’t trust that everyone reading this can do that. That’s not even close to literally true. But most of you won’t ever be called on to kill, and society frowns upon that strongly enough to discourage you anyway. So I did feel it was worth the risk to write that example explicitly.
“Don’t lie” is more dangerous to mess with. That’s something that most people don’t take as an exceptionless absolute to begin with, even in the sense of performing its absoluteness so that it will exist at all. Even extremely honest people will agree that you can lie to the Gestapo about whether you are hiding any Jews in the attic, and not bother to Glomarize your response either; and I think they will mostly agree that this is in fact a “lie” rather than trying to dance around the subject. People who are less than extremely honest think that “I’m fine” is an okay way to answer “How are you?” even if you’re not fine.
So there’s still a very obvious thing that could go wrong in people’s heads, a very obvious way that the notion of “meta-honesty” could blow up, or any other code besides “don’t say false things” could blow up. It’s why the very first description in the opening paragraphs says “Don’t lie when a normal highly honest person wouldn’t, and furthermore…” and you should never omit that preamble if you post any discussion of this on your own blog. THIS IS NOT THE IDEA THAT IT’S OKAY TO LIE SO LONG AS YOU ARE HONEST ABOUT WHEN YOU WOULD LIE IF ANYONE ASKS. It’s not an escape hatch.
If anything, meta-honesty is the idea that you should be careful enough about when you break the rule “Don’t lie” that, if somebody else asked the hypothetical question, you would be willing to PUBLICLY DEFEND EVERY ONE OF THOSE EXTRAORDINARY EXCEPTIONS as times when even an unusually honest person should lie.
(Unless you were never claiming to be unusually honest, and your pattern of meta-honest responses to hypotheticals openly shows that you lie about as much as an average person. But even here, I’d worry that anyone who lets themselves be as wicked as they imagine the ‘average’ person to be, would be an unusually wicked person indeed. After all, if Robin Hanson speaks true, we are constantly surrounded by people violating what seem to us like automatic norms.)
6: Meta-honesty, the basics.
Okay, enough preamble, let’s speak of the details of meta-honesty, which may or may not be a terrible idea to even talk about, we don’t know at this point.
The basic formulation of meta-honesty would be:
“Be at least as honest as an unusually honest person. Furthermore, when somebody asks for it and especially when you believe they’re asking for it under this code, try to convey to them a frank and accurate picture of the sort of circumstances under which you would lie. Literally never swear by your meta-honesty that you wouldn’t lie about a hypothetical situation that you would in fact lie about.”
My first horrible terminology for this was the “Bayesian code of honesty”, on the theory that this code meant your sentences never provided Bayesian evidence in the wrong direction. Suppose you say “Hey, Eliezer, what were you doing last night?” and I reply “Staying at home doing the usual things I do before going to bed, why?” If you have a good mental picture of what I would lie about, you have now definitely learned that I was not out watching a movie, because that is not something I would lie about. A very large number of possibilities have been ruled out, and most of your remaining probability mass should now be on me having stayed home last night. You know that I wasn’t on a secret date with somebody who doesn’t want it known we’re dating, because you can ask me that hypothetical and I’ll say, “Sure, I’d happily hide that fact, but that isn’t enough to force me to lie. I would just say ‘Sorry, I can’t tell you where I was last night,’ instead of lying.”
You have not however gained any Bayesian evidence against my hiding a fugitive marijuana seller from the Feds, where somebody’s life or freedom is at stake and it’s vital to conceal that a secret even exists in the first place. Ideally we’d have common knowledge of that, and hopefully we’d agree that it was fine to lie in that case to a friend who asks a casual-seeming question.
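The “never provides Bayesian evidence in the wrong direction” idea can be sketched as a toy update over a handful of hypotheses. Everything numeric here is an assumption chosen for illustration: the prior probabilities, the four hypotheses, and the names `answer_policy` and `update` are hypothetical, and the policy table is only a stand-in for the listener's mental picture of what I would and wouldn't lie about.

```python
from fractions import Fraction

# Hypothetical prior over what the speaker was doing last night.
prior = {
    "stayed home": Fraction(90, 100),
    "watched a movie": Fraction(7, 100),
    "secret date": Fraction(2, 100),
    "hid a fugitive": Fraction(1, 100),
}

# The speaker's answer policy, as the listener understands it:
# truthful about the movie, openly evasive about the date,
# and an outright lie only in the fugitive case.
answer_policy = {
    "stayed home": "I stayed home",
    "watched a movie": "I watched a movie",
    "secret date": "Sorry, I can't tell you",
    "hid a fugitive": "I stayed home",
}

def update(prior, answer_policy, observed_answer):
    """Keep only hypotheses that would produce the observed answer, then renormalize."""
    kept = {h: p for h, p in prior.items() if answer_policy[h] == observed_answer}
    total = sum(kept.values())
    return {h: p / total for h, p in kept.items()}

posterior = update(prior, answer_policy, "I stayed home")
for hypothesis, probability in posterior.items():
    print(hypothesis, probability)
# stayed home 90/91
# hid a fugitive 1/91
```

In this toy model the movie and the secret date are ruled out entirely, and nearly all the remaining probability sits on staying home. The odds between “stayed home” and “hid a fugitive” are 90 to 1 both before and after the answer, so the answer carries no evidence against the fugitive case, which is exactly the case the listener already knew I would lie about.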
Let’s be clear: although this is a kind of softening of deception, it’s still deception. Even if somebody has extensively discussed your code of honesty with you, they aren’t logically omniscient and won’t explicitly have the possibility in mind every time. That’s why we should go on holding ourselves to the standard of, “Would I defend this lie even if the person I was defending it to had never heard of meta-honesty?”
“Eliezer,” you say, “if you had a temporary schizophrenic breakdown and robbed a bank and this news hadn’t become public, would you lie to keep it from becoming public?”
And this would cause me to stop and think and agonize for a bit (which itself tells you something about me, that my answer is not instantly No or Yes). I do have important work to do which should not be trashed without strong reason, and this hypothetical situation would not have involved a great deliberate betrayal on my part; but it is also the sort of thing that you could reasonably argue an unusually honest person ought not to lie about, where lies do not in general serve the social good.
I think in the end I might reply something like “I wouldn’t lie freely and would probably try to use at least technical truth or Glomarize, but in the end I might conceal that event rather than letting my work be trashed for no reason. I think I’d understand if somebody else had done likewise, if I thought they were doing good work in the first place. Except that obviously I’d need to tell various people who are engaged in positive-sum trades with me, where it’s a directly important issue to them whether I can be trusted never to have mental breakdowns, and remove myself from certain positions of trust. And if it happened twice I’d be more likely to give up. If it got to the point where people were openly asking questions I don’t imagine myself as trying to continue a lie. I also want to caveat that I’m describing my ethical views, what I think is right in this situation, and obviously enough pressure can make people violate their own ethics and it’s not always predictable how much pressure it takes, though I generally consider myself fairly strong in that regard. But if this had actually happened I would have spent a lot more time thinking about it than the two minutes I spent writing this paragraph.” And this would help give you an accurate picture of the sort of person that I am in general, and what I take into account in considering exceptions.
Insofar as you are practicing a mental discipline in being meta-honest, the discipline is to be explicitly aware of every time you say something false, and to ask yourself, “Would I be okay publicly saying, if somebody asked me the hypothetical, that this is a situation where a person ought to lie?”
I still worry that this is not the thing that people need to do to establish their inner pact with truth. Maybe you could pick some friends to whom you just never tell any kind of literal falsehood, in the process of becoming initially aware of how many false things you were just saying all the time… but I don’t actually know if that works either. Maybe that’s like trying to stop smoking cigarettes on odd-numbered days. It’d be something to notice if the experimental answer is “In reality, meta-honesty turns out not to work for practicing the respect of truth.”
Meta-honesty should be for people who are comfortable, not with absolute honesty, but with not trying to appear any more honest than they are. This itself is not the ordinary equilibrium, and if you want to do things the standard human way and not forsake a well-tested and somewhat enforced social equilibrium in pursuit of a bright-eyed novel idealistic agenda, then you should not declare yourself meta-honest, or should let somebody else try it first.
7: Consistent object-level Glomarization in meta-level honest responses.
Glomarization can be workable when restricted to special cases, such as only questions about nuclear weapons and submarines. Meta-honesty is such a special case and, if we’re doing this, we should all Glomarize it accordingly. In particular meta-questions are not to be used to extract object-level data, and we should all respect that in our questions, and consistently Glomarize about it in our answers, including some random times when Glomarization seems silly.
Some key responses that need to be standard:
“That question sounds too object-level.”
“I think you’re doing meta-honesty wrong.”
“I think I’m supposed to Glomarize that sort of answer in general.”
“I should answer a more abstract version of that.”
“I worry that some of my counterfactual selves are not in a mutually beneficial situation in this discussion.”
And if you clearly say that you “irrevocably worry” about any of these things, it means the meta-honest conversation has crashed; the other person is not supposed to keep pressing you, and if they do, you can lie. Ideally, this is something you should consistently do in any case where a substantial measure of your counterfactual selves as the other person might imagine them would be feeling pressured to the point of maybe meta-lying. That is, you should not only say “irrevocably worry” in cases where you actually have something to conceal, you should say it in cases where the discussion would be pressuring somebody who did have something to conceal and this seems high-enough-probability to you or to your model of the person talking to you.
For example: “Eliezer, would you lie about having robbed a bank?”
I consider whether this sounds like an attempt to extract object-level information from some of my counterfactual selves, and conclude that you probably place very little probability on my having actually robbed a bank. I reply, “Either it is the case that I did rob a bank and I think it is okay to lie about that, or alternatively, my reply is as follows: I wouldn’t ordinarily rob a bank. It seems to me that you are postulating some extraordinary circumstance which has driven me to rob a bank, and you need to tell me more about this extraordinary circumstance before I tell you whether I’d lie about it. Or you’re postulating a counterfactual version of me that’s fallen far enough off the ethical rails that he’d probably stop being honest too.”
Some additional statements that ought to be taken as praiseworthy:
“I only feel free to have a frank discussion about that if everyone in the room has agreed to abide by the meta-honesty code.”
“I notice that I’m feeling interrogated, and should not try to give a code-abiding answer to that right now.”
“It is either the case that this actually happened and I think it is okay to lie about it, or that my current quick guess is that I wouldn’t lie in that case.”
“Hold on, let me either generate a random number or pretend to generate a random number, such that if I’m actually generating a random number and it comes up as 0, I will try to seem more evasive than usual in this conversation even if I have nothing to actually hide.”
This is not supposed to be a clever way to extract information from people and you should shut down any attempt to use it that way.
“Harry,” says HPMOR!Dumbledore, “I ask you under the code of meta-honesty (which we have just anachronistically acquired): Would you lie about having robbed the Gringotts Bank?”
Harry thinks, Maybe this is about the Azkaban breakout, and says, “Do you in fact suspect me of having robbed a bank?”
“I think that if I suspected you of having robbed a bank,” says Dumbledore, “and I did not wish you to know that, I would not ask you if you had robbed a bank. Why do you ask?”
“Because the circumstances under which you’re invoking meta-honesty have something to do with how I answer,” says Harry (who has suddenly acquired a view on this subject that some might consider implausibly detailed). “In particular, I think I react differently depending on whether this is basically about you trying to construct a new mutually beneficial arrangement with the person you think I am, or if you’re in an adversarial situation with respect to some of my counterfactual selves (where the term ‘counterfactual’ is standardly taken to include the actual world as one that is counterfactually conditioned on being like itself). Also I think it might be a good idea generally that the first time you try to have an important meta-honest conversation with someone, you first spend some time having a meta-meta-honest conversation to make sure you’re on the same page about meta-honesty.”
“I am not sure I understood all that,” said Dumbledore. “Do you mean that if you think we have become enemies, you might meta-lie to me about when you would lie?”
Harry shook his head. “No,” said Harry, “because then if we weren’t enemies, you would still never really be able to trust what I say even assuming me to abide by my code of honesty. You would have to worry that maybe I secretly thought you were an enemy and didn’t tell you. But the fact that I’m meta-honest shouldn’t be something that you can use against me to figure out whether I… sneaked into the girls’ dorm and wrote in somebody’s diary, say. So if I’m in that situation I’ve got to protect my counterfactual selves and Glomarize harder. Whereas if this is more of a situation where you want to know if we can go to Mordor together, then I’d feel more open and try to give you a fuller picture of me with more detail and not worry as much about Glomarizing the specific questions you ask.”
“I suspect,” Dumbledore said gravely, “that those who try to be honest at all will always be at something of a disadvantage relative to the most ready liars, at least if they’ve robbed Gringotts. But yes, Harry, I am afraid that this is more of a situation where I am… concerned… about some of your counterfactual selves. But then why would you answer at all, in such a case?”
“Because sometimes people are honest and have good intentions,” answered Harry, “and I think that if in general they can have an accurate picture of the other person’s honesty, everybody is on net a bit better off. Even if I had robbed a bank, for example, you and I would both still not want anything bad to happen to Britain. And some of my counterfactual selves are innocent, and they’re not better off if you think I’m more dishonest than I am.”
“Then I ask again,” said Dumbledore, “under the code of meta-honesty, whether you would lie about having robbed a bank.”
“Then my answer is that I wouldn’t ordinarily rob a bank,” Harry said, “and I’d feel even worse about lying about having robbed a bank, than having robbed a bank. And I’d know that if I robbed a bank I’d also have to lie about it. So whatever weird reason made me rob the bank, it’d have to be weird enough that I was willing to rob the bank and willing to lie about it, which would take a pretty extreme situation. Where it should be clear that I’m not trying to answer about having specifically robbed a bank, I’m trying to give you a general picture of what sort of person I am.”
“What if you had been blackmailed into robbing the bank?” inquired Dumbledore. “Or what if things crept up on you bit by bit, so that in the end you found yourself in an absurd situation you’d never intended to enter?”
Harry shrugged helplessly. “Either it’s the case that I did end up in a weird situation and I don’t want to let you know about that, or alternatively, I feel like you’re describing a very broad range of possibilities that I’d have to think about more, because I haven’t yet ended up in that kind of situation and I’m not quite sure how I’d behave… I think I’d have in mind that just telling the Headmaster the truth can prevent big problems from blowing up any further, but there’d be cases extreme enough that I wouldn’t do that either… I mean, the basic answer is, yes, there’s things that would make me lie right to your face, but, I wouldn’t do that just for having stolen candy from the kitchen, I don’t think. I’d just be like ‘I consistently Glomarize when people ask me if I’ve stolen candy from the kitchen.’”
“Would you lie to protect a fugitive from the law who you believed was being unjustly pursued?” said Dumbledore. “Say that Sirius Black was innocent after all, as the Quibbler would have it; would you hide him and then lie to Minerva about what you were doing on the night you hid him?”
“Yes,” Harry said unhesitatingly. “That’s something I think is clearly right and the sort of thing I would ordinarily do.”
“What if you thought that somebody was being unjustly pursued by the law?” said Dumbledore. “Is that worth lying for?”
Harry frowned. “I think I maybe shouldn’t answer questions about what in particular I think ought not to be illegal,” Harry said, “but if the basic question is whether I’d lie to my friends to keep somebody who’d only used marijuana out of Azkaban, then I would. I don’t think my friends should expect otherwise; they should want me to do that.”
“We are not being meta-honest any more, only honest,” said Dumbledore. “Were you involved in the breakout from Azkaban?”
Harry stared at Dumbledore a moment, and said, “I think I have to consistently not confirm or deny that, but my counterfactual and possibly actual self who wasn’t involved would really like to know at this point what the heck you think might’ve happened.”
If somehow the Gestapo has heard of meta-honesty and they ask you if you would lie to hide Jews in your attic, they should be expecting a law-abiding meta-honest Nazi citizen to Glomarize, abstract the question in such a way as to not reveal the object-level information that might be sought, or say “I irrevocably worry that some of my counterfactual selves are not in a positive-sum situation.” That, I now proclaim, ought to be done by Nazis and resistance fighters alike, when you are faced with a question that might be meant to reveal object-level information about what happened.
“Eliezer,” says the hypothetical Gestapo officer who has somehow heard about my meta-honesty code, “it happens that I’m a person who’s heard of meta-honesty. Now, are you the sort of person who would lie about having Jews hidden in your attic?”
This hypothetical Gestapo officer has a gun. Most people asking you meta-honest questions won’t have a gun. In fact I bet this will literally never happen until the end of the world. Let’s suppose he has a gun anyway.
“I am the following sort of person,” I reply. “If I was hiding the Führer in my attic to protect him from Jewish assassins, I’d lie about that to the assassins. It’s clear you know about my code of meta-honesty, so you should understand that is a very innocent thing to say. But these circumstances and the exact counterfactual you are asking make me nervous, so I’m afraid to utter the words I think you may be looking for, namely the admission that if I were the kind of person who’d hide Jews in his attic then I’d be the kind of person who would lie to protect them. Can I say that I believe that in respect to your question as you mean it, I think that is no more and no less true of me than it is true of you?”
“My, you are fast on your verbal feet,” says the Gestapo officer. “If somebody were less fast on their verbal feet, would you tell them that it was acceptable for a meta-honest person to just meta-lie to the Jewish assassins in order to hide the Führer?”
“If they didn’t feel that their counterfactual loyal Nazi self would think that their counterfactual disloyal self was being pressured and clearly state that fact irrevocably,” I say, “I’d say that, just like their counterfactual loyal self, they should make some effort to reveal the general limits of their honesty without betraying any of their counterfactual selves, but say they irrevocably couldn’t handle the conversation as soon as they thought their alternate loyal self would think their alternate’s counterfactual disloyal self couldn’t handle the conversation. It’s not as if the Jewish assassins would be fooled if they said otherwise. If the Jewish assassins do continue past that point, which is blatantly forbidden and everyone should know that, they may lie.”
“I see,” says the Gestapo officer. “If you are telling me the truth, I think I have grasped the extent of what you claim to be honest about.” He turns to his subordinates. “Go search his attic.”
“Now I’m curious,” I say. “What would you have done if I’d sworn to you that I was an absolutely loyal German citizen, and that my character was such that I would certainly never lie about having Jews in my attic even if I were the sort of disloyal citizen who had Jews in his attic in the first place?”
“I would have detailed twice as many men to search your house,” says the Gestapo officer, “and had you detained, for that is not the response I would expect from an honest Nazi who knew how meta-honesty was supposed to work. Now I ask you meta-meta-honestly, why haven’t you said that you are irrevocably worried that I am abusing the code? Obviously I put substantial probability on you being a traitor, meaning I am deliberately pressuring you into a meta-conversation and trying to use your code of honesty against those counterfactual selves. Why didn’t you just shut me down?”
“Because you do have a gun, sir,” I say. “I agree that it’s what the rules called for me to say, but I thought over the situation and decided that I was comfortable with saying that in general this was a sort of situation where that rule could be bent so as for me to not end up being shot—and I tell you meta-meta-honestly that I do believe the situation has to be that extreme in order for that rule to even be bent.”
Really the principle is that it is not okay to meta-ask what the Gestapo officer is meta-asking here. This kind of detailed-edge-case-checking conversation might be appropriate for shoring up the edges of an interaction intended to be mutually beneficial, but absolutely not for storming in looking for Jews in the attic of a person who in your mind has a lot of measure on having something to hide.
But I do want to have trustworthy foundations somewhere.
And I think it’s reasonable to expect that over the course of a human lifetime you will literally never end up in a situation where a Gestapo officer who has read this essay is pointing a gun at you and asking overly-object-level-probing meta-honesty questions, and will shoot you if you try to Glomarize but will believe you if you lie outright, given that we all know that everyone, innocent or guilty, is supposed to Glomarize in situations like that. Up until today I don’t think I’ve ever seen any questions like this being asked in real life at all, even hanging out with a number of people who are heavily into recursion.
So if one is declaring the meta-honesty code at all, then one shouldn’t meta-lie, period; I think the rules have been set up to allow that to be absolute. I don’t want you to have to worry that maybe I think I’m being pressured, or maybe I thought you meta-asked the wrong thing, so now I think it’s okay to meta-lie even though I haven’t given any outward sign of that. To that end, I am willing to sacrifice the very tiny fraction of the measure of my future selves who will end up facing an extremely weird Gestapo officer. To me, for now, there doesn’t seem to be any real-life circumstance where you should lie in response to a meta-honesty question, rather than consistently Glomarize that kind of question, consistently abstract that kind of question, consistently answer in an analogy rather than the original question, or consistently say “I believe some counterfactual versions of me would say that cuts too close to the object level.” (It being a standard convention that counterfactuals may include the actual.)
I also think we can reasonably expect that from now until the end of the world, honest people should literally absolutely never need to evade or mislead at all on the meta-meta-level, like if somebody asks if you feel like the meta-level conversation has abided by the rules. (And just like meta-honesty doesn’t excuse object-level dishonesty, by saying that meta-meta-honesty seems like it could be everywhere open and total, I don’t mean to excuse meta-level lies. We should all still regard meta-lies as extremely bad and a Code Violation and You Cannot Be Trusted Anymore.)
If there’s a meta-honest discussion about someone’s code of honesty, and a discussion of what they think about the current meta-meta conditions of how the meta-honesty code is being used, and it sounds to you like they think things are fine… then things should be fine, period. If you ask, do they think that any pressure strong enough to potentially shake their meta-honesty is potentially around, do they think that the overall situation here would have treated any of their plausible counterfactual selves in a negative-sum way, and they say no it’s all fine—then that is supposed to be absolute under the code. That ought to establish a foundation that’s as reliable as the person’s claim to be meta-honest at all.
If you go through all that and lie and meta-lie and meta-meta-lie after saying you wouldn’t, you’ve lied under some of the kindest environments that were ever set up on this Earth to let people not lie, among people who were trying to build trust in that code so we could all use it together. You are being a genuinely awful person as I’d judge that, and I may advocate for severe social sanctions to apply.
Assuming this ends up being a thing, that is. I haven’t run it past many people yet and this is the first public discussion. Maybe there’s some giant hole in it I haven’t spotted.
If anybody ever runs into an actual real circumstance where it seems to them that meta-honesty as they tried to use it was giving the essay-reading Gestapo too much power or too much information, maybe because they weren’t fast enough on their verbal feet, please email me about it so I can consider whether to modify or backtrack on this whole idea. I will try to protect your anonymity under all circumstances up to and including the end of the world unless you say otherwise. The previous sentence is not the sort of thing I would lie about.
8: Counterargument: Maybe meta-honesty is too subtle.
I worry that the notion of meta-honesty is too complicated and subtle. In that it has subtleties in it, at all.
This concept is certainly too subtle for Twitter. Maybe it’s too subtle for us too.
Maybe “meta-honesty” is just too complicated a concept to be able to make it be part of a culture’s Law, compared to the standard-twistiness-compliant performance of saying “Always be honest!” and waiting for the weight of duty to crush down people’s hands, or saying “Never say anything false!” and just-not-discussing all the exceptions that people think obviously don’t count.
(But of course that system also has disadvantages, like people having different automatic norms about what they think are obvious exceptions.)
I’ve started to worry more, recently, about which cognitive skills have other cognitive skills as prerequisites. One of the reasons I hesitated to publish Inadequate Equilibria (before certain persons yanked it out of my drafts folder and published it anyway) was that I worried that maybe the book’s ideas were useless or harmful without mastery of other skills. Like, maybe you need to have developed a skill for demotivating cognition, and until then you can’t reason about charged political issues or your startup idea well enough for complicated thoughts about Nash equilibria to do more good than harm. Or maybe unless you already know a bunch of microeconomics, you just stare at society and see a diffuse mass of phenomena that might or might not be bad equilibria, and you can’t even guess non-wildly in a way that lets you get started on learning.
Maybe meta-honesty contains enough meta, in that it has meta at all, that it just blows up in most people’s heads. Sure, people in our little subcommunity tend to max out the Cognitive Reflection Test and everything that correlates with it. But compared to scoring 3 out of 3 on the CRT, the concept of meta-honesty is probably harder to live in real life—stopping and asking yourself “Would I be willing to publicly defend this as a situation in which unusually honest people should lie, if somebody posed it as a hypothetical?” Maybe that just gets turned into “It’s permissible to lie so long as you’d be honest about whether you’d tell that lie if anyone asks you that exact question and remembers to say they’re invoking the meta-honesty code,” because people can’t process the meta-part correctly. Or maybe there’s some subtle nonobvious skill that a few people have practiced extensively and can do very easily, and that most people haven’t practiced extensively and can’t do that easily, and this subskill is required to think about meta-honesty without blowing up. Or maybe I just get an email saying “I tried to be meta-honest and it didn’t work because my verbal SAT score was not high enough, you need to retract this.”
If so, I’m not sure there’s much that could be done about it, besides me declaring that Meta-Honesty had turned out to be a terrible idea as a social innovation and nobody should try that anymore. And then that might not undo the damage to the law-as-absolute performance that makes something be part of the Law.
But I’d outright lie to the Gestapo about Jews in my attic. And even to friends, I can’t consistently Glomarize about every point in my life where one of my counterfactual selves could possibly have been doing that. So I can’t actually promise to be a wizard, and I want there to exist firm foundations somewhere.
Questions? Comments?
This was the post that got the concept of a “robust agent” to click into place in my mind. It also more concretely got me thinking about what it means to be honest, and how to go about that in a world that sometimes punishes honesty.
Since then, I’ve thought a lot about meta-honesty as well as meta-trust (in contexts that are less about truth/falsehood). I have some half-finished posts on the topic I hope to share at some point.
This also had some concrete impacts on how I think about the LessWrong team’s integrity, which made its way into several conversations that (I’d guess?) made their way into habryka’s post on Integrity, as well as my Robust Agency for People and Organizations.
One of my favorite posts, that encouraged me to rethink and redesign my honesty policy.
Used as research for my EA/rationality novel, I found this interesting and useful (albeit very meta and thus sometimes hard to follow).
This is definitely among the top posts that have stuck with me. My instincts are very strongly towards wanting to always be maximally honest, but of course that’s not perfectly practical. This post works to recover a principled relationship to truth-telling and honesty even in the face of real-world necessity to sometimes not maximally promote truth.