Daniel Kokotajlo comments on Daniel Kokotajlo’s Shortform

Daniel Kokotajlo 2 Mar 2020 16:34 UTC
LW: 6 AF: 3
AF
Searching for equilibria can be infohazardous. You might not like the one you find first, but you might end up sticking with it (or worse, deviating from it and being punished). This is because which equilibrium gets played by other people depends (causally or, in some cases, acausally) not just on what equilibrium you play but even on which equilibria you think about, for reasons having to do with Schelling points. A strategy that sometimes works to avoid these hazards is to impose constraints on which equilibria you think about, or at any rate to perform a search through equilibria-space that is guided in some manner so as to be unlikely to find equilibria you won’t like. For example, here is one such strategy: Start with a proposal that is great for you and would make you very happy. Then, think of the ways in which this proposal is unlikely to be accepted by other people, and modify it slightly to make it more acceptable to them while keeping it pretty good for you. Repeat until you get something they’ll probably accept.
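The example strategy above can be sketched as a one-directional search. This is only a rough illustration, not a formal model: the names `guided_search`, `is_acceptable`, `concede`, and `my_floor` are all hypothetical, proposals are simplified to a single number (my share of 100 points), and the other party's acceptance rule is assumed to be known, which it usually is not.

```python
from typing import Callable, Optional


def guided_search(
    start: int,
    is_acceptable: Callable[[int], bool],
    concede: Callable[[int], int],
    my_floor: int,
    max_steps: int = 100,
) -> Optional[int]:
    """Start from a proposal that is great for me and concede a little at a
    time until the others would probably accept.

    The search never looks at proposals worse for me than `my_floor`, so it
    cannot "find" an equilibrium in that region. All names here are
    hypothetical; this is a sketch under the assumptions stated above.
    """
    proposal = start
    for _ in range(max_steps):
        if proposal < my_floor:
            # Stop rather than explore outcomes I would not like.
            return None
        if is_acceptable(proposal):
            return proposal
        # Modify the proposal slightly in the others' favor.
        proposal = concede(proposal)
    return None


# Toy usage: I open by asking for 90 of 100 points; the others are assumed
# to accept once they get at least 40; each round I give up 10 points.
result = guided_search(
    start=90,
    is_acceptable=lambda mine: 100 - mine >= 40,
    concede=lambda mine: mine - 10,
    my_floor=50,
)
```

The design choice that matters is the direction and the floor: the search begins at my preferred end and halts at the first acceptable point, so it never enumerates the equilibria that are worse for me.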
- Dagon 2 Mar 2020 17:12 UTC
  4 points
  Parent
  I’m not sure I follow the logic. When you say “searching for equilibria”, do you mean “internally predicting the likelihood of points and durations of an equilibrium” (as most of what we worry about isn’t stable)? Or do you mean the process of applying forces and observing counterforces in which the system is “finding its level”? Or do you mean “discussion about possible equilibria, where that discussion is in fact a force that affects the system”?
  Only the third seems to fit your description, and I think that’s already covered by standard infohazard writings—the risk that you’ll teach others something that can be used against you.
  - Daniel Kokotajlo 2 Mar 2020 17:48 UTC
    4 points
    Parent
    I meant the third, and I agree it’s not a particularly new idea, though I’ve never seen it said this succinctly or specifically. (For example, it’s distinct from “the risk that you’ll teach others something that can be used against you,” except maybe in the broadest sense.)
    - Dagon 2 Mar 2020 18:24 UTC
      2 points
      Parent
      Interesting. I’d like to explore the distinction between “risk of converging on a dis-preferred social equilibrium” (which I’d frame as “making others aware that this equilibrium is feasible”) and other kinds of revealing information which others use to act in ways you don’t like. I don’t see much difference.
      The more obvious cases (“here are plans for a gun that I’m especially vulnerable to”) don’t get used much unless you have explicit enemies, while the more subtle ones (“I can imagine living in a world where people judge you for scratching your nose with your left hand”) require less intentionality of harm directed at you. But it’s the same mechanism and info-risk.
      - Daniel Kokotajlo 2 Mar 2020 19:57 UTC
        2 points
        Parent
        For one thing, the equilibrium might not actually be feasible, but making others aware that you have thought about it might nevertheless have harmful effects (e.g. they might mistakenly think that it is feasible, or they might correctly realize that something in the vicinity is). For another, “teach others something that can be used against you”, while technically describing the sort of thing I’m talking about, tends to conjure up a very different image in the mind of the reader: an image more like your gun plans example.
        I agree there is not a sharp distinction between these, probably. (I don’t know, didn’t think about it.) I wrote this shortform because, well, I guess I thought of this as a somewhat new idea—I thought of most infohazards talk as being focused on other kinds of examples. Thank you for telling me otherwise!
        - Dagon 2 Mar 2020 20:57 UTC
          4 points
          Parent
          (Oops, I now realize this probably came across wrong.) Sorry! I didn’t intend to be telling you things, nor did I mean to imply that pointing out more subtle variants of known info-hazards was useless. I really appreciate the topic, and I’m happy to have exactly as much text as we have in exploring non-trivial applications of the infohazard concept, and helping identify whether further categorization is helpful (I’m not convinced, but I probably don’t have to be).