“Don’t share information that’s exfohazardous on others’ models, even if you disagree with those models, except if your knowledge of it isn’t exclusively caused by other alignment researchers telling you of it.”
So if Alice tells me about her alignment research, and Bob thinks that Alice’s alignment research is exfohazardous, then I can’t tell people about Alice’s alignment research?
Unless I’ve misunderstood you, that’s a terrible policy.
Why am I deferring to Bob, who is completely unrelated? Why should I not use my best judgement, which includes the consideration that Bob is worried? What does this look like in practice, given that some people think everything under the sun is exfohazardous?
Of course, if someone tells me some information and asks me not to share it then I won’t — but that’s not a special property of AI xrisk.
Pretty sure that’s what the “telling you of it” part fixes. Alice is the person who told you of Alice’s hazards, so your knowledge is exclusively caused by Alice, and Alice is the person whose model dictates whether you can share them.
yep, if that’s OP’s suggestion then I endorse the policy. (But I think it’d be covered by the more general policy of “Don’t share information someone tells you if they wouldn’t want you to”.) But my impression is that OP is suggesting the stronger policy I described?
No, Tamsin’s interpretation is correct.
Natural language is prone to ambiguous interpretations, and I’d tried to rephrase the summary a few times to avoid them. Didn’t spot that one.
Okay, mea culpa. You can state the policy clearly like this:
“Suppose that, if you hadn’t been told X by someone who thinks X is exfohazardous, then you wouldn’t have known X before time t. Then you are obligated to not tell anyone X before time t.”
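To make the counterfactual structure explicit, here's a rough formalization (the predicate names Told, Knew, Tell and the obligation operator O are my own labels, not part of the original statement, and the inner arrow should be read as a counterfactual, not a material conditional):

$$\bigl(\neg\,\mathrm{Told}(X) \rightarrow \neg\,\mathrm{Knew}_{<t}(X)\bigr) \;\rightarrow\; O\bigl(\neg\,\mathrm{Tell}_{<t}(X)\bigr)$$

where $\mathrm{Told}(X)$ means "someone who considers X exfohazardous told you X", $\mathrm{Knew}_{<t}(X)$ means "you would have known X before time t", and $O(\cdot)$ means "you are obligated that". On this reading, Bob's opinion only binds you if Bob (or another person holding that model) is actually in the causal chain by which you learned X.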