Thanks for reading closely enough to have detailed responses and for trying to correct the record according to your memory. I appreciate that you’re explicitly not trying to disincentivize saying negative things about one’s former employer (a family member of mine was worried that my writing this post would “burn bridges”).
A couple general points:
These events happened years ago, and no one’s memory is perfect (although our culture has propaganda saying memories are less reliable than they in fact are). For example, I misstated a fact about Maia’s death, saying that Maia had been on Ziz’s boat, because I filled in that detail from the other details and impressions I had.
I can’t know what someone “really means”; I can only know what they say and what the most reasonable apparent interpretations are. I could have asked more clarifying questions at the time, but that felt expensive due to the stressful dynamics the post describes.
In terms of more specific points:
(And I have a decent chunk of probability mass that Jessica would clarify that she’s not accusing me of intentional coercion.) From my own perspective, she was misreading my own frame and feeling pressured into it despite significant efforts on my part to ameliorate the pressure. I happily solicit advice for what to do better next time, but do not consider my comport to have been a mistake.
I’m not accusing you of intentional coercion; I think this sort of problem could result as a side effect of, e.g., mental processes trying to play along with coalitions while not adequately modeling effects on others. Some of the reasons I’m saying I was coerced are (a) Anna discouraging researchers from talking with Michael, (b) the remote possibility of assassination, and (c) the sort of economic coercion that would be expected on priors at most corporations (even if MIRI is different). I think my threat model was pretty wrong at the time, which made me more afraid than I actually had to be (due to conservatism); this is in an important sense irrational (and I’ve tried pretty hard to get better at modeling threats realistically since then), although in a way that would be expected to be common in normal college graduates. Given that I was criticizing MIRI’s ideology more than other researchers did, my guess is that I was relatively uncoerced by the frame, although it’s in principle possible that I simply disagreed more.
I don’t recall ever “talking about hellscapes” per se. I recall mentioning them in passing, rarely. In my recollection, that mainly happened in response to someone else broaching the topic of fates worse than death. (Maybe there were other occasional throwaway references? But I don’t recall them.)
I’m not semantically distinguishing “mentioning” from “talking about”. I don’t recall having asked about fates worse than death when you mentioned them and drew a corresponding graph (showing ~0 utility for low levels of alignment, negative utility for high-but-not-excellent levels of alignment, and positive utility for excellent levels of alignment).
According to my best recollection of the conversation that I think Jessica is referring to, she was arguing that AGI will not arrive in our own lifetimes, and seemed unresponsive to my attempts to argue that a confident claim of long timelines requires positive knowledge, at which point I exasperatedly remarked that for all we knew, the allegedly missing AGI insights had already been not only had, but published in the literature, and all that remains is someone figuring out how to assemble them.
Edited to make it clear you weren’t trying to assign high probability to this proposition. What you said seems more reasonable in that light, although since you were also talking about AI coming in the next 20 years, I hope you can see why I thought this reflected your belief.
Here Jessica seems to be implying that, not only did I positively claim that the pieces of AGI were already out there in the literature, but also that I had personally identified them? I deny that, and I’m not sure what claim I made that Jessica misunderstood in that way.
Edited to make it clear you didn’t mean this. The reason I drew this as a Gricean implicature is that figuring out how to make an AGI wouldn’t provide evidence that the pieces to make AGI are already out there, unless such an AGI design would work if scaled up / iteratively improved in ways that don’t require advanced theory / etc.
The sequence of events as I recall them was: Various researchers wanted to do some closed research. There was much discussion about how much information was private: Research results? Yes, if the project lead wants privacy. Research directions? Yes, if the project lead wants privacy. What about the participant list for each project? Can each project determine their own secrecy bounds individually, or is revealing who’s working with you defecting against (possibly-hypothetical) projects that don’t want to disclose who they’re working with? etc. etc. I recall at least one convo with a bunch of researchers where, in efforts to get everyone to stop circling privacy questions like moths to a flame and get back to the object level research, I said something to the effect of “come to me if you’re having trouble”.
Even if the motive came from other researchers, I specifically remember hearing about the policy at a meeting in a top-down fashion. I thought the “don’t ask each other about research” policy was bad enough that I complained about it, and it might have been changed as a result. It seems that not everyone remembers this policy (although Eliezer, in a recent conversation, didn’t disagree that this was the policy at some point), but I must have been interpreting something this way because I remember contesting it.
According to me, I was not trying to say “you shouldn’t talk about ways you believe others to be acting in bad faith”. I was trying to say “I think y’all are usually mistaken when you’re accusing certain types of other people of acting in bad faith”, plus “accusing people of acting in bad faith [in confrontational and adversarial ways, instead of gently clarifying and confirming first] runs a risk of being self-fulfilling, and also burns a commons, and I’m annoyed by the burned commons”. I think those people are wrong and having negative externalities, not that they’re bad for reporting what they believe.
I hope you can see why I interpreted the post as making a pragmatic argument, not simply an epistemic argument, against saying others are acting in bad faith:
When criticism turns to attacking the intentions of others, I perceive that to be burning the commons. Communities often have to deal with actors that in fact have ill intentions, and in that case it’s often worth the damage to prevent an even greater exploitation by malicious actors. But damage is damage in either case, and I suspect that young communities are prone to destroying this particular commons based on false premises.
In the context of 2017, I also had a conversation with Anna Salamon where she said our main disagreement was about whether bad faith should be talked about (which implies our main disagreement wasn’t about how common bad faith was).
I don’t actually know what conversation this is referring to. I recall a separate instance, not involving Jessica, of a non-researcher spending lots of time in the office hanging out and talking with one of our researchers, and me pulling the researcher aside and asking whether they reflectively endorsed having those conversations or whether they kept getting dragged into them and then found themselves unable to politely leave. (In that case, the researcher said they reflectively endorsed them, and thereafter I left them alone.)
Edited to say you don’t recall this. I didn’t hear this from you; I heard it secondhand, perhaps from Michael Vassar, so I don’t at this point have strong reason to think you said this.
There’s no law saying that, when someone’s making a mistake, there’s some way to explain it to them such that suddenly it’s fixed. I think existing capabilities orgs are making mistakes (at the very least, in publishing capabilities advances (though credit where credit is due, various labs are doing better at keeping their cutting-edge results private, at least until somebody else replicates or nearly-replicates them, than they used to be (though to be clear I think we have a long way to go before I stop saying that I believe I see a big mistake))), and deny the implicit inference from “you can’t quickly convince someone with words that they’re making a mistake” to “you must be using conflict theory”.
I agree that “speed at which you can convince someone” is relevant in a mistake theory. Edited to make this clear.
But, as I told Jessica at the time (IIRC), I expect folks at leading AGI labs to be much more sensitive to solutions to the alignment problem, despite the fact that I don’t think you can talk them into giving up public capabilities research in practice. (This might be what she misunderstood as me saying we’d have better luck “competing”? I don’t recall saying any such thing, but I do recall saying that we’d have better luck solving alignment first and persuading second.)
If I recall correctly, you were at the time including some AGI capabilities research as part of alignment research (which makes a significant amount of theoretical sense, given that FAI has to pursue convergent instrumental goals). In this case, developing an alignment solution before DeepMind develops AGI would be a form of competition. DeepMind people might be more interested in the alignment solution if it came along with a capabilities boost (I’m not sure whether this consideration was discussed in the specific conversation I’m referring to, but it might have been considered in another conversation, which doesn’t mean it was in any way planned on).
That said, as I told Jessica at the time (IIRC), you can always just ask me whether I’m speaking as MIRI-the-organization or whether I’m speaking as Nate. Similarly, when I’m speaking as Nate-the-person, you can always just ask me about my honesty protocols.
Ok, this helps me disambiguate your honesty policy. If “employees may say things on the MIRI blog that would be very misleading under the assumption that this blog was not the output of MIRI playing politics and being PC and polite” is consistent with MIRI’s policies, it’s good for that to be generally known. In the case of the OpenAI blog post, the post is polite in that it gives a misleadingly positive impression.
(a) Anna discouraging researchers from talking with Michael
...
...I specifically remember hearing about the policy at a meeting in a top-down fashion...it seems that not everyone remembers this policy...I must have been interpreting something this way because I remember contesting it.
...
...I also had a conversation with Anna Salamon where she said our main disagreement was about whether bad faith should be talked about...
Just a note on my own mental state, reading the above:
Given the rather large number of misinterpretations and misrememberings and confusions-of-meaning in this and the previous post, along with Jessica quite badly mischaracterizing what I said twice in a row in a comment thread above, my status on any Jessica-summary (as opposed to directly quoted words) is “that’s probably not what the other person meant, nor what others listening to that person would have interpreted that person to mean.”
By “probably” I literally mean strictly probably, i.e. a greater than 50% chance of misinterpretation, in part because the set of things-Jessica-is-choosing-to-summarize is skewed toward those she found unusually surprising or objectionable.
If I were in Jessica’s shoes, I would by this point be replacing statements like “I had a conversation with Anna Salamon where she said X” with “I had a conversation with Anna Salamon where she said things which I interpreted to mean X” as a matter of general policy, so as not to be misleading-in-expectation to readers.