Thanks, that was really helpful. I still have a sense of disagreement about whether this is the right way to do things, so I’ll try to point to some of that. Unfortunately my comment here is not super focused, though I am really just trying to say a single thing.
I recently wrote down a bunch of my thoughts about evaluating MIRI, and I realised that I think MIRI has gone through alternating phases of internal concentration and external explanation, in a way that feels quite healthy to me.
Here is what I said:
In the last 2-5 years I endorsed donating to MIRI (and still do), and my reasoning back then was always of the type “I don’t understand their technical research, but I have read a substantial amount of the philosophy and worldview that was used to successfully pluck that problem out of the space of things to work on, and think it is deeply coherent and sensible, and it’s been surprisingly successful in figuring out that AI is an x-risk, and I expect to find that it is doing very sensible things in places I understand less well.” Then, about a year ago, MIRI published the Embedded Agency sequence, and for the first time I thought “Oh, now I feel like I have an understanding of what the technical research is guided by, and what it’s about, and indeed, this makes a lot of sense.” My feelings have rarely been changed by reading ongoing research papers at MIRI, which were mostly just very confusing to me. They all seemed individually interesting, but I didn’t see the broader picture before Embedded Agency.
I continued:
So, my current epistemic state is something like this: Eliezer and Benya and Patrick and others spent something like 4 MIRI-years hacking away at research, and I didn’t get it. Finally Scott and Abram made some further progress on it, and crystalised it into an explanation I actually felt I sorta got. And most of the time I spent trying to understand their work in the meantime was wasted effort on my part, and quite plausibly wasted effort on their part. I remember that time they wrote up a ton of formal-looking papers for the Puerto Rico conference, to be ready in case a field suddenly sprang up around them… but then nobody really read them or built on them. So I don’t mind if, in the intervening 3-4 years, they again don’t really try to explain to me what they’re thinking about, until a bit of progress and a good explanation come along. They’ll continue to write things about the background worldview, like Security Mindset, and Rocket Alignment, and Challenges to Christiano’s Capability Amplification Proposal, and all of the beautiful posts that Scott and Abram write, but overall focus on getting a better understanding of the problem by themselves.
I think the output of this pattern is perhaps the primary way I evaluate whether MIRI is making progress.
I’ll just think aloud a little bit more on this topic:
Writing by CFAR staff, for example a lot of the comments on this post by Adam, Anna, Luke, Brienne, and others, is one of the primary ways I update my model of how these people are thinking, and of how interesting and promising it feels to me. I’m not talking about “major conclusions” or “rigorously evidenced results”, but just stuff like the introspective data Luke uses when evaluating whether he’s making progress (I really liked that comment), or Dan saying which posts are top of his mind that he would write for LW. Adam’s comments about looking for blind spots and Anna’s comment about tacit/explicit are even more helpful, but the difference between their comments and Luke/Dan’s comments isn’t as large as the difference between Luke/Dan’s comments and nothing.
It’s not that I don’t update on in-person tacit skills and communication; of course that’s a major factor I use when thinking about who to trust and in what way. But especially as I think of groups of people doing research over many years, I’m increasingly interested in whether they’re able to make records of their thoughts that communicate well with each other: how much writing they do. This kind of more legible tracking of ideas and thought is pretty key to me. This is in part because I think I personally would have a very difficult time doing research without good, long-term, external working memory, and in part because of some of my models of the difficulty of coordinating groups.
Adam’s answer above is both intrinsically interesting and very helpful for discussing this topic. But when I asked myself if it felt sufficient to me, I said no. It matters a lot to me whether I expect CFAR to try hard to do the sort of translational work into explicit knowledge at some point down the line, like MIRI has successfully done multiple times and explicitly intends to do in future. CFAR and CFAR alumni explore a lot of things (“enlightenment”, “chakras”, “enneagram”, “circling”) that typically signal having lost contact with scientific materialism and with standards for communal, public evidence. I think walking out into the hinterlands and then returning with the gold that was out there is great, and I’ve found one of those things quite personally valuable (on a list selected for being the least promising on surface appearances), and I have a pretty strong “Rule Thinkers In, Not Out” vibe around that. But if CFAR never comes back and creates the explicit models, then it increasingly looks like 5, 10, 20 years of doing stuff that looks (from the outside) similar to most others who have tried to understand their own minds, and who have largely lost hold of reality. The important thing is that I can’t tell from the outside whether this stuff turned out to be true. This doesn’t mean that you can’t either (from the inside), but I do think there’s often a surprising pairing where accountability and checkability end up being one of the primary ways you find out for yourself whether what you’ve learned is actually true and real.
Paul Graham has a line where he says startups should “take on as much technical debt as they can, and no more”. He’s saying that “avoiding technical debt” is not a virtue you should aspire to, and that you should let that slide in service of quickly making a product that people love, while of course admitting that there is some boundary line that, if you cross it, will just stop your system from working. If I apply that idea to research and use Chris Olah’s term “research debt”, the line would be “you should take on as much research debt as you can, and no more”. I don’t think all of CFAR’s ideas should be explicit, or have rigorous data tracking, or have randomised controlled trials, or be cashed out in the psychological literature (which is a mess), and it’s fine to spend many years going down paths that you can’t feasibly justify to others. But, if you’re doing research, trying to take difficult steps in reasoning, I think you need to come back at some point and make a thing that others can build on. I don’t know how many years you can keep going without coming back, but I’d guess that 5 is probably as long as you want to start with.
I guess I should’ve said this earlier, but there are also few things as exciting for me as when CFAR staff write up their ideas about rationality into posts. I love reading Brienne’s stuff and Anna’s stuff, and I liked a number of Duncan’s things (Double Crux, Buckets and Bayes, etc). (It’s plausible this is more of my motivation than I let on, although I stand by everything I said above as true.) I think many other LessWrongers are also very excited about such writing. I assign some probability that this in itself is a big enough reason for CFAR to want to write more (growth and excitement have many healthy benefits for communities).
As part of CFAR’s work to research and develop an art of rationality (as I understand it, CFAR staff think of part of the work as being research, e.g. Brienne’s comment below), if it were an explicit goal to translate many key insights into explicit knowledge, then I would feel far safer and more confident about many of the parts that seem, on first and second checking, like they are clearly wrong. If it isn’t, then I feel much more ‘all at sea’.
I’m aware that there are more ways of providing your thinking with good feedback loops than “explaining your ideas to the people who read LW”. You can find other audiences. You can have smaller groups of people you trust and to whom you explain the ideas. You can have testable outcomes. You can just be a good enough philosopher to set your own course through reality and not look to others for whether it makes sense to them. But from my perspective, without blogposts like those Anna and Brienne have written in the past, I personally am having a hard time telling whether a number of CFAR’s focuses are on the right track.
I think I’ll say something similar to what Eliezer said at the bottom of his public critique of Paul’s work, and mention that from my epistemic vantage point, even given my disagreements, I think CFAR has had and continues to have surprisingly massive positive effects on the direction and agency of me and others trying to reduce x-risk, and I think that they should definitely be funded to do all the stuff they do.
Ben, to check before I respond: would a fair summary of your position be, “CFAR should write more in public, e.g. on LessWrong, so that A) it can have better feedback loops, and B) more people can benefit from its ideas?”