To what extent would the organization be factoring in transformative AI timelines? It seems to me like the kinds of questions one would prioritize in a “normal period” look very different than the kinds of questions that one would prioritize if they place non-trivial probability on “AI may kill everyone in <10 years” or “AI may become better than humans on nearly all cognitive tasks in <10 years.”
I ask partly because I personally would be more excited about a version of this that wasn’t ignoring AGI timelines, but I think a version that’s not ignoring AGI timelines would probably be quite different from the intellectual spirit/tradition of FHI.
More generally, perhaps it would be good for you to describe some ways in which you expect this to be different from FHI. I think calling it the FHI of the West, the explicit statement that it would carry on the intellectual tradition of FHI, and the announcement right as FHI dissolves might make it seem like “I want to copy FHI” as opposed to “OK, obviously I don’t want to copy it entirely; I just want to draw on some of its excellent intellectual/cultural components.” If your vision is the latter, I’d find it helpful to see a list of things that you expect to be similar/different.
> To what extent would the organization be factoring in transformative AI timelines? It seems to me like the kinds of questions one would prioritize in a “normal period” look very different than the kinds of questions that one would prioritize if they place non-trivial probability on “AI may kill everyone in <10 years” or “AI may become better than humans on nearly all cognitive tasks in <10 years.”
My guess is a lot, because the future of humanity sure depends on the details of how AI goes. But I do think I would want the primary optimization criterion of such an organization to be truth-seeking, and for it to have quite strong norms and guardrails against anything that would trade off communicating truths for short-term impact or power.
As an example, one thing I would do very differently from FHI (and a thing I talked with Bostrom about somewhat recently, where we seemed to agree) is that with the world moving faster and more things happening, you really want to focus on faster OODA loops in your truth-seeking institutions.
This suggests that instead of publishing books, or going through months-long academic review processes, you want to move more towards things like blogposts and comments, and maybe in the limit even things like live panels where you analyze events right as they happen.
I do think there are lots of failure modes around becoming too news-focused (and e.g. on LW we do a lot of things to avoid becoming too news-focused), so this is a delicate balance, but it’s one of the things I think I would do pretty differently, and which depends on transformative AI timelines.
To comment a bit more on the power stuff: a thing I am quite worried about is that as more stuff happens more quickly with AI, people will feel a strong temptation to trade in some of the epistemic trust they have built with others for resources that they can deploy directly under their own control. As more things happen, it’s harder to feel in control, and by just getting more resources directly under your control (as opposed to trying to improve the decisions of others by discovering and communicating important truths), you can regain some of that feeling of control. That is one dynamic I would really like to avoid with any organization like this: I would like it to maintain a stance towards the world that is about improving sanity, not about acquiring resources for itself and its allies.
> I ask partly because I personally would be more excited about a version of this that wasn’t ignoring AGI timelines, but I think a version that’s not ignoring AGI timelines would probably be quite different from the intellectual spirit/tradition of FHI.
This frame feels a bit off to me, partly because I don’t think FHI was ignoring timelines, and partly because I think their work has proved quite useful already—mostly by substantially improving our concepts for reasoning about existential risk.
But also, the portfolio of alignment research with maximal expected value need not perform well in the most likely particular world. One might imagine, for example—and indeed this is my own bet—that the most valuable actions we can take will only actually save us in the subset of worlds in which we have enough time to develop a proper science of alignment.