the other half would just be their general lack of visible activity over the last few years
This seems… quite false to me? Like, I think there’s probably been more visible output from MIRI in the past year than in the previous several? Hosting and publishing the MIRI dialogues counts as output. AGI Ruin: A List of Lethalities and the surrounding discourse is, like, one of the most important pieces to come out in the past several years. I think Six Dimensions of Operational Adequacy in AGI Projects is technically old, but important. See everything in 2021 MIRI Conversations and 2022 MIRI Discussion.
I do think MIRI “at least temporarily gave up” on personally executing on technical research agendas, or something like that, but, that’s not the only type of output.
Most of MIRI’s output is telling people why their plans are bad and won’t work, and sure that’s not very fun output, but, if you believe the set of things they believe, it seems like straightforwardly good/important work.
Additionally, during the past couple of years, MIRI has:
Continuously maintained a “surviving branch” of its 2017 research agenda (currently has three full-time staff on it)
Launched the Visible Thoughts Project, which, while not a stunning exemplar of efficient success, has produced two complete 1000-step runs so far, with more in the works
Continued to fund Scott Garrabrant’s Agent Foundations research, including things like collab retreats and workshops with the Topos Institute (I helped run one two weeks ago and it went quite well by all accounts)
Continued to help with AI Impacts, although I think the credit there is more logistical/operational (approximately none of MIRI’s own research capacity is involved; AI Impacts is succeeding based on its own intellectual merits)
I don’t know much about this personally, but I’m pretty sure we’re funding part or all of what Vivek’s up to
This is not a comprehensive list.
Vanessa Kosoy’s research has split off a little bit to push in a different direction, but it was also directly funded by MIRI for several years, and came out of the 2017 agenda.
Oh, yeah, that’s totally fair. I agree that a lot of those writings are really valuable, and I’ve been especially pleased with how much Nate has been writing recently. I think there are a few factors that contributed to our disagreement here:
I meant to refer to my beliefs about MIRI at the time that Death With Dignity was published, which means most of what you linked wasn’t published yet. So by “last few years” I meant something like 2017-2021, which does look sparse.
I was actually thinking about something more like “direct” alignment work. 2013-2016 was a period where MIRI was outputting much more research, hosting workshops, et cetera.
MIRI is small enough that I often tend to think in terms of what the individual people are doing, rather than attributing it to the org, so I think of the 2021 MIRI conversations as “Eliezer yells at people” rather than “MIRI releases detailed communications about AI risk”.
Anyway my overall reason for saying that was to argue that it’s reasonable for people to have been updating in the “MIRI giving up” direction long before Death With Dignity.
it’s reasonable for people to have been updating in the “MIRI giving up” direction long before Death With Dignity.
Hmm, I’m actually not as sure about this one. I think there was definitely a sense that MIRI was withdrawing to focus on research and collaborate less; there was the whole “non-disclosed by default” thing, and I think a health experiment on Eliezer’s part ate a year or so of intellectual output. But, like, I was involved in a bunch of attempts to actively hire people / find more researchers to work with MIRI up until around the start of COVID. [I left MIRI before COVID started, and so was much less in touch with MIRI in 2020 and 2021.]
My model is that MIRI prioritized comms before 2013 or so, prioritized a mix of comms and research in 2013-2016, prioritized research in 2017-2020, and prioritized comms again starting in 2021.
(This is very crude and probably some MIRI people would characterize things totally differently.)
I don’t think we “gave up” in any of those periods of time, though we changed our mind about which kinds of activities were the best use of our time.
I was actually thinking about something more like “direct” alignment work. 2013-2016 was a period where MIRI was outputting much more research, hosting workshops, et cetera.
2013-2016 had more “research output” in the sense that we were writing more stuff up, not in the sense that we were necessarily doing more research then.
I feel like your comment is blurring together two different things:
If someone wasn’t paying much attention in 2017-2020 to our strategy/plan write-ups, they might have seen fewer public write-ups from us and concluded that we’d given up.
(I don’t know that this actually happened? But I guess it might have happened some...?)
If someone was paying some attention to our strategy/plan write-ups in 2021-2023, but was maybe misunderstanding some parts, and didn’t care much about how much MIRI was publicly writing up (or did care, but only for technical results?), then they might conclude that we’ve given up.
Combining these two hypothetical misunderstandings into a single “MIRI 2017-2023 has given up” narrative seems very weird to me. We didn’t even stop doing pre-2017 things like Agent Foundations in 2017-2023, we just did other things too.
For what it’s worth, the last big MIRI output I remember is Arbital, which unfortunately didn’t get a lot of attention. But since then? Publishing lightly edited conversations doesn’t seem like a substantial research output to me.
(Scott Garrabrant has done a lot of impressive foundational formal research, but it seems to me of little applicability to alignment, since it doesn’t operate in the usual machine learning paradigm. It reminds me a bit of research in formal logic in the last century: people expected it to be highly relevant for AI, yet it turned out to be completely irrelevant. Not even “Bayesian” approaches to AI went anywhere. My hopes for other foundational formal research today are similarly low, except for formal work which roughly fits into the ML paradigm, like statistical learning theory.)
I actually think his recent work on geometric rationality will be very relevant for thinking about advanced shard theories. Shards are selected using winner-take-all dynamics. Also, in worlds where ML alone does not in fact get you all the way to AGI, his work will become far more relevant than the alignment work you feel bullish on.
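(To gesture at the connection, with an illustration of my own rather than anything taken from Garrabrant’s sequence: with linear subagent utilities, maximizing a weighted arithmetic mean hands all of a shared resource to the highest-weighted shard, i.e. winner-take-all, while maximizing a weighted geometric mean, the kind of aggregation geometric rationality studies, splits the resource roughly in proportion to the weights. The two-shard setup and the weights below are assumptions chosen purely for illustration.)

```python
# Toy comparison (my own illustration, not from the discussion): allocating one
# unit of "control" between two shards with linear utilities u_i(x_i) = x_i.
import numpy as np

w = np.array([0.6, 0.4])                 # assumed shard weights
xs = np.linspace(1e-6, 1 - 1e-6, 10001)  # candidate shares given to shard 0

def arithmetic(x):
    # Weighted arithmetic mean of the two shards' utilities.
    return w[0] * x + w[1] * (1 - x)

def geometric(x):
    # Weighted geometric mean of the two shards' utilities (Nash-bargaining-like).
    return (x ** w[0]) * ((1 - x) ** w[1])

x_arith = xs[np.argmax(arithmetic(xs))]
x_geom = xs[np.argmax(geometric(xs))]
print(f"arithmetic aggregation gives shard 0: {x_arith:.3f}")  # ~1.000 (winner-take-all)
print(f"geometric aggregation gives shard 0:  {x_geom:.3f}")   # ~0.600 (proportional to weight)
```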
Formal logic is not at all irrelevant for AI. The problem with it is that it only works once you’ve got low enough uncertainty weights to use it on. Once you do, it’s an incredible boost to a model. And deep learning folks have known this for a while.
Are there any modern models which use hardcoded rules written in formal logic?
Sent DM.
It appears I didn’t get it?
Edit: Got it.
Question about this part:
I do think MIRI “at least temporarily gave up” on personally executing on technical research agendas, or something like that, but, that’s not the only type of output.
So, I’m sure various people have probably thought about this a lot, but just to ask the obvious dumb question: Are we sure that this is even a good idea?
Let’s say the hope is that at some time in the future, we’ll stumble across an Amazing Insight that unblocks progress on AI alignment. At that point, it’s probably good to be able to execute quickly on turning that insight into actual mathematics (and then later actual corrigible AI designs, and then later actual code). It’s very easy for knowledge of “how to do things” to be lost, particularly technical knowledge. [1] Humanity loses this knowledge on a generational timescale, as people die, but it’s possible for institutions to lose knowledge much more quickly due to turnover. All that just to say: Maybe MIRI should keep doing some amount of technical research, just to “stay in practice”.
My general impression here is that there’s plenty of unfinished work in agent foundations and decision theory, things like: How do we actually write a bounded program that implements something like UDT? How do we actually do calculations with logical decision theories such that we can get answers out for basic game-theory scenarios (even something as simple as the ultimatum game is unsolved, IIRC)? What are some common-sense constraints the program-value-functions should obey (e.g., how should we value a program that simulates multiple other programs?)? These all seem like they are likely to be relevant to alignment, and also intrinsically worth doing.
[1] This talk is relevant: https://www.youtube.com/watch?v=ZSRHeXYDLko
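(For concreteness, here is a minimal sketch, mine rather than the commenter’s, of the classical baseline the ultimatum-game question is contrasting with: plain backward induction says the responder should accept any positive offer, so the proposer offers the minimum. The open problem gestured at above is getting a principled answer out of a logical decision theory, where the responder’s rejection policy and the proposer’s prediction of it are logically correlated. The stake size and the threshold-policy parameterization below are illustrative assumptions.)

```python
# Illustrative sketch only: the classical, causal/backward-induction analysis of
# the ultimatum game, as a contrast to the open question of what a logical
# decision theory would recommend. Stake size and integer offers are assumptions.

PIE = 10  # total amount to split

def responder_payoff(offer, threshold):
    """Responder accepts iff offer >= threshold; rejection gives both players 0."""
    return offer if offer >= threshold else 0

def proposer_payoff(offer, threshold):
    return (PIE - offer) if offer >= threshold else 0

# Classical analysis: a responder who decides case-by-case accepts any positive
# offer (threshold = 1), so the proposer's best response is the minimal offer.
best_offer = max(range(PIE + 1), key=lambda o: proposer_payoff(o, 1))
print("classical proposer offer:", best_offer)                          # 1
print("classical responder payoff:", responder_payoff(best_offer, 1))   # 1

# If the proposer can predict the responder's threshold (the kind of logical
# correlation a logical decision theory has to reason about), a committed
# higher threshold earns the responder more:
for threshold in range(PIE + 1):
    offer = max(range(PIE + 1), key=lambda o: proposer_payoff(o, threshold))
    print(f"threshold {threshold}: proposer offers {offer}, "
          f"responder gets {responder_payoff(offer, threshold)}")
# What a bounded LDT agent should actually do here, without hand-coding the
# commitment, is the unsolved calculation the comment above refers to.
```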
Agent Foundations research has stuttered a bit due to the team going remote, shifts in its membership, and various other logistical hurdles, but has been continuous throughout.
There’s also at least one other team (the one I provide ops support to) that has been continuous since 2017.
I think the thing Raemon is pointing at is something like “in 2020, both Nate and Eliezer would’ve answered ‘yes’ if asked whether they were regularly spending work hours every day on a direct, technical research agenda; in 2023 they would both answer ‘no.’”
What Duncan said. “MIRI at least temporarily gave up on personally executing on technical research agendas” is false, though a related claim is true: “Nate and Eliezer (who are collectively a major part of MIRI’s research leadership and play a huge role in the org’s strategy-setting) don’t currently see a technical research agenda that’s promising enough for them to want to personally focus on it, or for them to want the organization to make it an overriding priority”.
I do think the “temporarily” and “currently” parts of those statements are quite important: part of why the “MIRI has given up” narrative is silly is that it’s rewriting history to gloss “we don’t know what to do” as “we know what to do, but we don’t want to do it”. We don’t know what to do, but if someone came up with a good idea that we could help with, we’d jump on it!
There are many negative-sounding descriptions of MIRI’s state that I could see an argument for, as stylized narratives (“MIRI doesn’t know what to do”, “MIRI is adrift”, etc.). Somehow, though, people skipped over all those perfectly serviceable pejorative options and went straight for the option that’s definitely just not true?
We don’t know what to do, but if someone came up with a good idea that we could help with, we’d jump on it!
In that case, I think the socially expected behavior is to do some random busywork, to send a clear signal that you’re not lazy.
(In a corporate environment, the usual solution is to organize a few meetings. Not sure what the equivalent is for non-profits… perhaps organizing conferences?)
I don’t see how that would help at all, and pure busywork is silly when you have lots of things to do that are positive-EV but probably low-impact.
MIRI “doesn’t know what to do” in the sense that we don’t see a strategy with macroscopic probability of saving the world, and the most-promising ones with microscopic probability are very diverse and tend to violate or side-step our current models in various ways, such that it’s hard to pick actions that help much with those scenarios as a class.
That’s different from MIRI “not knowing what to do” in the sense of having no ideas for local actions that are worth trying on EV grounds. (Though a lot of these look like encouraging non-MIRI people to try lots of things and build skill and models in ways that might change the strategic situation down the road.)
(Also, I’m mainly trying to describe Nate and Eliezer’s views here. Other MIRI researchers are more optimistic about some of the technical work we’re doing, AFAIK.)
Ah got it, thanks for the reply!