> One possibility is that MIRI’s arguments actually do look that terrible to you, but that this is because MIRI hasn’t managed to make them clear enough.
MIRI claims to have had an important insight about AI design (the so-called “Löbian obstacle”) that experts in the relevant fields (AI, model checking, automated theorem proving, etc.) didn’t have. MIRI attempted to communicate this insight, but so far the experts have mostly ignored MIRI’s claims or denied that they are likely to be important and relevant.
What is the most likely explanation for that? It seems that we can narrow it down to two hypotheses:
A) MIRI’s insight really is relevant and important to AI design, but communication with the experts failed because of some problem on MIRI’s side, on the experts’ side (e.g. stubbornness, stupidity), or on both (e.g. excessively different backgrounds).
B) MIRI is mistaken about the value of its insight (possible psychological causes include confirmation bias, the Dunning–Kruger effect, groupthink, overconfident personalities, etc.).
I would say that, barring evidence to the contrary, hypothesis B is the most likely explanation.
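(A quick gloss for readers who haven’t seen the term: the “Löbian obstacle” is a consequence of Löb’s theorem for agents that need to trust proofs carried out in their own formal system. The sketch below is a compressed paraphrase in standard provability-logic notation, not MIRI’s own formulation; T stands for the agent’s background theory, e.g. Peano Arithmetic, and □_T for provability in T.)

```latex
% Sketch only; assumes \usepackage{amsmath,amssymb} for \text and \Box.
% Löb's theorem: for every sentence $P$,
%   if $T \vdash \Box_T P \rightarrow P$, then $T \vdash P$:
\[
  T \vdash \bigl(\Box_T P \rightarrow P\bigr)
  \quad\Longrightarrow\quad
  T \vdash P .
\]
% The obstacle, roughly: an agent reasoning in $T$ that wants to accept a
% successor's conclusions because they were proved in $T$ needs the
% soundness schema
\[
  T \vdash \Box_T P \rightarrow P
  \qquad \text{for every action-relevant sentence } P,
\]
% but by Löb's theorem each instance of that schema already yields
% $T \vdash P$ (and for $P = \bot$ it yields inconsistency). So a
% consistent agent cannot blanket-trust its own proof system, and hence
% cannot straightforwardly license a successor that reasons in the same
% (or a stronger) system.
```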
> What is the most likely explanation for that? It seems that we can narrow it down to two hypotheses:
> A) MIRI’s insight really is relevant and important to AI design, but communication with the experts failed because of some problem on MIRI’s side, on the experts’ side (e.g. stubbornness, stupidity), or on both (e.g. excessively different backgrounds).
> B) MIRI is mistaken about the value of its insight (possible psychological causes include confirmation bias, the Dunning–Kruger effect, groupthink, overconfident personalities, etc.).
I don’t believe these options are exhaustive. “Relevant and valuable” are subjective and time-varying. The published work might be interesting and useful down the line, but not help the problems that AI researchers are working on right now.
It usually takes a few years for the research community to assimilate strange new ideas—sometimes much more than a few years. This isn’t due to a failure on anybody’s part—it’s due to the fact that scientists pick problems where they have a reasonable prospect of success within a few years.
I would give MIRI at least a decade or two before evaluating whether their work has gained any mainstream traction.
MIRI’s stated goals are similar to those of mainstream AI research, and MIRI’s approach in particular includes as subgoals the goals of research fields such as model checking and automated theorem proving.
Do you claim that MIRI is one or two decades ahead of mainstream researchers?
If the answer is no, then how does MIRI (or its donors) evaluate now whether these lines of work are valuable for advancing its stated goals?
> MIRI’s stated goals are similar to those of mainstream AI research, and MIRI’s approach in particular includes as subgoals the goals of research fields such as model checking and automated theorem proving.
Research has both ultimate goals (“machines that think”) and short-term goals (“machines that can parse spoken English”). My impression is that the MIRI agenda is relevant to the ultimate goal of AI research, but has only limited overlap with the things people are really working on in the short term. I haven’t seen MIRI work that looked directly relevant to existing work on theorem proving or model checking. (I don’t know much about automated theorem proving, but do know a bit about model checking.)
> Do you claim that MIRI is one or two decades ahead of mainstream researchers?
It’s not a matter of “ahead”. Any research area is typically a bunch of separate tracks that proceed independently and eventually merge or develop interconnections. It might be several decades before the MIRI/self-modifying-AI track merges with the main line of AI or CS research. That’s not necessarily a sign that anything is wrong. It took decades of improvement before formal verification or theorem proving became part of the computer science toolkit. I would consider MIRI a success if it follows a similar trajectory.
> If the answer is no, then how does MIRI (or its donors) evaluate now whether these lines of work are valuable for advancing its stated goals?
I can’t imagine any really credible assurance that “this basic research is definitely useful” for any basic research. The ultimate goal of “safe self-modifying AI” is too remote for us to have any idea whether we’re on the right track. But if MIRI, motivated by that goal, does neat stuff, I think it’s a safe bet that (A) the people doing the work are clueful, and (B) their work is at least potentially useful in dealing with AI risks. And “potentially useful” is the best assurance anybody can ever give.
I’m a computer systems guy, not a theorist or AI researcher, but my opinion of MIRI has gradually shifted from “slightly crankish” to “there are some interesting questions here and MIRI might be doing useful work on them that nobody else is currently doing.” My impression is that a number of mainstream computer scientists have similar views.
Eliezer recently gave a talk at MIT. If the audience threw food at the stage, I would consider that evidence for MIRI being crankish. If knowledgeable CS types showed up and were receptive or interested, I would consider that a strong vote of confidence. Anybody able to comment?
> MIRI’s stated goals are similar to those of mainstream AI research, and MIRI’s approach in particular includes as subgoals the goals of research fields such as model checking and automated theorem proving.
It’s definitely not a goal of mainstream AI, and not even a goal of most AGI researchers, to create self-modifying AI that provably preserves its goals. MIRI’s work on this topic doesn’t seem relevant to what mainstream AI researchers want to achieve.
Zooming out from MIRI’s technical work to MIRI’s general mission, it’s certainly true that MIRI’s failure to convince the AI world of the importance of preventing unFriendly AI is Bayesian evidence against MIRI’s perspective being correct. Personally, I don’t find this evidence strong enough to make me think that preventing unFriendly AI isn’t worth working on.
Also, two more reasons why MIRI isn’t that likely to produce research that AI researchers will see as a direct boon to their field. One, work that is close to something people are already trying to do is more likely to be worked on already; the things nobody is working on seem more important for MIRI to take up. Two, AGI researchers are particularly interested in results that get us closer to AGI, and MIRI is trying to work on topics that can be published without bringing the world closer to AGI.
> MIRI claims to have had an important insight about AI design (the so-called “Löbian obstacle”) that experts in the relevant fields (AI, model checking, automated theorem proving, etc.) didn’t have. MIRI attempted to communicate this insight, but so far the experts have mostly ignored MIRI’s claims or denied that they are likely to be important and relevant.
I wouldn’t say MIRI has tried very hard yet to communicate about the Löbian obstacle to people in the relevant fields. For example, we haven’t published on the Löbian obstacle in a journal or conference proceedings.
Part of the reason for that is that we don’t expect experts in these fields to be very interested in it. Work on the Löbian obstacle aims at a better understanding of strongly self-modifying systems, which won’t exist for at least 15 years, and probably longer.
> Part of the reason for that is that we don’t expect experts in these fields to be very interested in it. Work on the Löbian obstacle aims at a better understanding of strongly self-modifying systems, which won’t exist for at least 15 years, and probably longer.
I agree the AI community won’t be very interested. But it might be worth sending it to some theoretical computer science venue—STOC, say—instead. If nothing else, it would give useful information about how receptive academics are to the topic.