Nate Soares has written up a post that discusses MIRI’s new research directions: a mix of reasons why we’re pursuing them, reasons why we’re pursuing them the way we are, and high-level comparisons to Agent Foundations.
Read the full (long!) post here.
I was recently thinking about focus. Some examples:
This tweet:
Sam Altman’s recent blogpost on How to Be Successful has the following two commands:
(He often talks about how the main task a startup founder has is to pick the 2 or 3 things to focus on that day out of the 100+ things vying for your attention.)
And I found this old quote by the mathematician Grothendieck on Michael Nielsen’s blog.
Overall, it made me update that MIRI’s decision to be closed-by-default is quite sensible. This section seems trivially correct from this point of view.
I think that closed-by-default is a very bad strategy from the perspective of outreach and of building a field of AI alignment. But I realise that MIRI is explicitly and wholly focusing on making research progress, for at least the coming few years, and I think the post and its decisions overall make a lot of sense from that perspective.
And here are the footnotes:
Edited: Added a key section at the end.
Another interesting idea for discussion is the value of making a long-term commitment to keeping research within a contained environment (i.e. what the OP calls ‘nondisclosed-by-default’).
There are a bunch of arguments. Many seem straightforward to me (early research doesn’t translate well into papers at all; it might accidentally turn out to move capabilities forward, and you want to see it develop for a while to be sure it won’t; etc.), but this one surprised me more, and I’d be interested to know whether it resonates or clashes with others’ experiences.
Yes, this very much resonates with me, especially because a parallel issue exists in biosecurity: we don’t want to talk publicly about how to prevent the things we’re worried about, because doing so could prompt bad actors to look into them.
The issues here are different, but the need to have walls between what you think about and what you discuss imposes a real cost.
When it comes to disclosure policies, if I’m uncertain between the “MIRI view” and the “Paul Christiano view”, should I bite the bullet and back one approach over the other? Or can I aim to support both views, without worrying that they’re defeating each other?
My current understanding is that it’s coherent to support both at once. That is, I can think that possibly intelligence needs lots of fundamental insights, and that safety needs lots of similar insights (this is supposed to be a characterisation of a MIRI-ish view). I can think that work done on figuring out more about intelligence and how to control it should only be shared cautiously, because it may accelerate the creation of AGI.
I can also think that prosaic AGI is possible, and fundamental insights aren’t needed. Then I might think that I could do research that would help align prosaic AGIs but couldn’t possibly align (or contribute to) an agent-based AGI.
Is the above consistent? Also, do people who worry about disclosure (or those with good emulators of such people) think that this makes sense from their point of view?
The OP is quite long, so let me copy over some interesting sections to reduce trivial inconveniences to discussion. This section is especially interesting, as it tries to explicate the ‘deconfusion’ concept:
And footnote 7:
I’d be interested if anyone can add insight to the examples discussed in the footnote. I’m also curious if any further examples seem salient to people, or alternatively if this frame seems itself confused about how certain key types of insights come about.
Good post. I’m broadly supportive of MIRI’s goal of “deconfusion” and I like the theoretical emphasis of their research angle.
To help out, I’ll suggest a specific way in which it seems to me that MIRI is causing themselves unnecessary confusion when thinking about these problems. From the article:
In the mainstream machine learning community, the word “optimization” is almost always used in the mathematical sense: discovering a local or global optimum of a function, e.g. a continuous function of multiple variables. In contrast, MIRI uses “optimization” in two ways: sometimes in this mathematical sense, but sometimes in the sense of an agent optimizing its environment to match some set of preferences. Although these two operations share some connotational similarities, I don’t think they actually have much in common—it seems like the algorithms we’ve discovered to perform these two activities are often pretty different, and the “grammatical structure”/”type signature” of the two problems certainly seem quite different. Robin Hanson has even speculated that the right brain does something more like the first kind of optimization and the left brain does something more like the second.
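To make the contrast concrete, here is a minimal sketch in Python of the two “type signatures” as I read them. The function names (`minimize`, `run_agent`, `step_env`) and the toy setup are purely my own illustrative assumptions, not anything from MIRI or the comment above.

```python
from typing import Callable, List, Tuple

# Sense 1 -- mathematical optimization: given a fixed function, search its
# input space for a (locally) optimal point.  Roughly (X -> float) -> X.
def minimize(f: Callable[[float], float], x0: float,
             lr: float = 0.1, steps: int = 100, eps: float = 1e-5) -> float:
    """Toy gradient descent using finite-difference gradients."""
    x = x0
    for _ in range(steps):
        grad = (f(x + eps) - f(x - eps)) / (2 * eps)
        x -= lr * grad
    return x

# Sense 2 -- an agent optimizing its environment: a policy maps an observation
# history to an action, and is judged by the trajectory of world states it
# steers the environment toward.  Roughly (History -> Action), evaluated
# against preferences over outcomes.
def run_agent(policy: Callable[[List[float]], float],
              step_env: Callable[[float, float], Tuple[float, float]],
              state: float, horizon: int = 100) -> float:
    """Roll a policy out in an environment and return the total reward."""
    history: List[float] = [state]
    total_reward = 0.0
    for _ in range(horizon):
        action = policy(history)
        state, reward = step_env(state, action)
        history.append(state)
        total_reward += reward
    return total_reward
```

Even in this toy form, the inputs and outputs of the two problems are shaped quite differently, which I take to be the point about their differing “grammatical structure”/“type signature”.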
One group that isn’t considered in the analysis is new trainees. It seems that AGI is probably sufficiently far off that many of the people who will make the breakthroughs are not yet researchers or experts. If you are a bright young person who might work at MIRI or somewhere similar in 5 years’ time, you would want to get familiar with the area. You are probably reading MIRI’s existing work, to see if you have the capability to work in the field. This means that if you do join MIRI, you have already been thinking along the right lines for years.
Obviously you don’t want your discussions live-streamed to the world; you might come up with dangerous ideas. But I would suggest sticking things online once you understand the area sufficiently well to be confident it’s safe. If writing it up into a fully formal paper is too time-intensive, any rough scraps will still be read by the dedicated.
Yup. As someone aiming to do their dissertation on issues of limited agency (low impact, mild optimization, corrigibility), it sure would be frustrating to essentially end up duplicating the insights that MIRI has on some new optimization paradigm.
I still understand why they’re doing this and think it’s possibly beneficial, but it would be nice to avoid having this happen.
From this post:
One could make an argument for multiple independent AI safety teams on similar grounds:
“any one optimization paradigm may have weaknesses, but it’s unlikely that multiple optimization paradigms will have weaknesses in the same place”
“any one team may not consider a particular fatal flaw, but it’s unlikely that multiple teams will all neglect the same fatal flaw”
In the best case, you can merge multiple paradigms & preserve the strengths of all with the weaknesses of none. In the worst case, having competing paradigms still gives you the opportunity to select the best one.
This works best if individual researchers/research teams are able to set aside their egos and overcome not-invented-here syndrome to create the best overall system… which is a big if.
Small moderation note: The linked post contains a bunch of organization updates plus a hiring pitch, which is off-topic for the frontpage. However, the post also contains a bunch of very important gems of theory and insight, and I wanted to make sure those can be discussed here. I think it’s better to keep discussion of logistical details and solicitations for donations off the frontpage, but I think it’s fine to discuss “MIRI’s epistemic state” in a similar fashion as you would discuss the epistemic state of a person on the frontpage.
Hm. I wonder what an “alternative” to neural nets and gradient descent would look like. Neural nets are really just there as a highly expressive model class that gradient descent works on.
One big difficulty is that if your model is going to classify pictures of cats (or go boards, etc.), it’s going to be pretty darn complicated, and I’m sceptical that any choice of model class is going to prevent that. But maybe one could try to “hide” this complexity in a recursive structure. Neural nets already do this, but convnets especially mix up spatial hierarchy with logical hierarchy, and NNs in general aren’t as nicely packaged into human-thought-sized pieces as maybe they could be—consider resnets, which work well precisely because they abandon the pretense of each neuron being some specific human-scale logical unit.
So maybe you could go the opposite direction and make that pretense a reality with some kind of model class that tries to enforce “human-thought-sized” reused units with relatively sparse inter-unit connections? Could still train with SGD, or treat hypotheses as decision trees and take advantage of that literature.
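As a purely illustrative sketch of what such a model class might look like (assuming PyTorch; the class names `SmallUnit` and `SparseModularNet`, the fixed wiring rule, and all sizes are hypothetical choices of mine, not anything proposed in the post), here is one way to enforce small reused units with sparse inter-unit connections while keeping the whole thing trainable with SGD:

```python
import torch
import torch.nn as nn

class SmallUnit(nn.Module):
    """A deliberately small, reusable sub-network: the 'human-thought-sized' piece."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class SparseModularNet(nn.Module):
    """A stack of small units where each unit reads from only a few earlier ones."""
    def __init__(self, in_dim: int, unit_dim: int, n_units: int,
                 out_dim: int, fan_in: int = 2):
        super().__init__()
        self.embed = nn.Linear(in_dim, unit_dim)
        self.units = nn.ModuleList(SmallUnit(unit_dim) for _ in range(n_units))
        # Fixed sparse wiring: unit i sees only the last `fan_in` earlier outputs.
        self.wiring = [list(range(max(0, i + 1 - fan_in), i + 1))
                       for i in range(n_units)]
        self.head = nn.Linear(unit_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outputs = [self.embed(x)]  # outputs[0] is the embedded input
        for unit, sources in zip(self.units, self.wiring):
            # Each unit gets a simple aggregate of its few designated sources.
            inp = torch.stack([outputs[s] for s in sources]).mean(dim=0)
            outputs.append(unit(inp))
        return self.head(outputs[-1])

# It still trains like any other differentiable model, e.g.:
#   model = SparseModularNet(in_dim=784, unit_dim=64, n_units=6, out_dim=10)
#   opt = torch.optim.SGD(model.parameters(), lr=0.01)
#   loss = nn.functional.cross_entropy(model(batch_x), batch_y)
#   loss.backward(); opt.step()
```

Whether the per-unit smallness and sparse wiring would actually buy comprehensibility is exactly the doubt raised next.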
But suppose we got such a model class working, and trained it to recognize cats. Would it actually be human-comprehensible? Probably not! I guess I’m just not really clear on what “designed for transparency and alignability” is supposed to cash out to at this stage of the game.