Good post. I’m broadly supportive of MIRI’s goal of “deconfusion” and I like the theoretical emphasis of their research angle.
To help out, I’ll suggest a specific way in which it seems to me that MIRI is causing themselves unnecessary confusion when thinking about these problems. From the article:
I can observe various reasons why we shouldn’t expect the strategy “train a large cognitive system to optimize for X” to actually result in a system that internally optimizes for X, but there are still wide swaths of the question where I can’t say much without saying nonsense.
In the mainstream machine learning community, the word “optimization” is almost always used in the mathematical sense: finding a local or global optimum of a function, e.g. a continuous function of several variables. In contrast, MIRI uses “optimization” in two ways: sometimes in this mathematical sense, but sometimes in the sense of an agent optimizing its environment to match some set of preferences. Although these two operations share some connotational similarities, I don’t think they actually have much in common: the algorithms we’ve discovered for performing these two activities are often quite different, and the “grammatical structure”/“type signature” of the two problems certainly seems quite different. Robin Hanson has even speculated that the right brain does something more like the first kind of optimization and the left brain does something more like the second.
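To make the type-signature point concrete, here’s a minimal Python sketch. The function names and the toy thermostat setup are mine, purely illustrative: the first sense takes a function and returns a point in its domain, while the second is a policy run in a feedback loop whose “output” is a change in the world’s state.

```python
from typing import Callable

# Sense 1: mathematical optimization.
# Type signature: (R -> R) x R -> R. Given a function and a starting
# point, return an input near a local optimum of that function.
def minimize(f: Callable[[float], float], x0: float,
             lr: float = 0.1, steps: int = 1000) -> float:
    x, eps = x0, 1e-6
    for _ in range(steps):
        grad = (f(x + eps) - f(x - eps)) / (2 * eps)  # finite-difference gradient
        x -= lr * grad
    return x

# Sense 2: an agent optimizing its environment.
# Type signature: a policy State -> Action, run in a loop against a
# transition function State x Action -> State. The "optimization" shows
# up only as a side effect on the world's state.
def run_agent(policy: Callable[[float], float],
              transition: Callable[[float, float], float],
              state: float, steps: int = 50) -> float:
    for _ in range(steps):
        action = policy(state)             # observe the world, choose an action
        state = transition(state, action)  # the world changes in response
    return state

if __name__ == "__main__":
    # Sense 1: the minimum of (x - 3)^2 is at x = 3.
    print(minimize(lambda x: (x - 3) ** 2, x0=0.0))  # ~3.0

    # Sense 2: a thermostat-like agent that prefers the state to be 3.
    policy = lambda s: 3.0 - s             # push toward the preferred state
    transition = lambda s, a: s + 0.5 * a  # the environment responds partially
    print(run_agent(policy, transition, state=0.0))  # ~3.0
```

Note that nothing in `run_agent` ever evaluates a gradient or even an explicit objective; the agent’s “preferences” live implicitly in its policy, which is part of why the two problems feel so structurally different to me.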