Deep learning can be thought of as an instance of _search_, in which we design an artifact (machine) simply by looking for an artifact that scores well on some evaluation metric. This is unlike typical engineering, which we might call _design_, in which we build the artifact in such a way that we can also understand it. This is the process that underlies the vast majority of artifacts in the world. This post seeks to understand design better, such that we could design powerful AI systems rather than having to find them using search.
The post argues that design functions by constructing an artifact along with a _story_ for why the artifact works, that abstracts away irrelevant details. For example, when working with a database, we talk of adding a “row” to a “table”: the abstraction of rows and tables forms a story that allows us to easily understand and use the database.
A typical design process for complex artifacts iterates between _construction_ of the artifact and _factorization_ which creates a story for the artifact. The goal is to end up with a useful artifact along with a simple and accurate story for it. A story is simple if it can be easily understood by humans, and accurate if humans using the story to reason about the artifact do not get surprised or harmed by the artifact.
You might think that we can get this for search-based artifacts using interpretability. However, most interpretability methods are either producing the story after the artifact is constructed (meaning that the construction does not optimize for simple and accurate stories), or are producing artifacts simple enough that they do not need a story. This is insufficient for powerful, complex artifacts.
As a result, we would like to use design for our artifacts rather than search. One alternative approach is to have humans design intelligent systems (the approach taken by MIRI). The post suggests another: automating the process of design, so that we automate both construction and factorization, rather than just construction (as done in search).
Planned opinion:
I liked the more detailed description of what is meant by “design”, and the broad story given for design seems roughly right, though obscuring details. I somewhat felt like the proposed solution of automating design seems pretty similar to existing proposals for human-in-the-loop AI systems: typically in such systems we are using the human to provide information about what we want and to verify that things are going as we expect, and it seems like a pretty natural way that this would happen would be via the AI system producing a story that the human can verify.
Planned summary for the Alignment Newsletter:
Planned opinion:
I think this is a very good summary
Thanks :)