When analyzing complex systems (such as deep networks), it is tempting to separate the system into events or components ("parts"), analyze those parts separately, and then combine the results, i.e., to "divide and conquer."
This approach often wrongly assumes (Leveson 2020):
Separation does not distort the system’s properties
Each part operates independently
Each part acts the same when examined singly as when acting in the whole
Parts are not subject to feedback loops and nonlinear interactions
Interactions between parts can be examined pairwise
Searching for mechanisms and relying on reductionist analysis are too simplistic when dealing with complex systems (see our third post for more); the toy sketch below illustrates how the last two assumptions can fail.
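As a minimal, hypothetical sketch (ours, not from Leveson): consider a system whose output is driven entirely by a three-way interaction between its parts. Probing each part in isolation, or even every pair of parts, suggests the parts do nothing, yet the whole system behaves very differently. The function and variable names here are illustrative only.

```python
from itertools import combinations

def system(x1, x2, x3):
    # Toy system: output is driven entirely by a higher-order (three-way) interaction.
    return x1 * x2 * x3

baseline = {"x1": 0.0, "x2": 0.0, "x3": 0.0}

# "Divide and conquer": vary one part at a time, holding the others at baseline.
for name in baseline:
    probe = dict(baseline, **{name: 1.0})
    print(f"{name} alone: {system(**probe)}")   # each prints 0.0

# Pairwise analysis: vary two parts at a time.
for a, b in combinations(baseline, 2):
    probe = dict(baseline, **{a: 1.0, b: 1.0})
    print(f"{a} + {b}: {system(**probe)}")      # each prints 0.0

# The whole system, with all parts active at once.
print("all parts:", system(1.0, 1.0, 1.0))      # prints 1.0
```

Part-by-part and pairwise analyses both report "no effect," so combining their results would miss the behavior entirely; the same failure mode applies, in a far messier form, to decomposing a deep network into individually interpreted components.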
People struggle to understand complex systems. Grad students in ML don't even understand many aspects of their own field: how to make a difference in it, what trends are emerging, or what's going on outside their small subarea. How will we understand an intelligence that moves more quickly and has more breadth than we do? The reach of a human mind has limits. Perhaps a person could understand a small aspect of an agent's actions (or one of its components), but it would be committing the composition fallacy to suggest that a group of people who each understand a part of an agent could thereby understand the whole agent.
Also note that Dan H.[1] has advocated for some version of this take. See, for instance, the Open Problems in AI X-Risk (Pragmatic AI safety #5) section on criticisms of transparency:
I think Dan is the source of this take in the post I link, rather than the other author, Thomas W, but I'm not super confident.