> But the capabilities of neural networks are currently advancing much faster than our ability to understand how they work or interpret their cognition;
Naively, you might think that as opacity increases, trust in systems decreases, and hence something like “willingness to deploy” decreases.
How good an argument does the naive reasoning above seem to you against the hypothesis that “capabilities will grow faster than alignment”? (I’m viewing the quoted sentence as an argument for that hypothesis.)
Some initial thoughts:
- A highly capable system doesn’t necessarily need to be deployed by humans in order to disempower humans, meaning “deployment” is not necessarily the right concept to use here.
- On the other hand, the deployability of systems increases investment in AI (by how much?), meaning that increasing opacity might in some sense decrease future capabilities relative to counterfactuals where the AI was less opaque.
- I don’t know how much willingness to deploy really decreases with increased opacity, if at all.
- Opacity can be thought of as the inability to predict a model’s behavior in a given new environment. As models have scaled, the number of benchmarks we test them on also seems to have scaled, which does help us understand their behavior. So perhaps the measure that actually matters is the “difference between tested behavior and deployed behavior”, and it’s unclear to me what this metric looks like over time; I sketch one rough way to formalize it below. [ETA: it feels obvious that our understanding of AI’s deployed behavior has worsened, but I want to be more specific and sure about that.]
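One rough way I might formalize that “tested vs. deployed behavior” gap (just a sketch, where $\ell$ stands in for whatever behavioral measure we care about):

$$
\text{gap}(f) \;=\; \Big|\, \mathbb{E}_{x \sim D_{\text{deploy}}}\big[\ell(f(x))\big] \;-\; \mathbb{E}_{x \sim D_{\text{test}}}\big[\ell(f(x))\big] \,\Big|
$$

Here $f$ is the model, $D_{\text{test}}$ is the distribution of inputs covered by our benchmarks, $D_{\text{deploy}}$ is the distribution of inputs the model actually encounters in deployment, and $\ell$ is some measure of the behavior in question (a loss, a harmfulness score, etc.). On this framing, “our understanding of deployed behavior has worsened” is the claim that this gap has grown over time, even as benchmark coverage has expanded.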