So, I predict with high confidence that any ML model that can reach the perplexity levels of Transformers will also present great initial interpretive difficulty.
I do agree that any realistic ML model that achieves GPT-4-level perplexity would probably have to have at least some parts that are hard to interpret. However, I believe it should (in principle) be possible to build ML systems that have highly interpretable policies (or analogues thereof), despite having hard-to-interpret models.
I think if our goal were to build understandable/controllable/safe AI, it would make sense to factor the AI’s mind into various “parts”, such as a policy, a set of models, and a (set of sub-)goals.
In contrast, implementing AIs as giant Transformers precludes making architectural distinctions between any such “parts”; the whole AI is, architecturally speaking, one big uniform soup. Giant Transformers don’t even have the level of modularity found in biological brains produced by evolution.
Consequently, I still think the “giant inscrutable tensors”-approach to building AIs is terrible from a safety perspective, not only in an absolute sense, but also in a relative sense (relative to saner approaches that I can see).