if they are to be represented as natural language, then an extremely central subproblem is how to check that the net-concepts-translated-into-natural-language actually robustly match what a human interprets that natural language to mean.
I posit that this can be operationalized as mapping between brain and LM representations of language (e.g. LM-fMRI encoding/decoding). I'll also note that, in my understanding, (some) neuroscientists seem quite confident that they can (quite robustly) identify which parts of the brain do the language processing in ~any individual. There's an entire recent scientific literature on mapping between brain and LM representations of language; see e.g. some of the linkposts here: https://www.lesswrong.com/posts/wFZfnuB38cKJLgCrD/linkpost-mapping-brains-with-language-models-a-survey, https://www.lesswrong.com/posts/6azamabzKT3YnZW2E/linkpost-the-neuroconnectionist-research-programme, https://www.lesswrong.com/posts/iXbPe9EAxScuimsGh/linkpost-scaling-laws-for-language-encoding-models-in-fmri. The above claim is implicit in much of that literature, but also explicit in some papers, e.g. in *Brain embeddings with shared geometry to artificial contextual embeddings, as a code for representing language in the human brain*.
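To make the encoding/decoding idea concrete: a standard setup in this literature fits a regularized linear map from LM embeddings of stimuli to per-voxel fMRI responses, then scores held-out prediction accuracy per voxel. Below is a minimal sketch using simulated data; all arrays, dimensions, and the noise level are hypothetical stand-ins for real stimulus embeddings and recordings, and the specific model (ridge regression) is just the common default choice, not a claim about any particular paper's pipeline.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical data: LM embeddings for n stimuli (words/sentences a subject
# reads or hears) and simultaneous fMRI responses across v voxels.
n_stimuli, emb_dim, n_voxels = 200, 64, 50
lm_embeddings = rng.standard_normal((n_stimuli, emb_dim))

# Simulate voxel responses as a noisy linear readout of the embeddings
# (purely for illustration; real responses are measured, not simulated).
true_weights = rng.standard_normal((emb_dim, n_voxels))
voxel_responses = (
    lm_embeddings @ true_weights
    + 0.1 * rng.standard_normal((n_stimuli, n_voxels))
)

X_train, X_test, y_train, y_test = train_test_split(
    lm_embeddings, voxel_responses, test_size=0.25, random_state=0
)

# Encoding model: ridge regression from LM embeddings to voxel activity.
encoder = Ridge(alpha=1.0).fit(X_train, y_train)
pred = encoder.predict(X_test)

# Score each voxel by the correlation between predicted and held-out actual
# responses; voxels the LM representation predicts well are the candidate
# "language" voxels whose code matches the model's.
per_voxel_r = np.array(
    [np.corrcoef(pred[:, v], y_test[:, v])[0, 1] for v in range(n_voxels)]
)
print(f"mean held-out voxel correlation: {per_voxel_r.mean():.2f}")
```

The decoding direction simply reverses the regression (voxel responses → embedding), and the brain-LM "shared geometry" claim amounts to these maps generalizing robustly across stimuli and subjects.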