The first thing to say about this is that it is a conceptual minefield. The semantics (or ontological grounding) of AI systems is, in my opinion, one of the least well-developed parts of the whole field. People often pay lip service to some kind of model-theoretic justification for an AI’s semantic foundations, but in practice this actually means very little, since the theoretical ideas shade off into philosophy, have some huge unresolved gaps in them, and frequently take recourse in infinitely large (i.e. uncomputable) mappings between sets of ‘possible worlds’. Worst of all, the area is rife with question-begging (like using technical vocabulary which itself has a poorly defined semantics to try to specify exactly what ‘semantics’ is!).
Why does that matter? Because many of the statements that people make about semantic issues (like the alien semantics problem) are predicated on precisely which semantic theory they subscribe to. And it is usually the case that their chosen semantic theory is just a vague idea that goes somewhat in the direction of Tarski, or in the direction of Montague, or maybe just what they read in Russell and Norvig. The problem is that those semantic theories have challengers (some of them not very well defined, but even so...), such as Cognitive Semantics, and those other semantic formalisms have a truly gigantic impact on some of the issues we are discussing here.
So, for example, there is an interpretation of semantics that says that it is not even coherent to talk about two concept landscapes that are semantic aliens. To be sure, this can happen in language—things expressible in one language can be very hard to say in another language—but the idea that two concept spaces can be in some way irreconcilable, or untranslatable, would be incoherent (not “unlikely” but actually not possible).
Ah! Finally a tasty piece of real discussion! I’ve got a biiig question about this: how do these various semantic theories for AI/AGI take into account the statistical nature of real cognition?
(Also, I’m kicking myself for not finishing Plato’s Camera yet, because now I’m desperately wanting to reference it.)
Basically: in real cognition, semantics is gained from the statistical relationship between a model of some sort and feature data. There can be multiple “data-types” of feature data: one of the prominent features of the human brain is that once a concept is learned, it becomes more than a sum of training data; it becomes a map of a purely abstract, high-dimensional feature space (or, if you prefer, a distribution over that feature space), with the emphasis on the word abstract. The dimensions of that space are usually not feature data themselves, but parameters of an abstract causal model, inferable from feature data. This makes our real concepts accessible through completely different sensory modalities.
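To make that a bit more concrete (this is purely a toy of my own, nothing like actual neural machinery, and every parameter and feature name is invented): think of a single abstract cause, summarized by a couple of latent parameters, that generates features in two different “modalities”, so that the same concept-level parameters can be inferred from either one.

```python
# Toy illustration (not a brain model): a "concept" is a distribution over a
# small abstract parameter space, and those parameters explain feature data
# in more than one sensory modality.
import numpy as np

rng = np.random.default_rng(0)

# Invented abstract causal parameters for some concept: (size, rigidity).
# The concept is a distribution over this abstract space, not a pile of
# stored training examples.
concept_mean = np.array([0.7, 0.9])
concept_cov = np.diag([0.02, 0.01])

def visual_features(params, noise=0.05):
    """Project the abstract parameters into a made-up 'visual' feature space."""
    size, rigidity = params
    return np.array([1.8 * size, 0.4 * size + 0.1 * rigidity]) + rng.normal(0, noise, 2)

def haptic_features(params, noise=0.05):
    """Project the same abstract parameters into a made-up 'haptic' feature space."""
    size, rigidity = params
    return np.array([2.0 * rigidity, 0.5 * size]) + rng.normal(0, noise, 2)

def infer_params(observation, feature_fn, n_samples=5_000):
    """Crude importance-weighted guess at which abstract parameters best
    explain the observed features, under the concept's prior."""
    candidates = rng.multivariate_normal(concept_mean, concept_cov, n_samples)
    errors = np.array([np.sum((feature_fn(c, noise=0.0) - observation) ** 2)
                       for c in candidates])
    weights = np.exp(-errors / 0.01)
    return (candidates * weights[:, None]).sum(axis=0) / weights.sum()

true_params = rng.multivariate_normal(concept_mean, concept_cov)
print("inferred from vision:", infer_params(visual_features(true_params), visual_features))
print("inferred from touch: ", infer_params(haptic_features(true_params), haptic_features))
# Both observations are explained by (roughly) the same point in the abstract
# parameter space -- which is the sense in which the concept is accessible
# through completely different sensory modalities.
```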
Given all this knowledge about how the real brain works, and given that we definitely need AGI/FAI/whatever to work at least as well as (and preferably better than) the real human brain… how do semantic theories in the AI/AGI field fit in with all this statistics? How do you turn statistics into the model-theoretic semantics of a formal logic system?
Ack, I wish people didn’t ask such infernally good questions, so much! ;-)
Your question is good, but the answer is not really going to satisfy. There is an entire book on this subject, detailing the relationship between purely abstract linguistics-oriented theories of semantics, the more abstractly mathematical theories of semantics, the philosophical approach (which isn’t called “semantics” of course: that is epistemology), and the various (rather weak and hand-wavy) ideas that float around in AI. One thing it makes a big deal of is the old (but still alive) chestnut of the Grounding Problem.
The book pulls all of these things together and analyzes them in the context of the semantics that is actually used by the only real thinking systems on the planet right now (at least, the only ones that want to talk about semantics), and then it derives conclusions and recommendations for how all of that can be made to knit together.
Yup, you’ve guessed it.
That book doesn’t exist. There is not (in my opinion anyway) anything that even remotely comes close to it.
What you said about the statistical nature of real cognition would be considered, in cognitive psychology, as just one perspective on the issue: alas, there are many.
At this point in time I can only say that my despair at the hugeness of this issue leaves me with nothing much more to say, except that I am trying to write that book, but I might never get around to it. And in the meantime I can only try, for my part, to write some answers to more specific questions within that larger whole.
Ok, let me continue to ask questions.

How do the statistically-oriented theories of pragmatics and the linguistic theories of semantics go together?
Math semantics, in the denotational and operational senses, I kinda understand: you demonstrate the semantics of a mathematical system by providing some outside mathematical object which models it. This also works for CS semantics, but does come with the notion that we include ⊥ (bottom) as an element of our denotational domains, and that our semantics may bottom out in “the machine does things”, i.e. translation to opcodes.
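(In case it helps to see the kind of thing I mean spelled out, here is a throwaway toy in Python 3.10+: the three-operation expression language is invented, and BOTTOM plays the role of ⊥ for undefined or failing computations.)

```python
# Toy denotational semantics for a made-up expression language: the "meaning"
# of an expression is an outside mathematical object (here just a Python
# value), with BOTTOM standing in for the undefined element of the domain.

BOTTOM = object()  # the bottom element of our denotational domain

def denote(expr, env):
    """Map syntax to its denotation: numbers to numbers, names to whatever
    the environment supplies, and compound forms to functions of the
    denotations of their parts."""
    match expr:
        case int(n):
            return n
        case str(name):
            return env.get(name, BOTTOM)
        case ('add', a, b):
            da, db = denote(a, env), denote(b, env)
            return BOTTOM if BOTTOM in (da, db) else da + db
        case ('div', a, b):
            da, db = denote(a, env), denote(b, env)
            if BOTTOM in (da, db) or db == 0:
                return BOTTOM          # division by zero bottoms out
            return da / db
        case _:
            return BOTTOM

print(denote(('add', 'x', ('div', 6, 3)), {'x': 1}))   # 3.0
print(denote(('div', 1, 0), {}) is BOTTOM)             # True
```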
The philosophical approach seems to wave words around like they’re not talking about how to make words mean things, or else it just defers to the mathematical approach. I again wish to reference Plato’s Camera, and go with Domain Portrayal Semantics. That at least gives us a good way to talk about how and why symbol grounding makes sense, as a feature of cognition that must necessarily happen in order for a mind to work.
What you said about the statistical nature of real cognition would be considered, in cognitive psychology, as just one perspective on the issue: alas, there are many.
Nonetheless, it is considered one of the better-supported hypotheses in cognitive science and theoretical neuroscience.
Fair enough.
There are really two aspects to semantics: grounding and compositionality. Elementary distinction, of course, but with some hidden subtlety to it … because many texts focus on one of them and do a quick wave of the hand at the other (it is usually the grounding aspect that gets short shrift, while the compositionality aspect takes center stage).
[Quick review for those who might need it: grounding is the question of how (among other things) the basic terms of your language or concept-encoding system map onto “things in the world”, whereas compositionality is how it is that combinations of basic terms/concepts can ‘mean’ something in such a way that the meaning of a combination can be derived from the meaning of the constituents plus the arrangement of the constituents.]
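(And, to put the compositional half of that in toy code: the lexicon below is hand-stipulated, so the grounding part is faked, but the meaning of a two-word phrase really is computed from the meanings of the parts plus their arrangement.)

```python
# Toy compositionality: the meanings of basic terms are stipulated by hand
# (so "grounding" is faked here), and the meaning of a combination is derived
# from the constituent meanings plus the way they are arranged.

# Hand-written extensions for a few invented nouns.
NOUNS = {
    'dog': {'rex', 'fido'},
    'cat': {'tom'},
    'pet': {'rex', 'fido', 'tom', 'goldie'},
}

# Adjective meanings: functions from a noun extension to a subset of it.
ADJS = {
    'brown':  lambda ext: ext & {'rex', 'tom'},
    'sleepy': lambda ext: ext & {'fido', 'goldie'},
}

def meaning(phrase):
    """A bare noun denotes its extension; [adjective noun] denotes the
    adjective function applied to the noun's extension."""
    words = phrase.split()
    if len(words) == 1:
        return NOUNS[words[0]]
    adj, noun = words
    return ADJS[adj](NOUNS[noun])

print(meaning('brown dog'))    # {'rex'}
print(meaning('sleepy pet'))   # {'fido', 'goldie'} (set order may vary)
```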
So, having said that, a few observations.
Denotational and operational semantics of programming languages or formal systems… well, there we have a bit of a closed universe, no? And things get awfully (deceptively) easy when we drop down into closed universes. (As Winograd and the other Blocks World enthusiasts realized rather quickly.) You hinted at that with your comment when you said:
… and that our semantics may bottom out in “the machine does things”, ie: translation to opcodes.
We can then jump straight from too simple to ridiculously abstract, finding ourselves listening to philosophical explanations of semantics, on which subject you said:
The philosophical approach seems to wave words around like they’re not talking about how to make words mean things...
Concisely put, and I am not sure I disagree (too much, at any rate).
Then we can jump sideways to psychology (and I will lump neuroscientists/neurophilosophers like Patricia Churchland in with the psychologists). I haven’t read any of PC’s stuff for quite a while, but Plato’s Camera does look to be above-average quality so I might give it a try. However, looking at the link you supplied I was able to grok where she was coming from with Domain Portrayal Semantics, and I have to say that there are some problems with that. (She may deal with the problems later, I don’t know, so take the following as provisional.)
Her idea of a Domain Portrayal Semantics is very static: just a state-space divide-and-conquer, really. The problem with that is that in real psychological contexts people often regard concepts as totally malleable in all sorts of ways. They shift the boundaries around over time, in different contexts, and with different attitudes. So, for example, I can take you into my workshop, which is undergoing renovation at the moment, and, holding in my hand a takeout meal for you and the other visitors, I can say “find some chairs, a lamp, and a dining table”. There are zero chairs, lamps, or dining tables in the room. But, faced with the takeout that is getting cold, you look around and find (a) a railing sticking out of the wall, which becomes a chair because you can kinda sit on it, (b) a blowtorch that can supply light, and (c) a tea chest with a pile of stuff on it, from which the stuff can be removed to make a dining table. All of those things can be justifiably called chairs, tables, and lamps because of their functionality.
I am sure her idea could be extended to allow for this kind of malleability, but the bottom line is that you then build your semantics on some very shifty sort of sand, not the rock that maybe everyone was hoping for.
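(A cartoon of that point in code, entirely made up, just to show the shape of the idea: if category membership is computed from what an object can do for you right now, rather than from fixed boundaries in a static state space, the railing-as-chair move falls out for free.)

```python
# Throwaway sketch: categories judged by context-dependent function, not by
# fixed boundaries in a static state space.

OBJECTS = {
    'railing':   {'can_support_weight': True,  'emits_light': False, 'flat_top': False},
    'blowtorch': {'can_support_weight': False, 'emits_light': True,  'flat_top': False},
    'tea_chest': {'can_support_weight': True,  'emits_light': False, 'flat_top': True},
}

# What counts as a chair/lamp/table depends on what you need it for right now.
ROLES = {
    'chair': lambda o: o['can_support_weight'],
    'lamp':  lambda o: o['emits_light'],
    'table': lambda o: o['flat_top'],
}

def find(category, things_in_the_room):
    """Return anything in the current context that can play the role."""
    return [name for name, props in things_in_the_room.items()
            if ROLES[category](props)]

for category in ('chair', 'lamp', 'table'):
    print(category, '->', find(category, OBJECTS))
# chair -> ['railing', 'tea_chest'], lamp -> ['blowtorch'], table -> ['tea_chest']
```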
(I have to cut off this reply to go do a task. Hopefully get back to it later).
Plato’s Camera is well above average for a philosophy-of-mind book, but I still think it focuses too thoroughly on relatively old knowledge about what we can do with artificial neural networks, both supervised and unsupervised. My Kindle copy includes angry notes to the effect of, “If you claim we can check whether two vector-space ‘maps’ portray the same objective feature-domain by doing linear transformations and finding a homomorphism, how the hell can you handle Turing-complete domains!? The equivalence of lambda expressions is undecidable!”
This is why I’m very much a fan of the probabilistic programming approach to computational cognitive science, which clears up these kinds of issues. In a probabilistic programming setting, the probability of extensional equality for two models (where models are distributions over computation traces) is a dead simple and utterly normal query: it’s just p(X == Y), where X and Y are taken to be models (aka: thunk lambdas, aka: distributions from which we can sample). The undecidable question is thus shunted aside in favor of a check that is merely computationally intensive, but can ultimately be done in a bounded-rational way.
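(Concretely, and taking the query literally: with X and Y as thunks we can sample from, you estimate how often independent runs of the two programs agree, and, if you want the extensional comparison, you compare their empirical output distributions. The sketch below is hand-rolled, not any particular probabilistic programming language’s API, and both toy models are invented.)

```python
# Hand-rolled sketch: two "models" are just thunks we can sample computation
# traces from; questions about them are answered by sampling, with the sample
# budget supplying the bounded-rational cutoff.
import random
from collections import Counter

def model_x():
    # Stochastic program #1: number of heads in three fair coin flips.
    return sum(random.randint(0, 1) for _ in range(3))

def model_y():
    # Stochastic program #2: written differently, same induced distribution.
    return sum(1 for _ in range(3) if random.random() < 0.5)

def prob_equal(x, y, n=50_000):
    """Monte Carlo estimate of p(X == Y) for independent runs of x and y."""
    return sum(x() == y() for _ in range(n)) / n

def empirical_dist(model, n=50_000):
    """Approximate the model's output distribution by sampling."""
    counts = Counter(model() for _ in range(n))
    return {value: count / n for value, count in sorted(counts.items())}

print(prob_equal(model_x, model_y))   # ~0.3125 for two independent Binomial(3, 1/2) draws
print(empirical_dist(model_x))        # both close to {0: .125, 1: .375, 2: .375, 3: .125}
print(empirical_dist(model_y))
```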
My reaction to those simple neural-net accounts of cognition is similar, in that I wanted very much to overcome their (pretty glaring) limitations. I wasn’t so much concerned with the inability to handle Turing-complete domains as with other, more practical issues. But I came to a different conclusion about the value of probabilistic programming approaches, because that seems to force the real world to conform to the idealized world of a branch of mathematics, and, like Leonardo, I don’t like telling Nature what she should be doing with her designs. ;-)
Under the heading of ‘interesting history’ it might be worth mentioning that I hit my first frustration with neural nets at the very time the field was bursting into full bloom—I was part of the revolution that shook cognitive science in the mid-to-late 1980s. Even while it was in full swing, I was already going beyond it. And I have continued on that path ever since. Tragically, the bulk of NN researchers stayed loyal to the very simplistic systems invented in the first blush of that spring, and never seemed to really understand that they had boxed themselves into a dead end.
But I came to a different conclusion about the value of probabilistic programming approaches, because that seems to force the real world to conform to the idealized world of a branch of mathematics, and, like Leonardo, I don’t like telling Nature what she should be doing with her designs. ;-)

Ah, but Nature’s elegant design for an embodied creature is precisely a bounded-Bayesian reasoner! You just minimize the free energy of the environment.

And I have continued on that path ever since. Tragically, the bulk of NN researchers stayed loyal to the very simplistic systems invented in the first blush of that spring, and never seemed to really understand that they had boxed themselves into a dead end.

Could you explain the kinds of neural networks beyond the standard feedforward, convolutional, and recurrent supervised networks? In particular, I’d really appreciate hearing a connectionist’s view on how unsupervised neural networks can learn to convert low-level sensory features into the kind of more abstracted, “objectified” (in the sense of “made objective”) features that can be used for the bottom, most concrete layer of causal modelling.
Yikes! No. :-)
That paper couldn’t be a more perfect example of what I meant when I said
that seems to force the real world to conform to the idealized world of a branch of mathematics
In other words, the paper talks about a theoretical entity which is a descriptive model (not a functional model) of one aspect of human decision-making behavior. That means you cannot jump to the conclusion that this is “nature’s design for an embodied creature”.
About your second question. I can only give you an overview, but the essential ingredient is that to go beyond the standard neural nets you need to consider neuron-like objects that are actually free to be created and destroyed like processes on a network, and which interact with one another using more elaborate, generalized versions of the rules that govern simple nets.
From there it is easy to get to unsupervised concept building because the spontaneous activity of these atoms (my preferred term) involves searching for minimum-energy* configurations that describe the world.
*There is actually more than one type of ‘energy’ being simultaneously minimized in the systems I work on.
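(If a cartoon in code helps: the following is only a generic illustration of the flavor, transient units plus relaxation toward a lower ‘energy’, and emphatically not the actual architecture, whose rules are far more elaborate than these.)

```python
# Generic cartoon (not the actual system): "atoms" can be created and
# destroyed at run time, and the population relaxes toward configurations
# that minimize a made-up energy over pairwise (in)compatibilities.
import random

class Atom:
    def __init__(self, name):
        self.name, self.active = name, random.random() < 0.5

def energy(links):
    """Made-up energy: each pair of co-active atoms contributes -w, so
    compatible pairs (w > 0) lower the energy and incompatible pairs raise it."""
    return sum(-w for (a, b), w in links.items() if a.active and b.active)

def relax(atoms, links, steps=500):
    """Noisy greedy minimization: flip a random atom's state and keep the
    flip only if it does not raise the energy."""
    for _ in range(steps):
        atom = random.choice(atoms)
        before = energy(links)
        atom.active = not atom.active
        if energy(links) > before:
            atom.active = not atom.active   # made things worse: undo
    return atoms

# Atoms come and go like processes on a network.
atoms = [Atom(n) for n in ('edge', 'corner', 'cup?')]
links = {(atoms[0], atoms[1]): +1.0,     # edge and corner support each other
         (atoms[1], atoms[2]): +0.5,
         (atoms[0], atoms[2]): -1.0}     # but 'cup?' clashes with 'edge'
atoms.append(Atom('bowl?'))              # spawn a new candidate atom...
links[(atoms[1], atoms[3])] = +1.5       # ...that fits the evidence better
relax(atoms, links)
print({a.name: a.active for a in atoms})
atoms = [a for a in atoms if a.active]   # prune atoms that ended up inactive
```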
You can read a few more hints of this stuff in my 2010 paper with Trevor Harley (which is actually on a different topic, but I threw in a sketch of the cognitive system for purposes of illustrating my point in that paper).
Reference:
Loosemore, R.P.W. & Harley, T.A. (2010). Brains and Minds: On the Usefulness of Localisation Data to Cognitive Psychology. In M. Bunzl & S.J. Hanson (Eds.), Foundational Issues of Neuroimaging. Cambridge, MA: MIT Press. http://richardloosemore.com/docs/2010a_BrainImaging_rpwl_tah.pdf