I think Alice’s & Bob’s brains have learning algorithms that turn sensory inputs into nice probabilistic generative models that predict and explain those sensory inputs. For example, if a kid sees a bunch of teddy bears, they’ll form a concept (latent variable) of teddy bears, even if they don’t yet know any word for it.
And then language gets associated with latent nodes in those generative models. E.g. I point to a teddy bear and say “that’s called a teddy bear”, and now the kid associates “teddy bear” with that preexisting latent variable, i.e. the teddy bear concept variable which was active in their minds while you were talking.
So I see basically two stages:
Stage 1: Sensory inputs → Probabilistic generative model with latent variables,
Stage 2: Latent variables ↔ Language
…where you seem to see just one stage, if I understand this post correctly.
And likewise, for the “rich enough… small enough… equivalence of variables across Bayesian agents…” problems, you seem to see it as a language problem, whereas I see it as mostly solved by Stage 1 before anyone even opens their mouth to speak a word. (“Rich enough” and “small enough” because the learning algorithm is really good at finding awesome latents and the inference algorithm is really good at activating them at contextually-appropriate times, and “equivalence” because all humans have almost the same learning algorithm and are by assumption in a similar environment.)
Also, I think Stage 1 (i.e. sensory input → generative model) is basically the hard part of AGI capabilities. (That’s why the probabilistic programming people usually have to put in the structure of their generative models by hand.) So I have strong misgivings about a call-to-arms encouraging people to sort that out.
(If you can solve Stage 1, then Stage 2 basically happens automatically as a special case, given that language is sensory input too.)
Sorry if I’m misunderstanding.
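To make the two-stage picture concrete, here’s a minimal toy sketch (made-up data, with k-means standing in for the brain’s much fancier learning algorithm; none of the specifics are meant literally):

```python
# Stage 1: unsupervised learning turns "sensory" data into latent variables (here: cluster ids).
# Stage 2: words get attached to those pre-existing latents from a handful of pointing examples.
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Fake "sensory inputs": three unlabeled clusters of 2-D observations.
X = np.concatenate([
    rng.normal(loc=[0, 0], scale=0.3, size=(200, 2)),   # e.g. teddy bears
    rng.normal(loc=[4, 0], scale=0.3, size=(200, 2)),   # e.g. dogs
    rng.normal(loc=[0, 4], scale=0.3, size=(200, 2)),   # e.g. cups
])

# Stage 1: fit a model of the data with no words involved at all.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Stage 2: a few "that's called a teddy bear" moments associate each word
# with whichever already-learned latent was active at the time.
pointing_examples = [("teddy bear", [0.1, -0.2]), ("teddy bear", [-0.1, 0.3]),
                     ("dog", [3.9, 0.1]), ("cup", [0.2, 4.1])]
votes = {}
for word, x in pointing_examples:
    latent = int(model.predict(np.array([x]))[0])
    votes.setdefault(word, Counter())[latent] += 1
word_to_latent = {w: c.most_common(1)[0][0] for w, c in votes.items()}

print(word_to_latent)  # each word maps onto one pre-existing latent
```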
…where you seem to see just one stage, if I understand this post correctly.
Oh, I totally agree with your mental model here. It’s implicit in the clustering toy model, for example: the agents fit the clusters to some data (stage 1), and only after that can they match words to clusters with just a handful of examples of each word (stage 2).
In that frame, the overarching idea of the post is:
We’d like to characterize what the (convergent/interoperable) latents are.
… and because stage 2 exists, we can use language (and our use thereof) as one source of information about those latents, and work backwards. Working forwards through stage 1 isn’t the only option.
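As a toy illustration of what “working backwards” could look like (made-up numbers, with per-word averaging standing in for something much more sophisticated): because words get attached to latents, the way people use words carries information about where those latents sit, without re-running stage 1 forwards.

```python
# Treat word-labelled utterances as noisy observations of the underlying latents,
# and estimate the latents directly from language use.
import numpy as np

rng = np.random.default_rng(1)
words = ["teddy bear", "dog", "cup"]
true_centers = {"teddy bear": [0, 0], "dog": [4, 0], "cup": [0, 4]}

# Utterances: (word, the observation the speaker was looking at when they used it).
utterances = [(w, rng.normal(true_centers[w], 0.3)) for w in words for _ in range(20)]

# Backwards estimate: each word's latent is roughly the mean of the observations
# the word was applied to.
estimated = {w: np.mean([x for ww, x in utterances if ww == w], axis=0) for w in words}
for w in words:
    print(w, estimated[w].round(2))  # lands near the latents stage 1 would have found
```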
Also, I think Stage 1 (i.e. sensory input → generative model) is basically the hard part of AGI capabilities. [...] So I have strong misgivings about a call-to-arms encouraging people to sort that out.
Note that understanding what the relevant latents are does not necessarily imply the ability to learn them efficiently. Indeed, the toy models in the post are good examples: both involve recognizing “naturality” conditions over some stuff, but they’re pretty agnostic about the learning stage.
I admit that there’s potential for capability externalities here, but insofar as results look like more sophisticated versions of the toy models in the post, I expect this work to be multiple large steps away from application to capabilities.
I think I’m much more skeptical than you that the latents can really be well characterized in any other way besides saying that “the latents are whatever latents get produced by such-and-such learning algorithm as run by a human brain”. As examples:
the word “behind” implies a person viewing a scene from a certain perspective;
the word “contaminate” implies a person with preferences (more specifically, valence assessments);
the word “salient” implies a person with an attentional spotlight;
visual words (“edge”, “jagged”, “dappled”, etc.) depend on the visual-perception priors, i.e. the neural architecture involved in analyzing incoming visual data. For example, I think the fact that humans factor images into textures versus edges, unlike ImageNet-trained CNNs, is baked into the neural architecture of the visual cortex (see discussion of “blobs” & “interblobs” here);
“I’m feeling down” implies that the speaker and listener can invoke spatial analogies (cf. Lakoff & Johnson);
the verb “climb” as in “climbing the corporate ladder”, and the insult “butterfingers”, imply that the speaker and listener can invoke more colorful situational analogies;
the word “much” (e.g. “too much”, “so much”) tends to imply a background context of norms and propriety (see Hofstadter p67);
conjunctions like “and” and “but” tend to characterize patterns in the unfurling process of how the listener is parsing the incoming word stream, moment-by-moment through time (see Hofstadter p72).
Great examples! I buy them to varying extents:
Features like “edge” or “dappled” were IIRC among the first features people discovered when they started doing interp on CNNs back around 2016 or so. So they might be specific to a data modality (i.e. vision), but they’re not specific to the human brain’s learning algorithm.
“Behind” seems similar to “edge” and “dappled”, but at a higher level of abstraction; it’s something which might require a specific data modality but probably isn’t learning algorithm specific.
I buy your claim a lot more for value-loaded words, like “I’m feeling down”, the connotations of “contaminate”, and “much”. (Note that an alien mind might still reify human-value-loaded concepts in order to model humans, but that still probably involves modeling a lot of the human learning algorithm, so your point stands.)
I buy that “salient” implies an attentional spotlight, but I would guess that an attentional spotlight can be characterized without modeling the bulk of the human learning algorithm.
I buy that the semantics of “and” or “but” are pretty specific to humans’ language-structure, but I don’t actually care that much about the semantics of connectives like that. What I care about is the semantics of e.g. sentences containing “and” or “but”.
I definitely buy that analogies like “butterfingers” are a pretty large chunk of language in practice, and it sure seems hard to handle semantics of those without generally understanding analogy, and analogy sure seems like a big central piece of the human learning algorithm.
At the meta-level: I’ve been working on this natural abstraction business for four years now, and your list of examples in that comment is one of the most substantive and useful pieces of pushback I’ve gotten in that time. So the semantics frame is definitely proving useful!
One mini-project in this vein which would potentially be high-value would be for someone to go through a whole crapton of natural language examples and map out some guesses at which semantics would/wouldn’t be convergent across minds in our environment.
I think a big aspect of salience arises from dealing with commensurate variables that have a natural zero-point (e.g. physical size), because then one can rank the variables by their distance from zero, and the ones that are furthest from zero are inherently more salient. Attentional spotlights are also probably mainly useful in cases where the variables have high skewness so there are relevant places to put the spotlight.
I don’t expect this model to capture all of salience, but I expect it to capture a big chunk, and to be relevant in many other contexts too. E.g. an important aspect of “misleading” communication is to talk about the variables of smaller magnitude while staying silent about the variables of bigger magnitude.
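A minimal toy sketch of that model (hypothetical variables, assumed already normalized onto a shared scale with a meaningful zero point):

```python
# Salience as distance from the natural zero point: rank commensurate variables
# by magnitude; because the magnitudes are skewed, a small spotlight suffices.
scene = {  # made-up variables, assumed commensurate
    "size_of_truck":      8.0,
    "size_of_pebble":     0.02,
    "loudness_of_siren":  6.5,
    "loudness_of_breeze": 0.3,
}

salience = {k: abs(v) for k, v in scene.items()}          # distance from zero
spotlight = sorted(salience, key=salience.get, reverse=True)[:2]
print(spotlight)  # the few biggest-magnitude variables; the rest are ignorable
```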
For example, if I got attacked by a squirrel ten years ago, and it was a very traumatic experience for me, then the possibility-of-getting-attacked-by-a-squirrel will be very salient in my mind whenever I’m making decisions, even if it’s not salient to anyone else. (Squirrels are normally shy and harmless.)
In this case, under my model of salience as the biggest deviating variables, the variable I’d consider would be something like “likelihood of attacking”. It is salient to you in the presence of squirrels because all other things nearby (e.g. computers or trees) are (according to your probabilistic model) much less likely to attack, and because the risk of getting attacked by something is much more important than many other things (e.g. seeing something).
In a sense, there’s a subjectivity because different people might have different traumas, but this subjectivity isn’t such a big problem because there is a “correct” frequency with which squirrels attack under various conditions, and we’d expect the main disagreement with a superintelligence to be that it has a better estimate than we do.
A deeper subjectivity is that we care about whether we get attacked by squirrels, and we’re not powerful enough that it is completely trivial and ignorable whether squirrels attack us and our allies, so squirrel attacks are less likely to be of negligible magnitude relative to our activities.
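A toy version of that salience-as-biggest-deviating-variables story applied to the squirrel case (all probabilities and importance weights are made up for illustration):

```python
# A variable is salient for an object when the object's value deviates strongly
# from the values of other nearby things, weighted by how much the variable matters.
objects = ["squirrel", "tree", "computer", "passerby"]
values = {  # per-object values of two candidate variables (made-up numbers)
    "likelihood_of_attacking": {"squirrel": 0.30, "tree": 0.0001,
                                "computer": 0.0001, "passerby": 0.001},
    "likelihood_of_being_seen": {"squirrel": 0.9, "tree": 0.95,
                                 "computer": 0.9, "passerby": 0.9},
}
importance = {"likelihood_of_attacking": 50.0, "likelihood_of_being_seen": 1.0}

def salience(var, obj):
    others = [v for o, v in values[var].items() if o != obj]
    deviation = abs(values[var][obj] - sum(others) / len(others))  # deviation from nearby things
    return deviation * importance[var]  # weighted by how much the variable matters

best = max(((var, obj) for var in values for obj in objects), key=lambda p: salience(*p))
print(best)  # ('likelihood_of_attacking', 'squirrel'): big deviation on a high-stakes variable
```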
I think Alice’s & Bob’s brains have learning algorithms that turn sensory inputs into nice probabilistic generative models that predict and explain those sensory inputs. For example, if a kid sees a bunch of teddy bears, they’ll form a concept (latent variable) of teddy bears, even if they don’t yet know any word for it.
And then language gets associated with latent nodes in those generative models. E.g. I point to a teddy bear and say “that’s called a teddy bear”, and now the kid associates “teddy bear” with that preexisting latent variable, i.e. the teddy bear concept variable which was active in their minds while you were talking.
So I see basically two stages:
Stage 1: Sensory inputs → Probabilistic generative model with latent variables,
Stage 2: Latent variables ↔ Language
…where you seem to see just one stage, if I understand this post correctly.
And likewise, for the “rich enough… small enough… equivalence of variables across Bayesian agents…” problems, you seem to see it as a language problem, whereas I see it as mostly solved by Stage 1 before anyone even opens their mouth to speak a word. (“Rich enough” and “small enough” because the learning algorithm is really good at finding awesome latents and the inference algorithm is really good at activating them at contextually-appropriate times, and “equivalence” because all humans have almost the same learning algorithm and are by assumption in a similar environment.)
Also, I think Stage 1 (i.e. sensory input → generative model) is basically the hard part of AGI capabilities. (That’s why the probabilistic programming people usually have to put in the structure of their generative models by hand.) So I have strong misgivings about a call-to-arms encouraging people to sort that out.
(If you can solve Stage 1, then Stage 2 basically happens automatically as a special case, given that language is sensory input too.)
Sorry if I’m misunderstanding.
Oh, I totally agree with your mental model here. It’s implicit in the clustering toy model, for example: the agents fit the clusters to some data (stage 1), and only after that can they match words to clusters with just a handful of examples of each word (stage 2).
In that frame, the overarching idea of the post is:
We’d like to characterize what the (convergent/interoperable) latents are.
… and because stage 2 exists, we can use language (and our use thereof) as one source of information about those latents, and work backwards. Working forwards through stage 1 isn’t the only option.
Note that understanding what the relevant latents are does not necessarily imply the ability to learn them efficiently. Indeed, the toy models in the post are good examples: both involve recognizing “naturality” conditions over some stuff, but they’re pretty agnostic about the learning stage.
I admit that there’s potential for capability externalities here, but insofar as results look like more sophisticated versions of the toy models in the post, I expect this work to be multiple large steps away from application to capabilities.
I think I’m much more skeptical than you that the latents can really be well characterized in any other way besides saying that “the latents are whatever latents get produced by such-and-such learning algorithm as run by a human brain”. As examples:
the word “behind” implies a person viewing a scene from a certain perspective;
the word “contaminate” implies a person with preferences (more specifically, valence assessents);
the word “salient” implies a person with an attentional spotlight;
visual words (“edge”, “jagged”, “dappled”, etc.) depend on the visual-perception priors, i.e. the neural architecture involved in analyzing incoming visual data. For example, I think the fact that humans factor images into textures versus edges, unlike ImageNet-trained CNNs, is baked into the neural architecture of the visual cortex (see discussion of “blobs” & “interblobs” here);
“I’m feeling down” implies that the speaker and listener can invoke spatial analogies (cf Lakoff & Johnson).
the verb “climb” as in “climbing the corporate ladder”, and the insult “butterfingers”, imply that the speaker and listener can invoke more colorful situational analogies
the word “much” (e.g. “too much”, “so much”) tends to imply a background context of norms and propriety (see Hofstadter p67)
conjunctions like “and” and “but” tend to characterize patterns in the unfurling process of how the listener is parsing the incoming word stream, moment-by-moment through time (see Hofstadter p72)
Great examples! I buy them to varying extents:
Features like “edge” or “dappled” were IIRC among the first discoveries when people first started doing interp on CNNs back around 2016 or so. So they might be specific to a data modality (i.e. vision), but they’re not specific to the human brain’s learning algorithm.
“Behind” seems similar to “edge” and “dappled”, but at a higher level of abstraction; it’s something which might require a specific data modality but probably isn’t learning algorithm specific.
I buy your claim a lot more for value-loaded words, like “I’m feeling down”, the connotations of “contaminate”, and “much”. (Note that an alien mind might still reify human-value-loaded concepts in order to model humans, but that still probably involves modeling a lot of the human learning algorithm, so your point stands.)
I buy that “salient” implies an attentional spotlight, but I would guess that an attentional spotlight can be characterized without modeling the bulk of the human learning algorithm.
I buy that the semantics of “and” or “but” are pretty specific to humans’ language-structure, but I don’t actually care that much about the semantics of connectives like that. What I care about is the semantics of e.g. sentences containing “and” or “but”.
I definitely buy that analogies like “butterfingers” are a pretty large chunk of language in practice, and it sure seems hard to handle semantics of those without generally understanding analogy, and analogy sure seems like a big central piece of the human learning algorithm.
At the meta-level: I’ve been working on this natural abstraction business for four years now, and your list of examples in that comment is one of the most substantive and useful pieces of pushback I’ve gotten in that time. So the semantics frame is definitely proving useful!
One mini-project in this vein which would potentially be high-value would be for someone to go through a whole crapton of natural language examples and map out some guesses at which semantics would/wouldn’t be convergent across minds in our environment.
I think a big aspect of salience arises from dealing with commensurate variables that have a natural zero-point (e.g. physical size), because then one can rank the variables by their distance from zero, and the ones that are furthest from zero are inherently more salient. Attentional spotlights are also probably mainly useful in cases where the variables have high skewness so there are relevant places to put the spotlight.
I don’t expect this model to capture all of salience, but I expect it to capture a big chunk, and to be relevant in many other contexts too. E.g. an important aspect of “misleading” communication is to talk about the variables of smaller magnitude while staying silent about the variables of bigger magnitude.
For example, if I got attacked by a squirrel ten years ago, and it was a very traumatic experience for me, then the possibility-of-getting-attacked-by-a-squirrel will be very salient in my mind whenever I’m making decisions, even if it’s not salient to anyone else. (Squirrels are normally shy and harmless.)
In this case, under my model of salience as the biggest deviating variables, the variable I’d consider would be something like “likelihood of attacking”. It is salient to you in the presence of squirrels because all other things nearby (e.g. computers or trees) are (according to your probabilistic model) much less likely to attack, and because the risk of getting attacked by something is much more important than many other things (e.g. seeing something).
In a sense, there’s a subjectivity because different people might have different traumas, but this subjectivity isn’t such a big problem because there is a “correct” frequency with which squirrels attack under various conditions, and we’d expect the main disagreement with a superintelligence to be that it has a better estimate than we do.
A deeper subjectivity is that we care about whether we get attacked by squirrels, and we’re not powerful enough that it is completely trivial and ignorable whether squirrels attack us and our allies, so squirrel attacks are less likely to be of negligible magnitude relative to our activities.