Thanks, I didn’t know this perspective on the history of our science. The stories I heard most were indeed about the HH model, the Hebb rule, Kohonen maps, RL, and then connectionism becoming deep learning.
If the object tends toward geometrical simplicity – she was using identification of visual objects as her domain – then a conventional, sequential, computational regime was most effective.
…but neural networks did refute that idea! I feel like I’m missing something here, especially since you then mention GPUs. Was “sequential” a typo?
How so?
When I hear « conventional, sequential, computational regime », my understanding is « the way everyone was trying before parallel computation revolutionized computer vision ». What’s your definition, such that using GPUs still feels sequential?
Oh, I didn’t mean to imply that using GPUs was sequential, not at all. What I meant was that the connectionist alternative didn’t really take off until GPUs were used, making massive parallelism possible.
Going back to Yevick, in her 1975 paper she often refers to holographic logic as ‘one-shot’ logic, meaning that the whole identification process takes place in one operation, the illumination of the hologram (i.e. the holographic memory store) by the reference beam. The whole memory ‘surface’ is searched in one unitary operation.
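To make ‘one-shot’ concrete, here is a minimal toy in Python, a linear correlation-matrix memory. It is only a stand-in, in the spirit of holographic logic rather than Yevick’s actual optics: several key/value pairs are superposed in one matrix, and a single probe (the ‘reference beam’) interrogates the entire store in one operation.

```python
import numpy as np

# Toy 'one-shot' associative memory: a linear correlation-matrix store.
# An illustration in the spirit of holographic logic, not Yevick's optics.
rng = np.random.default_rng(0)
dim, n_items = 256, 5
keys = rng.choice([-1.0, 1.0], size=(n_items, dim)) / np.sqrt(dim)
values = rng.choice([-1.0, 1.0], size=(n_items, dim))

# Superpose every key/value pair in a single matrix: the whole memory.
M = sum(np.outer(v, k) for v, k in zip(values, keys))

probe = keys[2]               # 'illuminate' the store with one reference key
recalled = M @ probe          # ONE operation searches the whole 'surface'
print(np.mean(np.sign(recalled) == values[2]))  # ~1.0: item 2 recovered
```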
In an LLM, I’m thinking of the generation of a single token as such a unitary or primitive process. That is to say, I think of the LLM as a “virtual machine” (I first saw the phrase in a blog post by Chris Olah) that is running an associative memory machine. Physically, yes, we’ve got a massive computation involving every parameter and (I’m assuming) there’s a combination of massive parallel and sequential operations taking place in the GPUs. Complete physical parallelism isn’t possible (yet). But there are no logical operations taking place in this virtual operation, no transfer of control. It’s one operation.
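Here is a sketch of that virtual-machine picture. The step function below is a hypothetical stand-in for a real model’s forward pass; the point is only where the control flow lives.

```python
from typing import Callable, List

# At the virtual level described above, each token is one opaque operation:
# no branches, no transfer of control inside step(). The only sequential
# structure is the outer loop, one token per pass.
def generate(step: Callable[[List[int]], int], prompt: List[int], n: int) -> List[int]:
    ids = list(prompt)
    for _ in range(n):           # the ONLY control flow lives out here
        ids.append(step(ids))    # one unitary probe of the whole memory
    return ids

# Toy stand-in for a model's forward pass: the next token as a fixed
# function of the last one (obviously not a real LLM).
print(generate(lambda ids: (ids[-1] * 7 + 3) % 50, [1, 7, 42], 5))
```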
Obviously, though, considered as an associative memory device, an LLM is capable of much more than passive storage and retrieval. It performs analytic and synthetic operations over the memory based on the prompt, which is just a probe (‘reference beam’ in holographic terms) into an associative memory. We’ve got to understand how the memory is structured so that that is possible.
More later.
A few comments before later. 😉
What I meant was that the connectionist alternative didn’t really take off until GPUs were used, making massive parallelism possible.
Thanks for the clarification! I guess you already noticed how research centers in cognitive science seem to have a failure mode over a specific value question: do we seek excellence, at the risk of overfitting to funding-agency criteria, or do we seek fidelity to our interdisciplinary mission, at the risk of compromising growth?
I certainly agree that, before the GPUs, the connectionist approach had a very small share of the excellence tokens. But it was already instrumental in providing a common conceptual framework beyond cognitivism. As an example, even the first PCs were enough to run toy examples of double dissociation using networks structured by sensory type rather than by cognitive operation. From a neuropsychological point of view, that was already a key result. And for the neuroscientist in me, toy models like Kohonen maps were already key to making sense of why we need so many short-range inhibitory neurons in grid-like cortical structures.
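To make that concrete, here is the kind of toy that ran fine on early PCs, a minimal 1-D Kohonen map. Assume nothing fancy: winner-take-all competition (the job I’m attributing to local inhibition) plus a shrinking neighborhood is enough to produce an ordered topographic map.

```python
import numpy as np

# Minimal 1-D Kohonen map (self-organizing map). Competition for the
# 'winner' is the step local inhibitory neurons are thought to implement;
# cooperation within a shrinking neighborhood then orders the map.
rng = np.random.default_rng(0)
n_units, n_steps = 50, 5000
w = rng.uniform(size=n_units)               # random initial 1-D weights

for t in range(n_steps):
    x = rng.uniform()                       # a random 1-D 'stimulus'
    winner = np.argmin(np.abs(w - x))       # winner-take-all competition
    frac = 1 - t / n_steps
    sigma = 8 * frac + 0.5                  # neighborhood shrinks over time
    lr = 0.5 * frac + 0.01                  # learning rate decays too
    h = np.exp(-(np.arange(n_units) - winner) ** 2 / (2 * sigma ** 2))
    w += lr * h * (x - w)                   # winner and neighbors move to x

d = np.diff(w)
print(np.all(d > 0) or np.all(d < 0))       # typically True: an ordered map
```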
Going back to Yevick, in her 1975 paper she often refers to holographic logic as ‘one-shot’ logic, meaning that the whole identification process takes place in one operation, the illumination of the hologram (i.e. the holographic memory store) by the reference beam. The whole memory ‘surface’ is searched in one unitary operation.
Like a refresh rate? That would fit the evidence for a 3-7 Hz refresh rate of our Cartesian theater, or the way LLMs go through prompt/answer cycles. Do you see other potential uses for this concept?
We’ve got to understand how the memory is structured so that that is possible.
What’s wrong with « the distributed way »?
In a paper I wrote a while back I cite the late Walter Freeman as arguing that “consciousness arises as discontinuous whole-hemisphere states succeeding one another at a “frame rate” of 6 Hz to 10 Hz” (p. 2). I’m willing to speculate that that’s your ‘one-shot’ refresh rate. BTW, Freeman didn’t believe in a Cartesian theater and neither do I; the imagery of the stage ‘up there’ and the seating area ‘back here’ is not at all helpful. We’re not talking about some specific location or space in the brain; we’re talking about a process.
Well, of course, “the distributed way.” But what is that? Prompt engineering is about maneuvering your way through the LLM; you’re attempting to manipulate the structure inherent in those weights to produce a specific result you want.
That 1978 comment of Yevick’s that I quote in the blog post I mentioned somewhere up there was in response to an article by John Haugeland evaluating cognitivism. He wondered whether or not there was an alternative and suggested holography as a possibility. He didn’t make a very plausible case, and few of the commentators took it as a serious alternative.
People were looking for alternatives. But it took a while for connectionism to build up a record of interesting results, on the one hand, and for cognitivism to begin seeming stale, on the other. It’s the combination of the two that brought about significant intellectual change. Or that’s my speculation.
I’m willing to speculate that that [6 Hz to 10 Hz] is your ‘one-shot’ refresh rate.
It’s possible. I don’t think there was relevant human data in Walter Freeman’s time, so I’m willing to speculate that’s indeed the frame rate in mice. But I didn’t check the literature he had access to, so it’s just a wild guess.
the imagery of the stage ‘up there’ and the seating area ‘back here’ is not at all helpful
I agree there’s no seating area. I still find the concept of a Cartesian theater useful. For example, it tells you where to plant electrodes if you want to access the visual Cartesian theater for rehabilitation purposes. I guess you’d agree that can be helpful. 😉
We’re not talking about some specific location or space in the brain; we’re talking about a process.
I have friends who believe that, but they can’t explain why the brain needs that much ordering in the sensory areas. What’s your own take?
But what is that [the distributed way]?
You know the backprop algorithm? That’s a mathematical model for the distributed way. It was recently shown that it produces networks that explain (statistically speaking) most of the properties of the BOLD cortical response in our visual systems. So, whatever the biological cortices actually do, it turns out equivalent as far as the « distributed memory » aspect goes.
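For concreteness, a minimal backprop sketch, nothing to do with the BOLD study itself: a tiny two-layer net learns XOR, and the resulting ‘memory’ of the task lives nowhere in particular; it is spread across every weight.

```python
import numpy as np

# Minimal backprop sketch: a tiny two-layer sigmoid net learns XOR.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
Xb = np.hstack([X, np.ones((4, 1))])         # append a bias input
W1 = rng.normal(size=(3, 8))
W2 = rng.normal(size=(9, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(10000):
    h = sigmoid(Xb @ W1)                     # forward pass
    hb = np.hstack([h, np.ones((4, 1))])     # bias for the output layer
    out = sigmoid(hb @ W2)
    d_out = (out - y) * out * (1 - out)      # backward pass (chain rule)
    d_h = (d_out @ W2[:8].T) * h * (1 - h)
    W2 -= 0.5 * hb.T @ d_out                 # updates are spread across
    W1 -= 0.5 * Xb.T @ d_h                   # every weight: the 'memory'

print(out.round(2).ravel())                  # should approach [0, 1, 1, 0]
```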
Or that’s my speculation.
I wonder if that’s too flattering for connectionism, which mostly stalled until the early breakthroughs in computer vision suddenly attracted every lab. BTW
Is accessing the visual cartesian theater physically different from accessing the visual cortex? Granted, there’s a lot of visual cortex, and different regions seem to have different functions. Is the visual cartesian theater some specific region of visual cortex?
I’m not sure what your question about ordering in sensory areas is about.
As for backprop, that gets the distribution done, but that’s only part of the problem. In LLMs, for example, it seems that syntactic information is handled in the first few layers of the model. Given the way texts are structured, it makes sense that sentence-level information should be segregated from information about collections of sentences. That’s the kind of structure I’m talking about. Sure, backprop is responsible for those layers, but it’s responsible for all the other layers as well. Why do we seem to have different kinds of information in different layers at all? That’s what interests me.
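That question is researchable, though. Here is a sketch of the standard probing methodology used to ask which layers carry which information: fit a linear classifier on each layer’s hidden states and compare accuracies across layers. The synthetic activations below stand in for a real model’s; the ‘label in dimension 0’ setup is entirely my invention, just to show the method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Probing sketch: train a linear probe per layer and compare accuracies.
# Synthetic activations stand in for a real model's; by construction the
# (hypothetical) syntactic label is strongest in layer 1 and fades by layer 3.
rng = np.random.default_rng(0)
n, d = 1000, 32
labels = rng.integers(0, 2, n)                 # e.g. a binary syntactic tag
layers = []
for strength in (2.0, 1.0, 0.1):               # signal fades with depth
    h = rng.normal(size=(n, d))
    h[:, 0] += strength * labels               # plant the label in dim 0
    layers.append(h)

for i, h in enumerate(layers, 1):
    probe = LogisticRegression().fit(h[:500], labels[:500])
    acc = probe.score(h[500:], labels[500:])   # held-out probe accuracy
    print(f"layer {i}: probe accuracy = {acc:.2f}")
```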
Actually, it just makes sense to me that that is the case. Given that it is, what is located where? As for why things are segregated by location, that does need an answer, doesn’t it? Is that what you were asking?
Is accessing the visual cartesian theater physically different from accessing the visual cortex? Granted, there’s a lot of visual cortex, and different regions seem to have different functions. Is the visual cartesian theater some specific region of visual cortex?
In my view: yes, no. To put some flesh on the bones, my working hypothesis is: what’s conscious is gamma activity within an isocortex connected to the claustrum (because that’s the information that will get selected for the next conscious frame / that can be considered as being in working memory).
I’m not sure what your question about ordering in sensory areas is about.
You said: what matters is temporal dynamics. I said: why so many maps if what matters is timing?
Why do we seem to have different kinds of information in different layers at all? That’s what interests me.
The closer to the input, the more sensory. The closer to the output, the more motor. The closer to the restrictions, the easier it is to interpret activity as a latent space. Is there any regularity that you find hard to interpret this way?
Finally, here’s an idea I’ve been playing around with for a long time: Neural Recognizers: Some [old] notes based on a TV tube metaphor [perceptual contact with the world].
Thanks, I’ll go read. Don’t hesitate to add other links that can help me understand your vision.
“You said: what matters is temporal dynamics”
You mean this: “We’re not talking about some specific location or space in the brain; we’re talking about a process.”
If so, all I meant was a process that can take place pretty much anywhere. Consciousness can pretty much ‘float’ to wherever it’s needed.
Since you asked for more, why not this: Direct Brain-to-Brain Thought Transfer: A High Tech Fantasy that Won’t Work.
You mean there’s some key difference in meaning between your original formulation and my reformulation? Care to elaborate and formulate some specific prediction?
As an example, I once had a go at interpreting data from the olfactory system for a friend who was wondering whether we could find signs of a chaotic attractor. If you ever toy with the Lorenz model, one key feature is: you can either see the attractor by plotting x vs y vs z, or you can see it by plotting any one of these variables against itself at t+delta and at t+2*delta (for many deltas). In other words, that gives a precise feature you can look for (I didn’t find any, and nowadays it seems accepted that odors are location specific, like every other sense). Do you have a better idea, or is that more or less what you’d have tried?
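In case it helps, here is a minimal sketch of that delay trick (a Takens-style embedding) on the Lorenz model; the lag delta is arbitrary, and the plotting is left as a comment.

```python
import numpy as np

# Integrate the Lorenz system, then reconstruct the butterfly from the
# x variable alone by plotting x(t) vs x(t+delta) vs x(t+2*delta).
def lorenz_traj(n=20000, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    xyz = np.empty((n, 3))
    xyz[0] = (1.0, 1.0, 1.0)
    for i in range(n - 1):                   # simple Euler integration
        x, y, z = xyz[i]
        xyz[i + 1] = xyz[i] + dt * np.array([sigma * (y - x),
                                             x * (rho - z) - y,
                                             x * y - beta * z])
    return xyz

x = lorenz_traj()[:, 0]
delta = 8                                    # lag in time steps (arbitrary)
embedded = np.column_stack([x[:-2 * delta], x[delta:-delta], x[2 * delta:]])
print(embedded.shape)   # the delay coordinates trace out the attractor
# e.g. with matplotlib: ax.plot(*embedded.T) on a 3-D axis shows the wings
```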
I’ve lost the thread entirely. Where have I ever said or implied that odors are not location specific, or that anything else is not location specific? And how specific are you about location? Are we talking about centimeters (or more), millimeters, individual cortical columns?
What’s so obscure about the idea that consciousness is a process that can take place pretty much anywhere? Though maybe it’s confined to interaction within the cortex and between subcortical areas; I’ve not given that one much thought. BTW, I take my conception of consciousness from William Powers, who didn’t speculate about its location in the brain.
Nothing at all. I’m a big fan of this kind of idea, and I’d love to present yours to some friends, but I’m afraid they’ll get dismissive if I can’t translate your thoughts into their usual frame of reference. But I get that you didn’t work on this aspect specifically; there are many fields in cognitive science.
As for how much specificity, it’s up to interpretation. A (1k by 1k by frame by cell type by density) tensor representing the cortical columns within the granular cortices is indeed a promising interpretation, although it’d probably still fall short of an extrapyramidal tensor (and maybe an agranular one).
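Translating that spec literally, and purely as illustration (every dimension size below is a made-up placeholder, not data):

```python
import numpy as np

# Purely illustrative reading of the spec above: an (x, y, frame, cell type)
# tensor over cortical columns, holding a density value in each cell.
X, Y, FRAMES, CELL_TYPES = 1000, 1000, 10, 4      # '1k by 1k' columns, etc.
cortex = np.zeros((X, Y, FRAMES, CELL_TYPES), dtype=np.float32)
cortex[500, 500, 0, 2] = 0.8   # density of one cell type, one column, one frame
print(cortex.shape, f"{cortex.nbytes / 1e9:.2f} GB")
```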
Well, when Walter Freeman was working on the olfactory cortex of rodents he was using a surface-mounted 8x8 matrix of electrodes. I assume it measured in millimeters. In his 1999 paper Consciousness, Intentionality, and Causality he proposes (paragraphs 36-43) a hemisphere-wide global operator (42):
I propose that the globally coherent activity, which is an order parameter, may be an objective correlate of awareness through preafference, comprising expectation and attention, which are based in prior proprioceptive and exteroceptive feedback of the sensory consequences of previous actions, after they have undergone limbic integration to form Gestalts, and in the goals that are emergent in the limbic system. In this view, awareness is basically akin to the intervening state variable in a homeostatic mechanism, which is both a physical quantity, a dynamic operator, and the carrier of influence from the past into the future that supports the relation between a desired set point and an existing state.
Later (43):
What is most remarkable about this operator is that it appears to be antithetical to initiating action. It provides a pervasive neuronal bias that does not induce phase transitions, but defers them by quenching local fluctuations (Prigogine, 1980). It alters the attractor landscapes of the lower order interactive masses of neurons that it enslaves. In the dynamicist view, intervention by states of awareness in the process of consciousness organizes the attractor landscape of the motor systems, prior to the instant of its next phase transition, the moment of choosing in the limbo of indecision, when the global dynamic brain activity pattern is increasing its complexity and fine-tuning the guidance of overt action. This state of uncertainty and unreadiness to act may last a fraction of a second, a minute, a week, or a lifetime. Then when a contemplated act occurs, awareness follows the onset of the act and does not precede it.
He goes on from there. I’m not sure whether he came back to that idea before he died in 2016. I haven’t found it; I didn’t do an exhaustive search, but I did look.