Bill Benzon comments on On possible cross-fertilization between AI and neuroscience [Creativity]

Bill Benzon 28 Nov 2023 14:43 UTC
2 points
0
Yeah, he’s talking about neuroscience. I get that. But “episodic memory” is a term of art and the idea behind it didn’t come from neuroscience. It’s quite possible that he just doesn’t know the intellectual history and is taking “episodic memory” as a term that’s in general use, which it is. But he’s also making claims about intellectual history.
Because he’s using that term in that context, I don’t know just what claim he’s making. Is he also (implicitly) claiming that neuroscience is the source of the idea? If he thinks that, then he’s wrong. If he’s just saying that he got the idea from neuroscience, OK.
But, the idea of a “general distributed architecture” doesn’t have anything to do with the idea of episodic memory. They are orthogonal notions, if you will.
- Ilio 28 Nov 2023 15:21 UTC
  1 point
  0
  Parent
  Your point is « Good AIs should have a working memory, a concept that comes from psychology ».
  
  DH point is « Good AIs should have a working memory, and the way to implement it was based on concepts taken from neuroscience ».
  
  Those are indeed orthogonal notions, if you will.
  - Bill Benzon 28 Nov 2023 19:19 UTC
    2 points
    0
    Parent
    I did a little checking. It’s complicated. In 2017 Hassabis published an article entitled “Neuroscience-Inspired Artificial Intelligence” in which he attributes the concept of episodic memory to a review article that Endel Tulving published in 2002, “EPISODIC MEMORY: From Mind to Brain.” That article has quite a bit to say about the brain. In the 2002 article Tulving dates the concept to an article he published in 1972. That article is entitled “Episodic and Semantic Memory.” As far as I know, while there are precedents (everything can be fobbed off on Plato if you’ve a mind to do it), that’s where the notion of episodic memory enters into modern discussions.
    Why do I care about this kind of detail? First, I’m a scholar and it’s my business to care about these things. Second, a lot of people in contemporary AI and ML are dismissive of symbolic AI from the 1950s through the 1980s and beyond. While Tulving was not an AI researcher, he was very much in the cognitive science movement, which included philosophy, psychology, linguistics, and AI (later on, neuroscientists would join in). I have no idea whether or not Hassabis is himself dismissive of that work, but many are. It’s hypocritical to write off the body of work while using some of the ideas. These problems are too deep and difficult to write off whole bodies of research in part because they happened before you were born – FWIW Hassabis was born in 1976.
    - Ilio 28 Nov 2023 22:19 UTC
      2 points
      0
      Parent
      
      I have no idea whether or not Hassabis is himself dismissive of that work
      
      Well that’s a problem, don’t you think?
      
      but many are.
      
      Yes, as a cognitive neuroscientist myself, you’re right that many within my generation tend to dismiss symbolic approaches. We were students during a winter that many of us thought was caused by the overpromising and underdelivering of the symbolic approach, with Minsky as the main reason for the slow start of neural networks. I bet you have a different perspective. What are your three best points for changing the view of my generation?
      - Bill Benzon 28 Nov 2023 23:06 UTC
        2 points
        0
        Parent
        I’ll get back to you tomorrow. I don’t think it’s a matter of going back to the old ways. ANNs are marvelous; they’re here to stay. The issue is one of integrating some symbolic ideas. It’s not at all clear how that’s to be done. If you wish, take a look at this blog post: Miriam Yevick on why both symbols and networks are necessary for artificial minds.
        Ilio 29 Nov 2023 0:13 UTC
        2 points
        1
        Parent
        Fascinating paper! I wonder how much they would agree that holography means sparse tensors and convolution, or that intuitive versus reflexive thinking basically amounts to visuo-spatial versus phonological loop. Can’t wait to hear which other ideas you’d like to import from this line of thought.
        Bill Benzon 29 Nov 2023 18:28 UTC
        4 points
        0
        Parent
        Miriam Lipshutz Yevick was born in 1924 and died in 2018, so we can’t ask her these questions. She fled Europe with her family in 1940 for the same reason many Jews fled Europe and ended up in Hoboken, NJ. Seven years later she got a PhD in math from MIT; she was only the 5th woman to get that degree from MIT. But, as both a woman and a Jew, she had almost no chance of an academic post in 1947. She eventually got an academic gig, but it was at a college oriented toward adult education. Still, she managed to do some remarkable mathematical work.
        The two papers I mention in that blog post were written in the mid-1970s. That was the height of classic symbolic AI and the cognitive science movement more generally. Newell and Simon got their Turing Award in 1975, the year Yevick wrote that remarkable paper on holographic logic, which deserves to be more widely known. She wrote as a mathematician interested in holography (an interest she developed while corresponding with physicist David Bohm in the 1950s), not as a cognitive scientist. Of course, in arguing for holography as a model for (one kind of) thought, she was working against the tide. Very few were thinking in such terms at that time. Rosenblatt’s work was in the past, and had been squashed by Minsky and Papert, as you’ve noted. The West Coast connectionist work didn’t take off until the mid-1980s.
        So there really wasn’t anyone in the cognitive science community at the time to investigate the line of thinking she initiated. While she wasn’t thinking about real computation, you know, something you actually do on computers, she thought abstractly in computational terms, as Turing and others did (though Turing also worked with actual computers). It seems to me that her contribution was to examine the relationship between a computational regime and the objects over which it was asked to compute. She’s quite explicit about that. If the object tends toward geometrical simplicity – she was using identification of visual objects as her domain – then a conventional, sequential, computational regime was most effective. That’s what cognitive science was all about at the time. If the object tends toward geometrical complexity then a different regime was called for, what she called holographic or Fourier logic. I don’t know about sparse tensors, but convolution, yes.
        Later on, in the 1980s, as you may know, Hans Moravec would talk about a paradox (which came to be named after him). In the early days of AI, researchers worked on abstract domains, like chess and theorem proving, domains that take a high level of cognitive ability. Things went pretty well, though the extravagant predictions had yet to pan out. When they turned toward vision and language in the late 1960s and into the 70s and 80s, things fell apart. Those were things that young kids could do. The paradox, then, was that AI was most effective at cognitively difficult things, and least effective with cognitively simple things.
        The issue was in fact becoming visible in the 1970s. I read about it in David Marr, and he died in 1980. Had it been explicitly theorized when Yevick wrote? I don’t know. But she had an answer to the paradox. The computational regime favored by AI and the cognitive sciences at the time simply was not well-suited to complex visual objects, though they presented no problems to 2-year-olds, or to language, with all those vaguely defined terms anchored in physically complex phenomena. They needed a different computational regime, and eventually we got one, though not really until GPUs were exploited.
        More later, perhaps.
        Ilio 29 Nov 2023 23:46 UTC
        1 point
        0
        Parent
        Thanks, I didn’t know this perspective on the history of our science. The stories I most heard were indeed more about the HH model, the Hebb rule, Kohonen maps, RL, and then connectionism became deep learning.
        
        If the object tends toward geometrical simplicity – she was using identification of visual objects as her domain – then a conventional, sequential, computational regime was most effective.
        
        …but neural networks did refute that idea! I feel like I’m missing something here, especially since you then mention GPU. Was sequential a typo?
        Bill Benzon 30 Nov 2023 0:28 UTC
        1 point
        0
        Parent
        How so?
        Ilio 30 Nov 2023 9:46 UTC
        1 point
        0
        Parent
        When I hear « conventional, sequential, computational regime », my understanding is « the way everyone was trying before parallel computation revolutionized computer vision ». What’s your definition so that using GPU feels sequential?
        Bill Benzon 30 Nov 2023 11:38 UTC
        2 points
        0
        Parent
        Oh, I didn’t mean to imply that using GPUs was sequential, not at all. What I meant was that the connectionist alternative didn’t really take off until GPUs were used, making massive parallelism possible.
        Going back to Yevick, in her 1975 paper she often refers to holographic logic as ‘one-shot’ logic, meaning that the whole identification process takes place in one operation, the illumination of the hologram (i.e. the holographic memory store) by the reference beam. The whole memory ‘surface’ is searched in one unitary operation.
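        To make the ‘one-shot’ idea concrete, here is a minimal sketch. It is not Yevick’s own formulation: it swaps in a later, related technique (circular-convolution associative memory, in the style of holographic reduced representations) as an assumption. All names (`cconv`, `ccorr`, `build_trace`, `recall`) and the parameters (`dim`, `n_items`) are hypothetical. Every key/value pair is superimposed on a single trace, and a probe is matched against the whole trace in one correlation rather than by stepping through the stored items.

```python
import numpy as np

def cconv(a, b):
    """Circular convolution via FFT: binds a key to a value."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def ccorr(a, b):
    """Circular correlation via FFT: approximately unbinds `a` from `b`."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

def build_trace(keys, values):
    """Superimpose every bound key/value pair on one distributed trace."""
    return sum(cconv(k, v) for k, v in zip(keys, values))

def recall(trace, probe, values):
    """Probe the whole trace in one correlation, then pick the closest stored value."""
    noisy = ccorr(probe, trace)   # single pass over the entire memory
    scores = values @ noisy       # similarity of the noisy result to each candidate
    return int(np.argmax(scores))

# Assumed toy setup: random vectors with unit expected norm.
rng = np.random.default_rng(0)
dim, n_items = 1024, 5
keys = rng.normal(0.0, 1.0 / np.sqrt(dim), size=(n_items, dim))
values = rng.normal(0.0, 1.0 / np.sqrt(dim), size=(n_items, dim))

trace = build_trace(keys, values)
recalled = [recall(trace, keys[i], values) for i in range(n_items)]
```

        The point of the sketch is only that there is no loop over stored items inside the probe step: the probe (the ‘reference beam’) interacts with the whole trace at once, and retrieval degrades gracefully as more pairs are superimposed.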
        In an LLM, I’m thinking of the generation of a single token as such a unitary or primitive process. That is to say, I think of the LLM as a “virtual machine” (I first saw the phrase in a blog post by Chris Olah) that is running an associative memory machine. Physically, yes, we’ve got a massive computation involving every parameter and (I’m assuming) there’s a combination of massive parallel and sequential operations taking place in the GPUs. Complete physical parallelism isn’t possible (yet). But there are no logical operations taking place in this virtual operation, no transfer of control. It’s one operation.
        Obviously, though, considered as an associative memory device, an LLM is capable of much more than passive storage and retrieval. It performs analytic and synthetic operations over the memory based on the prompt, which is just a probe (‘reference beam’ in holographic terms) into an associative memory. We’ve got to understand how the memory is structured so that that is possible.
        More later.
        Ilio 30 Nov 2023 18:50 UTC
        2 points
        0
        Parent
        A few comments before later. 😉
        
        What I meant was that the connectionist alternative didn’t really take off until GPUs were used, making massive parallelism possible.
        
        Thanks for the clarification! I guess you already noticed how research centers in cognitive science seem to have a failure mode over a specific value question: Do we seek excellence at the risk of overfitting funding agency criteria, or do we seek fidelity to our interdisciplinary mission at the risk of compromising growth?
        
        I certainly agree that, before the GPUs, the connectionist approach had a very small share of the excellence tokens. But it was already instrumental in providing a common conceptual framework beyond cognitivism. As an example, even the first PCs were enough to run toy examples of double dissociation using networks structured by sensory type rather than by cognitive operation. From a neuropsychological point of view, that was already a key result. And for the neuroscientist in me, toy models like Kohonen maps were already key to make sense of why we need so many short inhibitory neurons in grid-like cortical structures.
        
        Going back to Yevick, in her 1975 paper she often refers to holographic logic as ‘one-shot’ logic, meaning that the whole identification process takes place in one operation, the illumination of the hologram (i.e. the holographic memory store) by the reference beam. The whole memory ‘surface’ is searched in one unitary operation.
        
        Like a refresh rate? That would fit the evidence for a 3-7 Hz refresh rate of our cartesian theater, or the way LLMs go through prompt/answer cycles. Do you see other potential uses for this concept?
        
        We’ve got to understand how the memory is structured so that that is possible.
        
        What’s wrong with « the distributed way »?
        Bill Benzon 30 Nov 2023 19:36 UTC
        2 points
        0
        Parent
        In a paper I wrote a while back I cite the late Walter Freeman as arguing that “consciousness arises as discontinuous whole-hemisphere states succeeding one another at a “frame rate” of 6 Hz to 10 Hz” (p. 2). I’m willing to speculate that that’s your ‘one-shot’ refresh rate. BTW, Freeman didn’t believe in a Cartesian theater and neither do I; the imagery of the stage ‘up there’ and the seating area ‘back here’ is not at all helpful. We’re not talking about some specific location or space in the brain; we’re talking about a process.
        Well, of course, “the distributed way.” But what is that? Prompt engineering is about maneuvering your way through the LLM; you’re attempting to manipulate the structure inherent in those weights to produce a specific result you want.
        That 1978 comment of Yevick’s that I quote in that blog post I mentioned somewhere up there was in response to an article by John Haugeland evaluating cognitivism. He wondered whether or not there was an alternative and suggested holography as a possibility. He didn’t make a very plausible case and few of the commentators took it as a serious alternative.
        People were looking for alternatives. But it took a while for connectionism to build up a record of interesting results, on the one hand, and for cognitivism to begin seeming stale, on the other. It’s the combination of the two that brought about significant intellectual change. Or that’s my speculation.
        Ilio 1 Dec 2023 21:45 UTC
        2 points
        0
        Parent
        
        I’m willing to speculate that [6 Hz to 10 Hz ]that’s your ‘one-shot’ refresh rate.
        
        It’s possible. I don’t think there was relevant human data in Walter Freeman’s time, so I’m willing to speculate that’s indeed the frame rate in mice. But I didn’t check the literature he had access to, so it’s just a wild guess.
        
        the imagery of the stage ‘up there’ and the seating area ‘back here’ is not at all helpful
        
        I agree there’s no seating area. I still find the concept of a cartesian theater useful. For example, it allows knowing where to plant electrodes if you want to access the visual cartesian theater for rehabilitation purposes. I guess you’d agree that can be helpful. 😉
        
        We’re not talking about some specific location or space in the brain; we’re talking about a process.
        
        I have friends who believe that, but they can’t explain why the brain needs that much ordering in the sensory areas. What’s your own take?
        
        But what is [the distributed way]that?
        
        You know the backprop algorithm? That’s a mathematical model for the distributed way. It was recently shown that it produces networks that explain (statistically speaking) most of the properties of the BOLD cortical response in our visual systems. So, whatever the biological cortices actually do, it turns out to be equivalent for the « distributed memory » aspect.
        
        Or that’s my speculation.
        
        I wonder if that’s too flattering for connectionism, which mostly stalled until the early breakthrough in computer vision suddenly attracted every lab. BTW
        Bill Benzon 1 Dec 2023 22:31 UTC
        2 points
        0
        Parent
        Is accessing the visual cartesian theater physically different from accessing the visual cortex? Granted, there’s a lot of visual cortex, and different regions seem to have different functions. Is the visual cartesian theater some specific region of visual cortex?
        I’m not sure what your question about ordering in sensory areas is about.
        As for backprop, that gets the distribution done, but that’s only part of the problem. In LLMs, for example, it seems that syntactic information is handled in the first few layers of the model. Given the way texts are structured, it makes sense that sentence-level information should be segregated from information about collections of sentences. That’s the kind of structure I’m talking about. Sure, backprop is responsible for those layers, but it’s responsible for all the other layers as well. Why do we seem to have different kinds of information in different layers at all? That’s what interests me.
        Actually, it just makes sense to me that that is the case. Given that it is, what is located where? As for why things are segregated by location, that does need an answer, doesn’t it? Is that what you were asking?
        Finally, here’s an idea I’ve been playing around with for a long time: Neural Recognizers: Some [old] notes based on a TV tube metaphor [perceptual contact with the world].
        Ilio 2 Dec 2023 0:12 UTC
        1 point
        0
        Parent
        
        Is accessing the visual cartesian theater physically different from accessing the visual cortex? Granted, there’s a lot of visual cortex, and different regions seem to have different functions. Is the visual cartesian theater some specific region of visual cortex?
        
        In my view: yes, no. To put some flesh on the bone, my working hypothesis is: what’s conscious is gamma activity within an isocortex connected to the claustrum (because that’s the information which will get selected for the next conscious frame/can be considered as in working memory)
        
        I’m not sure what your question about ordering in sensory areas is about.
        
        You said: what matters is temporal dynamics. I said: why so many maps if what matters is timing?
        
        Why do we seem to have different kinds of information in different layers at all? That’s what interests me.
        
        The closer to the input, the more sensory. The closer to the output, the more motor. The closer to the restrictions, the easier to interpret activity as latent space. Is there any regularity that you find hard to interpret this way?
        
        Finally, here’s an idea I’ve been playing around with for a long time:
        
        Thanks, I’ll go read. Don’t hesitate to add other links that can help understand your vision.
        Bill Benzon 2 Dec 2023 2:11 UTC
        2 points
        0
        Parent
        “You said: what matters is temporal dynamics”
        You mean this: “We’re not talking about some specific location or space in the brain; we’re talking about a process.”
        If so, all I meant was a process that can take place pretty much anywhere. Consciousness can pretty much ‘float’ to wherever it’s needed.
        Since you asked for more, why not this: Direct Brain-to-Brain Thought Transfer: A High Tech Fantasy that Won’t Work.
        Ilio 3 Dec 2023 19:53 UTC
        1 point
        0
        Parent
        
        You mean this: “We’re not talking about some specific location or space in the brain; we’re talking about a process.”
        
        You mean there’s some key difference in meaning between your original formulation and my reformulation? Care to elaborate and formulate some specific prediction?
        
        As an example, I once had a try at interpreting data from the olfactory system for a friend who was wondering if we could find signs of a chaotic attractor. If you ever toyed with the Lorenz model, one key feature is: you either see the attractor by plotting x vs y vs z, or you can see it by plotting just one of these variables vs itself at t+delta vs itself at t+2*delta (for many deltas). In other words, that gives a precise feature you can look for (I didn’t find any, and nowadays it seems accepted that odors are location specific, like every other sense). Do you have a better idea, or is it more or less what you’d have tried?
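        For readers who have not toyed with it, the delay-embedding trick can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis actually run on the olfactory data: it uses the standard Lorenz parameters (sigma=10, rho=28, beta=8/3), a hand-rolled fixed-step RK4 integrator, and hypothetical names (`lorenz_rhs`, `integrate_lorenz`, `delay_embed`) with an arbitrary `delay` of 10 samples.

```python
import numpy as np

def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def integrate_lorenz(n_steps=5000, dt=0.01, start=(1.0, 1.0, 1.0)):
    """Fixed-step fourth-order Runge-Kutta integration; returns an (n_steps, 3) array."""
    traj = np.empty((n_steps, 3))
    state = np.array(start, dtype=float)
    for i in range(n_steps):
        traj[i] = state
        k1 = lorenz_rhs(state)
        k2 = lorenz_rhs(state + 0.5 * dt * k1)
        k3 = lorenz_rhs(state + 0.5 * dt * k2)
        k4 = lorenz_rhs(state + dt * k3)
        state = state + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return traj

def delay_embed(series, delay, dim=3):
    """Stack series(t), series(t + delay), ..., series(t + (dim - 1) * delay) as columns."""
    n = len(series) - (dim - 1) * delay
    return np.column_stack([series[i * delay : i * delay + n] for i in range(dim)])

traj = integrate_lorenz()
# Rebuild a 3-D picture from the x variable alone.
embedded = delay_embed(traj[:, 0], delay=10)
```

        Plotting the three columns of `embedded` against each other, for several values of `delay`, is the single-variable view described above; plotting the columns of `traj` gives the usual x vs y vs z view for comparison.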
        Bill Benzon 3 Dec 2023 20:13 UTC
        2 points
        0
        Parent
        I’ve lost the thread entirely. Where have I ever said or implied that odors are not location specific or that anything else is not location specific? And how specific are you about location? Are we talking about centimeters (or more), millimeters, individual cortical columns?
        What’s so obscure about the idea that consciousness is a process that can take place pretty much anywhere? Though maybe it’s confined to interaction within the cortex and between subcortical areas; I’ve not given that one much thought. BTW, I take my conception of consciousness from William Powers, who didn’t speculate about its location in the brain.
        Ilio 5 Dec 2023 16:08 UTC
        1 point
        0
        Parent
        Nothing at all. I’m a big fan of these kinds of ideas and I’d love to present yours to some friends, but I’m afraid they’ll get dismissive if I can’t translate your thoughts into their usual frame of reference. But I get that you didn’t work on this aspect specifically; there are many fields in cognitive sciences.
        
        About how much specificity, it’s up to interpretation. A (1k by 1k by frame by cell type by density) tensor representing the cortical columns within the granular cortices is indeed a promising interpretation, although it’d probably be short of an extrapyramidal tensor (and maybe an agranular one).
        Bill Benzon 5 Dec 2023 18:12 UTC
        1 point
        0
        Parent
        Well, when Walter Freeman was working on the olfactory cortex of rodents he was using a surface-mounted 8x8 matrix of electrodes. I assume that was measured in millimeters. In his 1999 paper Consciousness, Intentionality, and Causality (paragraphs 36–43) he proposes a hemisphere-wide global operator (42):
        I propose that the globally coherent activity, which is an order parameter, may be an objective correlate of awareness through preafference, comprising expectation and attention, which are based in prior proprioceptive and exteroceptive feedback of the sensory consequences of previous actions, after they have undergone limbic integration to form Gestalts, and in the goals that are emergent in the limbic system. In this view, awareness is basically akin to the intervening state variable in a homeostatic mechanism, which is both a physical quantity, a dynamic operator, and the carrier of influence from the past into the future that supports the relation between a desired set point and an existing state.
        Later (43):
        What is most remarkable about this operator is that it appears to be antithetical to initiating action. It provides a pervasive neuronal bias that does not induce phase transitions, but defers them by quenching local fluctuations (Prigogine, 1980). It alters the attractor landscapes of the lower order interactive masses of neurons that it enslaves. In the dynamicist view, intervention by states of awareness in the process of consciousness organizes the attractor landscape of the motor systems, prior to the instant of its next phase transition, the moment of choosing in the limbo of indecision, when the global dynamic brain activity pattern is increasing its complexity and fine-tuning the guidance of overt action. This state of uncertainty and unreadiness to act may last a fraction of a second, a minute, a week, or a lifetime. Then when a contemplated act occurs, awareness follows the onset of the act and does not precede it.
        He goes on from there. I’m not sure whether he came back to that idea before he died in 2016. I haven’t found it, didn’t do an exhaustive search, but I did look.