The idea of generating and directly transferring a pre-digested latent representation is super interesting, but my prior is that this couldn’t work. How a neural network trained from initially randomized weights represents concepts is likely to be highly idiosyncratic to that particular network. Perhaps this could be accomplished between AIs if we can somehow make that process and initial state less random, but how could that ever work for humans?
The highest-bandwidth sensory input for humans is their eyes. Doesn’t this idea just amount to diagrams of high-dimensional data?
It works for AIs very easily. Just feed the patents from AI 1 into AI 2. No need for special engineering of the two AIs.
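Mechanically, this is just piping one network’s hidden activations into another’s later layers. A toy numpy sketch with two made-up two-layer nets standing in for the AIs (whether the second net can actually *interpret* the first’s idiosyncratic latents is of course the open question):

```python
import numpy as np

rng = np.random.default_rng(42)

# Two toy "networks", each just input -> hidden -> output.
# Stand-ins for two LLMs; real latents would be transformer hidden states.
W1_in, W1_out = rng.standard_normal((8, 4)), rng.standard_normal((4, 8))
W2_in, W2_out = rng.standard_normal((8, 4)), rng.standard_normal((4, 8))

x = rng.standard_normal(8)      # some input to network 1
latent = np.tanh(x @ W1_in)     # network 1's internal representation

# "Feed the latents from AI 1 into AI 2": bypass network 2's own
# input layer (W2_in) and run its output layer on network 1's latent.
out = latent @ W2_out
print(out.shape)  # (8,)
```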
It also works for humans, at least somewhat. E.g., the Eyeronman vests I mentioned translate 3-D scene representations into vibrations. After enough time with one, people can pick up a sense of what the environment around them is like through the vibrations from the vest.
Translating LLM patents into visual input wouldn’t look like normal diagrams. It would look like a random-seeming mishmash of colors and shapes which encode the LLM’s latents. A person would then be shown many pairs of text and the encoded latents the model generated for the text. In time, I expect the person would gain a “text sense” where they can infer the meaning of the text from just the visual encoding of the model’s latents.
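A minimal sketch of such a visual encoding, assuming the latents arrive as one flat vector (the `latents_to_image` helper and the 16×16 grid size are made up for illustration):

```python
import numpy as np

def latents_to_image(latent, grid=(16, 16)):
    """Map a flat latent vector onto an RGB grid.

    Each group of 3 latent dimensions becomes one pixel's (R, G, B)
    values after min-max normalization, so the result looks like a
    random-seeming mishmash of colors, not a readable diagram.
    """
    needed = grid[0] * grid[1] * 3
    v = np.resize(np.asarray(latent, dtype=float), needed)
    v = (v - v.min()) / (v.max() - v.min() + 1e-9)  # scale to [0, 1]
    return v.reshape(grid[0], grid[1], 3)

# e.g. a 768-dim hidden state from a small LLM (hypothetical numbers)
rng = np.random.default_rng(0)
img = latents_to_image(rng.standard_normal(768))
print(img.shape)  # (16, 16, 3)
```

A person would then be shown this image alongside the text it was generated from, over many pairs.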
I think I’m lacking some jargon here. What’s a latent/patent in the context of a large language model? “patent” is ungoogleable if you’re not talking about intellectual property law.
The Eyeronman link didn’t seem very informative. No explanation of how it works. I already knew sensory substitution was a thing, but is this different somehow? Is there some neural net pre-digesting its outputs? Is it similarly a random-seeming mishmash? Are there any other examples of this kind of thing working for humans? Visually?
Would the mishmash from a smaller text model be any easier/faster for the human to learn?
My money’s on: typo.