You can hook a chess-playing network up to a vision network and have it play chess using images of boards—it’s not difficult.
I think you have to be careful here. In this setup you have two different AIs: a vision network that classifies images and a chess AI that plays chess, plus presumably some connecting code that translates the vision network’s output into a format suitable for the chess player.
I think what Sarah is referring to is that if you tried to hook the images up directly to the chess engine, it wouldn’t be able to make sense of them, because reading images is not something it was trained to do.
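Here is a minimal sketch of what that connecting code might look like, assuming python-chess and a UCI engine such as Stockfish are installed; the vision step is a stub that returns a hard-coded position, standing in for whatever classifier actually reads the board image.

```python
import chess
import chess.engine

def image_to_fen(image_path: str) -> str:
    # Placeholder for the vision network: a real classifier would predict the
    # piece on each square from the image and serialise the result as FEN.
    return "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq - 0 1"

def best_move_from_image(image_path: str, engine_path: str = "stockfish") -> str:
    # The "translation" step: re-encode the vision output into the engine's format.
    board = chess.Board(image_to_fen(image_path))
    engine = chess.engine.SimpleEngine.popen_uci(engine_path)
    try:
        result = engine.play(board, chess.engine.Limit(time=0.1))
    finally:
        engine.quit()
    return result.move.uci()  # e.g. "e7e5"
```

The chess engine never sees pixels at all; it only ever receives a board state it already knows how to read, which is why the glue code is doing all the cross-domain work.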
I honestly think of specialised models not as brains in their own right, but as cortexes. Pieces of a brain. But you can obviously hook them up together to do all sorts of things (for example, a multimodal LLM could take an image of a board and turn it into a series of coordinates and piece names). The caveat is that all of these models exist one level below the emergent simulacra that have actual agency. They’re the book or the operator or the desk in the Chinese Room. But it’s the Room as a whole that is intelligent and agentic.
Or in other words: our individual neurons don’t optimise for world-referenced goals either. Their goal is just “fire if stimulated so-and-so”.
Yes, and networks of sensory neurons apparently minimize prediction error in a way similar to an LLM doing next-word prediction, except that the neurons also minimize prediction error across hierarchies. Individually they are obviously not agents, but together they combine into one.
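A toy numerical sketch of that predictive-coding idea (not a model of real neurons): a higher layer holds a latent state that predicts the lower layer’s activity, and both the state and the weights are nudged to shrink the prediction error, loosely analogous to an LLM minimising next-token loss.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=8)             # "sensory" activity at the lower level
W = rng.normal(size=(8, 3)) * 0.1  # generative weights: latent -> prediction
r = np.zeros(3)                    # higher-level latent state

for step in range(200):
    prediction = W @ r               # top-down prediction of the lower level
    error = x - prediction           # prediction error carried "upward"
    r += 0.1 * (W.T @ error)         # latent state shifts to explain the input
    W += 0.01 * np.outer(error, r)   # weights slowly learn the mapping

print("remaining prediction error:", float(np.mean(error ** 2)))
```

Each unit here only ever reduces a local error signal, yet the stack as a whole ends up representing the input, which is the sense in which non-agents can combine into something agent-like.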