One possibility that I find plausible as a path to AGI is if we design something like a Language Model Cognitive Architecture (LMCA) along the lines of AutoGPT, and require that its world model actually be some explicit combination of human natural language, mathematical equations, and executable code that might be fairly interpretable to humans. Then the only portions of its world model that are very hard to inspect are those embedded in the LLM component.
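To make the idea concrete, here is a minimal, purely illustrative sketch of what such an explicit world model might look like in Python. All names (`WorldModel`, `facts`, `equations`, `procedures`) are hypothetical, not from any real LMCA codebase; the point is just that every component stays human-readable and auditable, unlike weights inside the LLM.

```python
# Hypothetical sketch: a world model stored as explicit, inspectable artifacts --
# natural-language statements, equation strings, and executable code --
# rather than opaque learned weights.
from dataclasses import dataclass, field

@dataclass
class WorldModel:
    facts: list[str] = field(default_factory=list)            # natural-language claims
    equations: dict[str, str] = field(default_factory=dict)   # name -> formula text
    procedures: dict[str, str] = field(default_factory=dict)  # name -> Python source

    def run_procedure(self, name: str, **kwargs):
        """Execute a stored procedure; its source stays readable for auditing."""
        namespace: dict = {}
        exec(self.procedures[name], namespace)
        return namespace[name](**kwargs)

    def inspect(self) -> str:
        """Render the whole model as plain text a human reviewer can read."""
        lines = ["Facts:"] + [f"  - {f}" for f in self.facts]
        lines += ["Equations:"] + [f"  {k}: {v}" for k, v in self.equations.items()]
        lines += ["Procedures:"] + [f"  {k}:\n{v}" for k, v in self.procedures.items()]
        return "\n".join(lines)

wm = WorldModel(
    facts=["Dropped objects near Earth's surface accelerate downward."],
    equations={"free_fall_distance": "d = 0.5 * g * t**2"},
    procedures={"free_fall_distance": (
        "def free_fall_distance(t, g=9.81):\n"
        "    return 0.5 * g * t**2\n"
    )},
)
print(wm.run_procedure("free_fall_distance", t=2.0))  # 0.5 * 9.81 * 2.0**2 = 19.62
```

An interpretability check here is just reading `wm.inspect()` output; only whatever the LLM contributes at inference time remains opaque, which is the trade-off the comment describes.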
Cool! I am working on something that is fairly similar (with a bunch of additional safety considerations). I don’t go too deeply into the architecture in my article, but would be curious what you think!