Yeah, I agree, it seems both more human-like and more powerful to have a dynamical system where models are activating other models based on something like the “lock and key” matching of neural attention. But for alignment purposes, it seems to me that we need to optimize models not only for usefulness or for similarity to actual human thought, but also for similarity to how humans think of human thought—when we imagine an AI with the goal of doing good, we want its decision-making to match our understanding of “doing good.” The model in this post isn’t as neat and clean as utility maximization, but a lot of the overly-neat features have to do with making it more convenient to talk about it having a fixed, human-comprehensible goal.
Re: creativity, I see how you’d get that from what I wrote, but I think that’s only half right. The model laid out in this post is perfectly capable of designing new solutions to problems—it just tends to do so by making a deliberate choice to take a “design a new solution” action. Another source of creativity is finding surprising solutions to difficult search problems, which is perfectly possible in complicated contexts.
Another source of creativity is compositionality, which you can have in this formalism by attributing it to the transition function putting you into a composed context. Can you learn this while trying to mimic humans? I’m not sure, but it seems possible.
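To make that a bit more concrete, here’s a rough toy sketch of what I have in mind. All the names here (`Context`, `compose`, `transition`, the `combine:` action) are mine, not the post’s, and the summed-reward composition rule is just one arbitrary choice; treat it as one possible reading of the formalism rather than the real thing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

# Hypothetical toy reading of the formalism, not the post's actual definitions.
@dataclass(frozen=True)
class Context:
    name: str
    states: FrozenSet[str]
    reward: Callable[[str], float]  # only meant to be called on this context's states

def compose(a: Context, b: Context) -> Context:
    """A composed context: product states, with reward summed across the parts."""
    states = frozenset(f"{s}|{t}" for s in a.states for t in b.states)
    def reward(state: str) -> float:
        s, t = state.split("|")
        return a.reward(s) + b.reward(t)
    return Context(name=f"{a.name}+{b.name}", states=states, reward=reward)

def transition(current: Context, action: str, library: Dict[str, Context]) -> Context:
    """The transition function itself can drop you into a composed context:
    a 'combine with X' action lands you in compose(current, X)."""
    if action.startswith("combine:"):
        other = library[action.removeprefix("combine:")]
        return compose(current, other)
    return current  # other kinds of actions aren't modeled in this sketch
```

The point is just that composition doesn’t need any extra machinery: if the transition function is allowed to return `compose(current, other)`, composed contexts fall out of ordinary transitions.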
We might also attribute a deficit in creativity to the fact that the reward functions are only valid in-context, and aren’t designed to generalize to new states, even if there were really apt ways of thinking about the world that involved novel contexts or adding new states to existing contexts. And maybe that’s the important part: I think this is a key feature, not a bug at all.
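Again as a toy sketch of my own (the class name and the example context are hypothetical, not from the post): an in-context reward function is basically a partial function on that context’s states, and it simply has nothing to say about states it was never defined on.

```python
from typing import Dict

# Hypothetical illustration: a reward function that is only defined on the
# states of its own context and refuses to extrapolate beyond them.
class InContextReward:
    def __init__(self, context_name: str, rewards: Dict[str, float]):
        self.context_name = context_name
        self.rewards = rewards  # state -> reward, only for this context's states

    def __call__(self, state: str) -> float:
        if state not in self.rewards:
            # No generalization: out-of-context states just aren't scored.
            raise KeyError(
                f"{state!r} is not a state of context {self.context_name!r}; "
                "this reward function has no opinion about it."
            )
        return self.rewards[state]

# e.g. a 'tidy the desk' context that knows nothing about novel states
tidy = InContextReward("tidy-the-desk", {"messy": 0.0, "tidy": 1.0})
tidy("tidy")        # 1.0
# tidy("on fire")   # would raise KeyError rather than guess
```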