I think maybe a more powerful framework than discrete contexts is that there’s a giant soup of models, and the models have arrows pointing at other models, and multiple models can be active simultaneously, and the models can span different time scales. So you can have an “I am in the store” model, and it’s active for the whole time you’re shopping, and meanwhile there are faster models like “I am looking for noodles”, and slower models like “go shopping then take the bus home”. And anything can point to anything else. So then if you have a group of models that mainly point to each other, and less to other stuff, it’s a bit of an island in the graph, and you can call it a “context”. Like everything I know about chess strategy is mostly isolated from the rest of my universe of knowledge and ideas, so I could say I have a “chess strategy context”. But that’s an emergent property, not part of the data structure.
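To make the “island in the graph” idea a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the toy graph, the extra model names such as “pawn structure” and “patience”, and the `insularity` score are made up, not part of any real formalism. The point is only that a “context” can be measured from the arrows after the fact, without being stored anywhere in the data structure.

```python
# Hypothetical toy graph: each key is a model, each value lists the models it points at.
# Nothing here labels a model as belonging to a "context".
arrows = {
    "opening theory": ["endgame technique", "pawn structure"],
    "endgame technique": ["pawn structure", "opening theory"],
    "pawn structure": ["opening theory", "endgame technique", "patience"],
    "I am in the store": ["I am looking for noodles", "go shopping then take the bus home"],
    "I am looking for noodles": ["I am in the store"],
    "go shopping then take the bus home": ["I am in the store", "patience"],
    "patience": [],
}


def insularity(group, arrows):
    """Fraction of arrows leaving members of `group` that land back inside `group`.

    A value near 1.0 means the group mostly points at itself (an "island");
    a value near 0.0 means it mostly points elsewhere.
    """
    group = set(group)
    targets = [dst for src in group for dst in arrows.get(src, [])]
    if not targets:
        return 0.0
    return sum(dst in group for dst in targets) / len(targets)


chess = ["opening theory", "endgame technique", "pawn structure"]
mixed = ["opening theory", "I am in the store"]

print(insularity(chess, arrows))  # high: behaves like a "chess strategy context"
print(insularity(mixed, arrows))  # low: not a natural island
```

In this toy setup the chess group scores high and the arbitrary mixed group scores low, even though the graph itself has no notion of context. A real version would presumably need weighted arrows, activation dynamics, and multiple time scales, none of which this sketch attempts.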
My impression is that the Goodhart’s law thing at the end is a bit like saying “Don’t think creatively”… Thinking creatively is making new connections where they don’t immediately pop into your head. Is that reasonable? Sorry if I’m misunderstanding. :)
Yeah, I agree, it seems both more human-like and more powerful to have a dynamical system where models are activating other models based on something like the “lock and key” matching of neural attention. But for alignment purposes, it seems to me that we need to not only optimize models for usefulness or similarity to actual human thought, but also for how similar they are to how humans think of human thought—when we imagine an AI with the goal of doing good, we want it to have decision-making that matches our understanding of “doing good.” The model in this post isn’t as neat and clean as utility maximization, but a lot of the overly-neat features have to do with making it more convenient to talk about it having a fixed, human-comprehensible goal.
Re: creativity, I see how you’d get that from what I wrote but I think that’s only half right. The model laid out in this post is perfectly capable of designing new solutions to problems—it just tends to do it by making a deliberate choice to take a “design a new solution” action. Another source of creativity is finding surprising solutions to difficult search problems, which is perfectly possible in complicated contexts.
Another source of creativity is compositionality, which you can have in this formalism by attributing it to the transition function putting you into a composed context. Can you learn this while trying to mimic humans? I’m not sure, but it seems possible.
We might also attribute a deficit in creativity to the fact that the reward functions are only valid in-context, and aren’t designed to generalize to new states, even if there were really apt ways of thinking about the world that involved novel contexts or adding new states to existing contexts. And maybe this is the important part, because I think this is a key feature, not at all a bug.