I think that this is definitely a concern for prosaic AI safety methods. In the case of something like amplification or debate, I think the bet that you’re making is that language modeling alone is sufficient to get you everything you need in a competitive way. I tend to think that that claim is probably true, but it’s definitely an assumption of the approach that isn’t often made explicit (but probably should be).
To add a bit of color to why you might buy the claim that language is all you need: the claim is basically that language contains enough structure to give you all the high-level cognition you could want, and furthermore that you aren’t going to care about the other things that you can’t get out of language, like performance on fine-grained control tasks. Another way of thinking about this: if the primary purpose of your first highly advanced ML system is to build your second highly advanced ML system, then the claim is that language modeling (on some curriculum) will be sufficient to competitively help you build your next AI.
I’m skeptical of language modeling being enough to be competitive, in the sense of maximizing “log prob of some naturally occurring data or human demonstrations.” I don’t have a strong view about whether you can get away using only language data rather than e.g. taking images as input and producing motor torques as output.
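For concreteness, the objective being referred to here is standard maximum-likelihood language modeling; writing it out (the notation is my own, not something stated in the thread):

$$\max_{\theta}\; \mathbb{E}_{x \sim \mathcal{D}}\big[\log p_{\theta}(x)\big],$$

where $\mathcal{D}$ is the distribution of naturally occurring text or human demonstrations and $p_{\theta}$ is the model being trained.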
I’m also not convinced that amplification or debate need to make this bet though. If we can do joint training / fine-tuning of a language model using whatever other objectives we need, then it seems like we could just as well do joint training / fine-tuning for a different kind of model. What’s so bad if we use non-language data?
I agree with this, though I still feel like some sort of active learning approach might be good enough without needing to add in a full-out RL objective.
My opinion would be that there is a real safety benefit from being in a situation where you know the theoretical optimum of your loss function (e.g. in a situation where you know that HCH is precisely the thing for which loss is zero). That being said, it does seem obviously fine to have your language data contain other types of data (e.g. images) inside of it.
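To spell out one way the “known theoretical optimum” point could be formalized (this is my own sketch, not necessarily the intended formalization): if the loss is an expected divergence from HCH’s answer distribution,

$$L(\theta) = \mathbb{E}_{q \sim \mathcal{Q}}\Big[ D_{\mathrm{KL}}\big( p_{\mathrm{HCH}}(\cdot \mid q) \,\big\|\, p_{\theta}(\cdot \mid q) \big) \Big],$$

then $L(\theta) = 0$ exactly when $p_{\theta}$ agrees with $p_{\mathrm{HCH}}$ on every question $q$ in the support of $\mathcal{Q}$. The global optimum is then a known, well-characterized object (HCH), rather than whatever policy happens to score well under some reward.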
I’d be happy to read more about this line of thought. (For example, does “loss function” here refer to an objective function that includes a regularization term? If not, what might we assume about the theoretical optimum that amounts to a safety benefit?)
Thanks btw, I’m learning a lot from these replies. Are you thinking of training something agenty, or is the hope to train something that isn’t agenty?
I’d be happy to read an entire post about this view.
What level of language modeling may be sufficient for competitively helping in building the next AI, according to this view? For example, could such language modeling capabilities allow a model to pass strong (text-based) versions of the Turing test?
In my opinion, such a language model should be able to establish an equivalence between a map of a territory and a verbal description of that territory.
In that case, an expression like “the red rose is in the corner” gets its meaning from letting us locate the rose on a map of the room; conversely, if the rose is observed in the corner, that observation can be described as “the rose is in the corner”.
Natural language could thus be used to describe all possible operations over world maps, like “all asteroids should be deflected”.
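Here is a toy sketch (purely illustrative; the data structure and names are made up for this comment) of the kind of two-way mapping between a “map” and verbal descriptions that I have in mind:

```python
# Toy illustration: a "world map" of a room as a simple data structure,
# plus functions going from a description to a location on the map and back.

world_map = {
    "red rose": "corner",
    "table": "center of the room",
}

def locate(object_name):
    """Give meaning to a phrase like "the red rose is in the corner"
    by finding the object's place on the map."""
    return world_map.get(object_name)

def describe(object_name):
    """Go the other way: turn an observed location back into a sentence."""
    location = world_map.get(object_name)
    if location is None:
        return None
    return f"the {object_name} is in the {location}"

print(locate("red rose"))    # corner
print(describe("red rose"))  # the red rose is in the corner
```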
This is helpful, thanks, but I am still missing some pieces. Can you say more about how we would use this to deflect asteroids?
It was just an example of the relation between language and a world model. If I have an AI, I can say to it “Find ways to deflect asteroids”. This AI would be able to create a model of the Solar System, calculate the future trajectories of all dangerous asteroids, and so on. In other words, it could relate my verbal command to a 3D model of the real world.
The same is true if I ask an AI to bring me coffee from the kitchen: it has to pick out, within its world model, the right kitchen, the right type of coffee, and the right sequence of future actions.
Humans do this too: any time we read a text, we build a world model that corresponds to the description. And in the other direction, if we are given a world model, such as a picture, we can describe it in words.
With an agent-like AI, it’s easy to see how you use it to help build your next AI. (If it’s really good, you can even just delegate the entire task to it!) How would this work with really good language modelling? (Maybe I’m just seconding what Ofer said—I’d love to read an entire post about the view you are putting forth here!)
The goal of something like amplification or debate is to create a sort of oracle AI that can answer arbitrary questions (like how to build your next AI) for you. The claim I’m making is just that language is a rich enough environment that it’ll be competitive to only use language as the training data for building your first such system.