The key question I always focus on is: where do you get your capabilities from?
For instance, with GOFAI and ordinary programming, you have some human programmer manually create a model of the scenarios the AI can face, and then manually create a bunch of rules for what to do in order to achieve things. So basically, the human programmer has a bunch of really advanced capabilities, and they use them to manually build some simple capabilities.
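To make this concrete, here is a minimal sketch (entirely made up for illustration) of what hand-coded capabilities look like: the programmer enumerates the scenarios and writes down the rules, so all of the intelligence lives in the programmer rather than in the program.

```python
# Hypothetical hand-built "capability": a thermostat controller.
# The scenario model (temperature bands) and the rules for each scenario
# were worked out by the programmer, not learned by the system.
def thermostat_action(temperature_c: float, target_c: float = 21.0) -> str:
    if temperature_c < target_c - 1.0:
        return "heat_on"
    elif temperature_c > target_c + 1.0:
        return "heat_off"
    else:
        return "do_nothing"

print(thermostat_action(18.5))  # -> heat_on
```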
“Consequentialism”, broadly defined, represents an alternative class of ways to gain capabilities, namely choosing what to do based on it having the desired consequences. To some extent, this is a method humans use, perhaps particularly the method the smartest and most autistic humans rely on most (which I suspect to be connected to LessWrong demographics, but who knows...). Utility maximization captures the essence of consequentialism; there are various other things, such as multi-agency, that one can throw on top of it, but those other things still mainly derive their capabilities from the core of utility maximization.
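As a rough sketch of deriving capabilities from consequentialism (the world model, utility function, and numbers below are placeholders invented for the example): the agent’s behaviour comes from searching over actions for the one whose predicted consequences score highest.

```python
# Toy consequentialist: choose the action with the highest expected utility
# under a (made-up) predictive model of its consequences.
def predict_outcomes(state, action):
    # Placeholder world model: the action mostly works, sometimes does nothing.
    return [(state + action, 0.8), (state, 0.2)]  # [(outcome, probability), ...]

def utility(outcome):
    return -abs(outcome - 10)  # placeholder goal: move the state toward 10

def choose_action(state, actions):
    def expected_utility(action):
        return sum(p * utility(o) for o, p in predict_outcomes(state, action))
    return max(actions, key=expected_utility)

print(choose_action(state=3, actions=[-1, 0, 1, 2]))  # -> 2, the biggest step toward 10
```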
Self-supervised language models such as GPT-3 do not gain their capabilities from consequentialism, yet they have advanced capabilities nonetheless. How? Imitation learning, which basically works because of Aumann’s agreement theorem. Self-supervised language models mimic human text, and humans do useful stuff and describe it in text, so self-supervised language models learn the useful stuff that can be described in text.
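A toy version of that mechanism (a made-up bigram “model” rather than a transformer, just to show where the training signal comes from): the only objective is predicting the next token of human-written text, and whatever the humans knew leaks into the statistics.

```python
from collections import Counter, defaultdict

# Toy self-supervised "language model": learn next-token statistics from
# human-written text, then generate by predicting the most likely next token.
corpus = "to bake bread mix flour and water then knead the dough and bake it".split()

next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1  # the only training signal is "predict the next token"

def predict_next(token):
    return next_counts[token].most_common(1)[0][0]

# The fragment of recipe knowledge was never programmed in; it was absorbed
# from the humans who wrote the text.
token = "mix"
for _ in range(4):
    print(token, end=" ")
    token = predict_next(token)
print(token)  # -> mix flour and water then
```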
Risk that arises purely from language models or non-consequentialist RLHF might be quite interesting and important to study. I feel less able to predict it, though, partly because I don’t know what the models will be deployed to do, or how much they can be coerced into doing, or what kinds of witchcraft are necessary to coerce the models into doing those things.
It seems possible to me that imitation learning and RLHF can bring us to the frontier of human abilities, so that we have a tool that can solve tasks as well as the best humans can. However, I don’t think it will be able to much exceed that frontier. This is still superhuman, because no human is as good as all the best humans at all tasks. But it is not far-superhuman, even though I think being far-superhuman is possible, and a key part of it not being far-superhuman is that it cannot extend its capabilities. As such, I would expect consequentialism to be necessary for creating something that is far-superhuman.
I think many of the classical AI risk arguments apply to consequentialist far-superhuman AI.
If I understood your model correctly, GPT has capabilities because (1) humans are consequentialists, so they have capabilities, (2) GPT imitates human output, and (3) this imitation requires GPT to learn the underlying human capabilities.
From janus’s Simulators post:
GPT is behavior cloning. But it is the behavior of a universe that is cloned, not of a single demonstrator, and the result isn’t a static copy of the universe, but a compression of the universe into a generative rule.
I think the above quote from janus would add to (3) that it requires GPT to also learn the environment and the human-environment interactions, aside from just mimicking human capabilities. I know what you said doesn’t contradict this, but I think there’s a difference in emphasis, i.e. imitation of humans (or some other consequentialist) not necessarily being the main source of capability.
Generalizing this, it seems obviously wrong that imitation-learning-of-consequentialists is where self-supervised language models get their capabilities from? (I strongly suspect I misunderstood your argument or what you meant by capabilities, but I’m just laying this out anyway.)
Like, LLM-style transformers pretrained on protein sequences get their “protein-prediction capability” purely from “environment generative-rule learning,” and none of it from imitation learning of a consequentialist’s output.
I think the above quote from janus would add to (3) that it requires GPT to also learn the environment and the human-environment interactions, aside from just mimicking human capabilities. I know what you said doesn’t contradict this, but I think there’s a difference in emphasis, i.e. imitation of humans (or some other consequentialist) not necessarily being the main source of capability.
I think most of the capabilities on earth exist in humans, not in the environment. For instance if you have a rock, it’s just gonna sit there; it’s not gonna make a rocket and fly to the moon. This is why I emphasize GPT as getting its capabilities from humans, since there are not many other things in the environment it could get capabilities from.
I agree that insofar as there are other things in the environment with capabilities (e.g. computers outputting big tables of math results) that get fed into GPT, it also gains some capabilities from them.
Like, LLM-style transformers pretrained on protein sequences get their “protein-prediction capability” purely from “environment generative-rule learning,” and none of it from imitation learning of a consequentialist’s output.
I think they get their capabilities from evolution, which is a consequentialist optimizer?
It seems possible to me that imitation learning and RLHF can bring us to the frontier of human abilities, so that we have a tool that can solve tasks as well as the best humans can. However, I don’t think it will be able to much exceed that frontier. This is still superhuman, because no human is as good as all the best humans at all tasks. But it is not far-superhuman, even though I think being far-superhuman is possible, and a key part of it not being far-superhuman is that it cannot extend its capabilities. As such, I would expect consequentialism to be necessary for creating something that is far-superhuman.
I disagree especially with this, but I have not yet documented my case against it in a form I’m satisfied with.
That said, I do not endorse your case for how language models gain their capabilities. I don’t think of it as acquiring capabilities humans have.
I think of AI systems as the products of selection.
Consider a cognitive domain/task, and an optimisation process that selects systems for performance on that task. In the limit of arbitrarily powerful optimisation pressure, what do the systems so selected converge to?
For something like logical tic tac toe, the systems so produced will be very narrow optimisers and pretty weak ones, because very little optimisation power is needed to attain optimal performance on tic tac toe.
What about Go? The systems so produced will also be narrow optimisers, but vastly more powerful, because much more optimisation power is needed to attain optimal performance in Go.
I think the products of optimisation for the task of minimising predictive loss on sufficiently large and diverse datasets (e.g. humanity’s text corpus) converge to general intelligence.
And arbitrarily powerful optimisation pressure would create arbitrarily powerful LLMs.
I expect that LLMs can in principle scale far into the superhuman regime.
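A toy version of what “selected for performance on minimising predictive loss” could look like (the candidate models and corpus here are made up): the selection pressure only cares about average negative log-likelihood, and whichever system compresses the data best survives.

```python
import math

# Toy selection process: among candidate predictive models, keep the one with
# the lowest predictive loss (average negative log-likelihood) on the corpus.
corpus = ["the", "cat", "sat", "the", "cat", "ran"]

candidates = {
    "uniform":   {"the": 1/4, "cat": 1/4, "sat": 1/4, "ran": 1/4},
    "frequency": {"the": 2/6, "cat": 2/6, "sat": 1/6, "ran": 1/6},
}

def predictive_loss(model):
    return -sum(math.log(model[token]) for token in corpus) / len(corpus)

best = min(candidates, key=lambda name: predictive_loss(candidates[name]))
print(best)  # -> frequency: the model that better predicts the data is selected
```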
I think the products of optimisation for the task of minimising predictive loss on sufficiently large and diverse datasets (e.g. humanity’s text corpus) converge to general intelligence.
Could you expand on what you mean by general intelligence, and how it gets selected for by the task of minimising predictive loss on sufficiently large and diverse datasets like humanity’s text corpus?
This is the part I’ve not yet written up in a form I endorse.
I’ll try to get it done before the end of the year.
Expanded: Where do you get your capabilities from?
If AI risk arguments mainly apply to consequentialist AI (which I assume is the same as EU-maximizing in the OP), and the first half of the OP is right that such AI is unlikely to arise naturally, does that make you update against AI risk?
Yes.
which I assume is the same as EU-maximizing in the OP
Not quite the same, but probably close enough.
You can have non-consequentialist EU maximizers if, e.g., the action space and state space are small and someone manually computed a table of the expected utilities. In that case, the consequentialism is in the entity that computed the table of expected utilities, not in the entity that selects an action based on the table.
(Though I suppose such an agent is kind of pointless since you could as well just store a table of the actions to choose.)
You can also have consequentialists that are not EU maximizers if they are e.g. a collection of consequentialist EU maximizers working together.
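A minimal sketch of the table-lookup point above (the actions, probabilities, and utilities are invented for the example): agent A maximizes expected utility by reading a precomputed table, so the consequentialist work happened in whoever computed the table; agent B computes the expectation itself from a model of consequences.

```python
# Agent A: an "EU maximizer" that is not itself a consequentialist.
# Someone else already did the consequentialist work of filling in this table.
PRECOMPUTED_EU = {"open_umbrella": 0.8, "leave_umbrella": 0.4}

def agent_a():
    return max(PRECOMPUTED_EU, key=PRECOMPUTED_EU.get)

# Agent B: a consequentialist EU maximizer. It computes expected utilities
# itself from a (made-up) model of outcomes and a utility function.
P_RAIN = 0.6
UTILITY = {("open_umbrella", "rain"): 1.0, ("open_umbrella", "dry"): 0.5,
           ("leave_umbrella", "rain"): 0.0, ("leave_umbrella", "dry"): 1.0}

def agent_b():
    def expected_utility(action):
        return P_RAIN * UTILITY[(action, "rain")] + (1 - P_RAIN) * UTILITY[(action, "dry")]
    return max(["open_umbrella", "leave_umbrella"], key=expected_utility)

print(agent_a(), agent_b())  # both pick "open_umbrella", but only B did the reasoning
```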
I don’t think consequentialism is related to utility maximisation in the way you try to present it. There are many consequentialistic agent architectures that are explicitly not utility maximising, e.g. Active Inference, JEPA, ReduNets.
Then you seem to switch your response to arguing that consequentialism is important for reaching the far-superhuman AI level. This looks at least plausible to me, but first, these far-superhuman AIs could have a non-UM consequentialistic agent architecture (see above), and second, DragonGod didn’t say that the risk necessarily comes from far-superhuman AIs (even though non-UM ones); I believe he argued for that here. It’s possible even that far-superhuman intelligence is not a thing at all (except for the speed of cognition and the size of memory), but the risks that he highlights, human disempowerment and dystopian scenarios, still absolutely stand.
I don’t think consequentialism is related to utility maximisation in the way you try to present it. There are many consequentialistic agent architectures that are explicitly not utility maximising, e.g. Active Inference, JEPA, ReduNets.
JEPA seems like it is basically utility maximizing to me. What distinction are you referring to?
I keep getting confused about Active Inference (I think I understood it once based on an equivalence to utility maximization, but it was a while ago and you seem to be saying that this equivalence doesn’t hold), and I’m not familiar with ReduNets, so I would appreciate a link or an explainer to catch up.
Then you seem to switch your response to arguing that consequentialism is important for reaching the far-superhuman AI level. This looks at least plausible to me, but first, these far-superhuman AIs could have a non-UM consequentialistic agent architecture (see above), and second, DragonGod didn’t say that the risk necessarily comes from far-superhuman AIs (even though non-UM ones); I believe he argued for that here. It’s possible even that far-superhuman intelligence is not a thing at all (except for the speed of cognition and the size of memory), but the risks that he highlights, human disempowerment and dystopian scenarios, still absolutely stand.
I was sort of addressing alternative risks in this paragraph:
Risk that arises purely from language models or non-consequentialist RLHF might be quite interesting and important to study. I feel less able to predict it, though, partly because I don’t know what the models will be deployed to do, or how much they can be coerced into doing, or what kinds of witchcraft are necessary to coerce the models into doing those things.