Great job writing up your thoughts, insights, and model!
My mind is mainly drawn to the distinction you make between capabilities and agency. In my own model, agency is a necessary part of increasing capabilities, and will by definition emerge in superhuman intelligence. I think the same conclusion follows from the definitions you use:
You define “capabilities” by the Legg and Hutter definition you linked to, which reads:
Intelligence measures an agent’s ability to achieve goals in a wide range of environments
You define “agency” as:
if it plans and executes actions in order to achieve goals
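For reference, and going off my reading of their paper, Legg and Hutter also give a formal version of that intelligence definition, summing an agent's performance over all computable environments, with simpler environments weighted more heavily:

$$\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)}\, V_\mu^{\pi}$$

where $E$ is the set of computable environments, $K(\mu)$ is the complexity of environment $\mu$, and $V_\mu^{\pi}$ is the expected reward agent $\pi$ achieves in $\mu$. On that reading, “a wide range of environments” literally means scoring well across many environments at once.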
Thus an entity with higher intelligence achieves goals in a wider range of environments. What would it take to achieve goals in a wider range of environments? The ability to plan.
There is a ceiling where environments become so complex that one-step actions will not suffice to achieve one’s goals. Planning is the mental act of stringing multiple action steps together to achieve goals, so planning is a key part of intelligence: it scales up one of the parameters of intelligence, namely the number of steps in the action sequences you can consider. Therefore no superhuman intelligence will exist without agency, by definition of what superhuman intelligence is. And thus AGI risk is about agentic AI that is better than we are at achieving goals in a wider range of environments. It will generate plans that we can’t understand, because it will literally be “more” agentic than us, insofar as one can use that term to indicate degree by referring to the complexity of the plans one can consider and generate.
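To make the one-step-versus-multi-step point concrete, here is a toy sketch of my own (an invented grid, nothing from your post): an environment where no single action improves the immediate situation, so a greedy one-step chooser stalls, while a planner that strings steps together reaches the goal.

```python
# Toy sketch (my own invented example): a deterministic maze where a greedy
# one-step agent gets stuck, while stringing steps together into a plan succeeds.
from collections import deque

# 0 = free cell, 1 = wall. Start top-left, goal at (2, 0).
GRID = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 1],
]
START, GOAL = (0, 0), (2, 0)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def neighbors(cell):
    """Yield the free cells reachable from `cell` in one step."""
    r, c = cell
    for dr, dc in MOVES:
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]) and GRID[nr][nc] == 0:
            yield (nr, nc)

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def greedy_one_step(start, goal, max_steps=20):
    """Always take the single action that most reduces distance to the goal right now."""
    cell = start
    for _ in range(max_steps):
        if cell == goal:
            return True
        best = min(neighbors(cell), key=lambda n: manhattan(n, goal))
        if manhattan(best, goal) >= manhattan(cell, goal):
            return False  # no single step helps: stuck at a local optimum
        cell = best
    return cell == goal

def bfs_plan(start, goal):
    """String multiple steps together: breadth-first search over action sequences."""
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        cell, plan = frontier.popleft()
        if cell == goal:
            return plan
        for nxt in neighbors(cell):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [nxt]))
    return None

print("greedy reaches goal:", greedy_one_step(START, GOAL))  # False
print("planned path:", bfs_plan(START, GOAL))                # an 8-step detour that works
```

The greedy agent stalls immediately because the only legal move temporarily increases the distance to the goal; the planner accepts that detour because it evaluates whole action sequences rather than single steps.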
I’m also going to take a shot at your questions as a way to test myself. I’m new to this, so forgive me if my answers are off base. Hopefully other people will pitch in and push in other directions if that’s the case, and I’ll learn more in the process.
In order to expect real world consequences because of instrumental convergence:
How strong and general are the required capabilities?
Unsure. You raise an interesting point: I tend to mostly worry about superhuman intelligence, because that intelligence can by definition beat us at everything. Your question makes me wonder if we are already in trouble before that point. Then again, on the practical side, presumably the problem happens somewhere between “the smartest animal we know” and “our intelligence”, and once we are near that, recursive self-improvement will make the distinction moot, I’m guessing (as AGI will shoot past our intelligence level in no time). So, I don’t know … this is part of my motivation to just work on this stuff ASAP, just in case that point comes sooner than we might imagine.
How much agency is required?
I don’t think splitting off agency as a separate variable is relevant here, as argued above. In my model, you are simply asking “how many steps do agents need in their plans before they cause trouble”, which is a proxy for “how smart do they need to be to cause trouble”, and thus the same as the previous question.
How much knowledge about the environment and real world is required?
If you tell them about the world as a hypothetical, and they engage in that hypothetical as if you are indeed the person you say you are in it, then they can already wreak havoc by “pretending” to implement what they would do in that hypothetical and applying social manipulation to you so that you set off a destructive chain of actions in the world. So, very little. And any amount of information they would need to be useful to us would be enough for them to also hurt us.
How likely are emergent sub-agents in non-agentic systems?
See answer 2 questions up.
How quickly and how far will capabilities of AI systems improve in the foreseeable future (next few months / years / decades)?
I don’t know. I’ve been looking at Steinhardt’s material on forecasting and now have Superforecasting on my reading list. But then I realized: why even bother? Personally I consider a 10% risk of superhuman intelligence being created in the next 150 years to be too much. I think I read somewhere that 80% of ML researchers (not specific to AIS or LW) think AGI will happen in the next 100 years or so; I should probably log that source somewhere, in case I’m misquoting. Anyway, unless your action plans differ based on specific thresholds, I think spending too much time forecasting is a sort of emotional red herring that drains resources that could go into actual alignment work. I’m not saying you have actually done this; rather, I’m trying to warn against people going in that direction.
How strong are economic incentives to build agentic AI systems?
What are concrete economically valuable tasks that are more easily solved by agentic approaches than by (un)supervised learning?
Very strong: solve the climate change crisis; bioengineer better crops with better yields; do automated research that actually invents more efficient batteries, vehicle designs, supercomputers, etc. The list is endless. Intelligence is the single most valuable asset a company can develop, because it solves problems, and solving problems makes money.
How hard is it to design increasingly intelligent systems?
How (un)likely is an intelligence explosion?
Not sure… I am interested in learning more about what techniques we can use to ensure a slow takeoff. My intuition is currently that if we don’t try really hard, a fast takeoff will be the default, because an intelligent system will converge toward recursive self-improvement, as per the instrumentally convergent goal of cognitive enhancement.
Apart from these rather concrete and partly also somewhat empirical questions I also want to state a few more general and also maybe more confused questions:
How high-dimensional is intelligence? How much sense does it make to speak of general intelligence?
I love this question. I do think intelligence has many subcomponents that science hasn’t teased apart entirely yet. I’d be curious to learn more about what we do know. Additionally, I have a big worry about what new emergent capabilities a superhuman intelligence may have. This is analogous to our ability to plan, to have consciousness, to use language, etc.: at some point between single-celled organisms and Homo sapiens, these abilities just emerged from scaling up computational power (I think? I’m curious about counterarguments around architecture or other aspects also mattering or being crucial).
How independent are capabilities and agency? Can you have arbitrarily capable non-agentic systems?
See above answers.
How big is the type of risk explained in this post compared to more fuzzy AI risk related to general risk from Moloch, optimization, and competition?
How do these two problems differ in scale, tractability, neglectedness?
Not sure I understand the distinction entirely … I’d need more words to understand how these are separate problems. I can think of different ways to interpret the question.
What does non-existential risk from AI look like? How big of a problem is it? How tractable? How neglected?
My own intuition is actually that pure S-risk is quite low/negligible. Suffering is a waste of energy, basically inefficient, toward any goal except itself. It’s like an anti-convergent goal, really. Of course, we do generate a lot of suffering while trying to achieve our goals, but these are inefficiencies. Think of factory farming, for example: if we created lab-grown meat from single cells, there would be no suffering, and this would presumably be far more efficient. Thus any S-risk from AGI will be quite momentary. Once it figures out how to do what it really wants to do without wasting resources/energy making us suffer, it will just do that. Of course there is a tiny sliver of probability that it will take our suffering as its primary goal, but I think this is ridiculously unlikely, and something our minds get glued onto due to that bias which makes you focus on the most absolutely horrible option no matter the probability (I forget what it’s called).
Thanks for your replies. I think our intuitions regarding intelligence and agency are quite different. I deliberately mostly stuck to the word ‘capabilities’, because in my intuition you can have systems with very strong and quite general capabilities that are not agentic.
One very interesting point is where you say: “Presumably the problem happens somewhere between “the smartest animal we know” and “our intelligence”, and once we are near that, recursive self-improvement will make the distinction moot”. Can you explain this position more? In my intuition, building and improving intelligent systems is far harder than that.
I hope to come back later to your answer about knowledge of the real world.
What distinguishes capabilities and intelligence to your mind, and what grounds that distinction? I think I’d have to understand that to begin to formulate an answer.
I’ve unfortunately been quite distracted, but better a late reply than no reply.
With capabilities I mean how well a system accomplishes different tasks. This is potentially high-dimensional (there can be many tasks at which two systems are not equally good). It can also be more or less general (optical character recognition is very narrow because it can only be used for one thing; generating / predicting text is quite general). And systems without agency can have strong and general capabilities (a system might generate text or images without being agentic).
This is quite different from the definition by Legg and Hutter, which is specific to agents. However, since last week I have updated toward strongly and generally capable non-agentic systems being less likely to actually be built (especially before agentic systems). As a consequence, the difference between my notion of capabilities and a more agent-related notion of intelligence is less important than I thought.
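One way I might picture the notion of capabilities above (my own sketch with made-up task names and numbers, not anything from the thread): capabilities as a score per task, so that a system can be strong on many tasks or on only one, and “generality” is roughly how many tasks it handles at all, none of which requires agency.

```python
# Illustrative sketch with invented task names and scores: capabilities as a
# per-task score vector. "Generality" here just counts how many tasks a system
# performs above some (made-up) competence threshold.
capabilities = {
    "ocr_system": {"read_printed_text": 0.98},
    "text_model": {"read_printed_text": 0.70, "summarize": 0.85,
                   "translate": 0.80, "write_code": 0.55},
}

def generality(scores, threshold=0.5):
    """Count tasks performed above the competence threshold."""
    return sum(score >= threshold for score in scores.values())

for name, scores in capabilities.items():
    print(name, "generality:", generality(scores))
# The OCR system is narrow (1 task); the text model is broader (4 tasks).
# Neither needs to be agentic: both just map inputs to outputs.
```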