you didn’t write “yes, i use ‘superintelligent’ to mean super-human”, so i’ll write as if you also mean optimal[1]. though i suspect we may have different ideas of where optimal is, which could become an unnoticed crux, so i’m noting it.
people will program their superintelligences
i am expecting the first superintelligent agent to capture the future, and for there to be no time for others to arise to compete with it.
in a hypothetical setup where multiple superintelligences are instantiated at close to the same time within a world, it’s plausible to me that they would fight in some way, though also plausible that they’d find a way not to. as an easy reason they might fight: maybe one knows it will win (e.g., it has a slight head start and physics is such that that is pivotal).
in my model of reality: it takes ~2/15ths of a second for light to travel the length of the earth’s circumference. maybe there are other bottlenecks that would push the time required for an agentic superintelligence to take over the earth to minutes-to-hours. as long as the first superintelligent (world-valuing-)agent is created at least <that time period’s duration> before the next one would have been created, it will prevent that next one’s creation. i assign very low likelihood to multiple superintelligences being independently created within the same hour.
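(a quick back-of-the-envelope check of that ~2/15ths figure, as a minimal sketch; the constants are the standard equatorial circumference and vacuum light speed, and real signal paths would add routing/switching delays on top:)

```python
# light-lag sketch: time for light to cross one earth circumference in vacuum
EARTH_CIRCUMFERENCE_M = 40_075_000   # equatorial circumference, meters
SPEED_OF_LIGHT_M_S = 299_792_458     # meters per second

lag_s = EARTH_CIRCUMFERENCE_M / SPEED_OF_LIGHT_M_S
print(f"{lag_s:.3f} s")  # ~0.134 s, i.e. roughly 2/15ths of a second
```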
this seems like a crux, and i don’t yet know why you expect otherwise, unless you mean something else by superintelligence.
actually, i can see room for disagreement about whether ‘slow, gradual buildup of spiky capabilities profiles’ would change this. i don’t think it would because … if i try to put it into words, we are in an unstable equilibrium, which will at some point be disrupted, and there are no ‘new equilibria, just with less balance’ for the world to fall on. however, gradual takeoff plus a strong defensive advantage inherent in physics could lead to it, for intuitive reasons[2]. in terms of current tech like nukes there’s an offensive advantage, but we don’t actually know what the limit looks like. although it’s hard for me to conceive of a true defensive advantage in fundamental physics that can’t be used offensively by macroscopic beings. would be interested in seeing made-up examples.
i’ll probably read the linked posts anyways, but it looks like you thought i also expected multiple superintelligences to arise at almost the same time, and inferred i was making implicit claims about game theory between them.
Nitpick that doesn’t matter, but when I focus on superintelligence, I don’t focus on the AI literally doing optimal actions, because that leads to likely being wrong about what AIs can actually do
i mean something with the optimal process (of cognition (learning, problem solving, creativity)), not something that always takes the strictly best action.
(i’m guessing this is about how the ‘optimal action’ could sometimes be impractical to compute. for example, the action i could take that has the best outcomes might technically be to send off a really alien email that sets off some unknowable-from-my-position butterfly effect.)
e.g., toy game setup: if you can counter a level 100 attack at level 10, and all the players start within 5 levels of each other and progress at 1 per turn, then it doesn’t matter who will reach level 100 first.
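(a tiny sketch of that toy setup, with the specific numbers made up purely for illustration:)

```python
# toy setup: players gain 1 level per turn, and defense is strong enough
# that a level-10 defender counters a level-100 attack (a 10x advantage).
ATTACK_TO_DEFENSE_RATIO = 10  # made-up constant for the illustration

def attack_succeeds(attacker_level: int, defender_level: int) -> bool:
    return attacker_level > defender_level * ATTACK_TO_DEFENSE_RATIO

levels = [5, 3, 1]          # everyone starts within 5 levels of each other
for _ in range(200):        # everyone progresses at 1 level per turn
    levels = [lvl + 1 for lvl in levels]

leader, laggard = max(levels), min(levels)
print(attack_succeeds(leader, laggard))  # False: being first past level 100 confers nothing
```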
i am expecting the first superintelligent agent to capture the future, and for there to be no time for others to arise to compete with it.
I think I understand your position better now, and a crux for real-world decision making is that, in practice, I don’t really think this assumption is correct by default, especially if there’s a transition period.
i do not understand your position from this, so you’re welcome to write more. also, i’m not sure if i added the paragraph about slow takeoff before or after you loaded the comment.
an easy way to convey your position to me might be to describe a practical rollout of the future where all the things in it seem individually plausible to you.
One example of such a future: in 2028, OpenAI manages to scale up enough to make an AI that, while not as good as a human worker in general (at least without heavy inference costs), is good enough to act as a notable accelerant to AI research. By 2030-2031, AI research has been more or less automated away by OpenAI, with competitors having such systems by 2031-2032, so AI progress becomes notably faster. By 2033, we are on the brink of AI that can do a lot of job work, but the best models at that point are instead reinvested in AI R&D, such that by 2035 superhuman AI is broadly achieved, and this is when the economy starts getting seriously disrupted.
The key features of this future are that intent alignment works well enough that AI generally takes instructions from specific humans, and that it’s easy for others to get their own superintelligences with different values, such that conflict doesn’t go away.
The key features here in this future is that the superhuman equals optimal assumption is false [...]
oh, well to clarify then, i was trying to say that i didn’t mean ‘superhuman’ at all, i directly meant optimal. i don’t believe that superhuman = optimal, and when reading this story one of the first things that stood out was that the 2035 point is still before the first long-term-decisive entity.
but it still says “it’s easy for others to get their own superintelligences with different values”, with ‘superintelligence’ referring to the ‘superhuman’ AI of 2035?
my response is the same, the story ends before what i meant by superintelligence has occurred.
(it’s okay if this discussion was secretly a definition difference till now!)
Yeah, the crux is I don’t think the story ends before superintelligence
what i meant by “the story ends before what i meant by superintelligence has occurred” is that the written one ends there in 2035, but at that point there’s still time to affect what the first long-term-decisive thing will be.
but it still says “it’s easy for others to get their own superintelligences with different values”, with ‘superintelligence’ referring to the ‘superhuman’ AI of 2035?
still confused about this btw. in my second reply to you i wrote:
(i wonder if you’re using the term ‘superintelligence’ in a different way though, e.g. to mean “merely super-human”?)
and you did not say you were, but it looks like you are here?
I was assuming very strongly superhumanly intelligent AI, but yeah no promises of optimality were made here.
That said, I suspect a crux is that optimality ends up with multipolarity, assuming a one world government hasn’t happened by then, because I think the offense-defense balance moderately favors defense even at optimality, assuming optimal defense and offense.
I was assuming very strongly superhumanly intelligent AI
oh okay, i’ll have to reinterpret then. edit: i just tried, but i still don’t get it; if it’s “very strongly superhuman”, why is it merely “when the economy starts getting seriously disrupted”? (<- this feels like it’s back at where this thread started)
I think the offense-defense balance moderately favors defense even at optimality
oh okay, i’ll have to reinterpret then. edit: i just tried, but i still don’t get it; if it’s “very strongly superhuman”, why is it merely “when the economy starts getting seriously disrupted”? (<- this feels like it’s back at where this thread started)
I should probably edit that at some point, but I’m on my phone, so I’ll do it tomorrow.
why?
A big reason for this is logistics: how you get to the fight can actually hamper you a lot, and this bites especially hard on offense, because it’s easier to get supplies to your own area than it is to get supplies to an offensive unit.
This especially matters if physical goods need to be transported from one place to another.
A big reason for this is logistics: how you get to the fight can actually hamper you a lot, and this bites especially hard on offense, because it’s easier to get supplies to your own area than it is to get supplies to an offensive unit.
ah. for ‘at optimality’, which you wrote, i don’t imagine it taking place on that high of a macroscopic level (the one on which ‘supplies’ could be transported). i think the limit is more like things that look to us like the category of ‘angling rays of light just right to cause distant matter to interact in such a way as to create an atomic explosion, or some even more destructive reaction we don’t yet know about, or to suddenly carve out a copy of itself there to start doing things locally’. also, i’m not imagining the competitors being ‘solid’ macroscopic entities anymore, but rather patterns imbued (and dispersed) in a relatively ‘lower’ level of physics (which also do not need ‘supplies’). (edit: maybe this picture is wrong; at optimality you can maybe absorb the energy of such explosions / not be damaged by them, if you’re not a macroscopic thing. which does actually defeat the main way macroscopic physics has an offense advantage?)
(i’m just exploring what it would be like, to be clear; i don’t think such conflicts will happen, because i still expect just one optimal-level-agent to come from earth)
(i’m just exploring what it would be like, to be clear; i don’t think such conflicts will happen, because i still expect just one optimal-level-agent to come from earth)
I am willing to concede that, here, the assumption of non-optimal agents was more necessary for my argument than I thought, and I think you are right about the necessity of that assumption in order to guarantee anything like a normal future (though it still might be multipolar), so I changed a comment.
My new point is that I don’t think optimal agents will exist when we lose all control, but yes I didn’t realize an assumption was more load-bearing than I thought.
My new point is that I don’t think optimal agents will exist when we lose all control
(btw I also realized I didn’t strictly mean ‘optimal’ by ‘superintelligent’, but at least close enough to it / ‘strongly superhuman enough’ for us to not be able to tell the difference. I originally used the ‘optimal’ wording trying to find some other definition apart from ‘super-human’)
it is also plausible to me that life-caring beings first lose control to much narrower programs[1] or moderately superhuman unaligned agents totally outcompeting them economically (if it turns out that making better agents is hard enough that they can’t just directly do that instead), or something.
also, a ‘multipolar AI-driven but still normal-ish’ scenario seems to continue at most until a strong enough agent is created. (e.g. that could be what a race is towards).
(maybe after ‘loss of control to weaker AI’ scenarios, those weaker AIs also keep making better agents afterwards, but i’m not sure about that, because they could be myopic and in some stable pattern/equilibrium)
[1] (e.g. the ‘going out with a whimper’ part of this post)
your vision definitely requires other real-life value sets to lose out on a lot, here.
i’m not sure what this means. my values basically refer to other beings having not-tormentful (and next in order of priority, happy/good) existences. (tried to formalize this more but it’s hard)
in particular, i’m not sure if you’re saying something which would seem trivially true to me or not. (example trivially true thing: someone who wants to tile literally the entire lightcone with happy humans not being able to do that is losing out under ‘cosmopolitan’ values relative to if their values controlled the entire lightcone. example trivially true thing 2: “the best possible world is relative to a given value set”)
i’m not sure what this means. my values basically refer to other beings having not-tormentful (and next in order of priority, happy/good) existences. (tried to formalize this more but it’s hard)
That would immediately exclude quite a few people, from both the far left and the far right, because I predict a lot of people definitely want at least some people to have tormentful lives.
in particular, i’m not sure if you’re saying something which would seem trivially true to me or not. (example trivially true thing: someone who wants to tile literally the entire lightcone with happy humans not being able to do that is losing out under ‘cosmopolitan’ values relative to if their values controlled the entire lightcone. example trivially true thing 2: “the best possible world is relative to a given value set”)
I was trying to say something trivially true in your ontology. But far too many people tend to deny that you do in fact have to make other values lose out, and people usually think the best possible world is absolute, not relative. In particular, I think a lot of people use the idea of value-aligned superintelligence as though it were a magic wand that could solve all conflict.
far too many people tend to deny that you do in fact have to make other values lose out
i don’t know where that might be true, but at least on lesswrong i imagine it’s an uncommon belief. a core premise of alignment being important is value orthogonality implying that an unaligned agent with max-level-intelligence would compete for the same resources whose configurations it values (the universe). most of the reason for collaborating on alignment despite orthogonality is that our values tend to overlap to a large degree, e.g. most people (and maybe especially most alignment researchers?) think hells are bad.
also on the “lose out” phrasing: even if someone “wants at least some people to have tormentful lives”, they don’t “lose out” overall if they also positively value other things / still negatively value any of the vast majority of beings having tormentful lives.
I think a crux I have with the entire alignment community may ultimately come down to me not believing that human values overlap strongly enough to make alignment the most positive thing, compared to other AI safety things.
In particular, I’d expect a surprising amount of disagreement on whether making a hell is good, if you managed to sell it as eternally punishing a favored enemy.
most of the reason for collaborating on alignment despite orthogonality is that our values tend to overlap to a large degree, e.g. most people (and maybe especially most alignment researchers?) think hells are bad.
I agree LWers tend to at least admit that severe enough value conflicts can exist, though I think people like Eliezer don’t realize that human value conflicts sort of break collective CEV-type solutions. A lot of collective alignment solutions tend to assume either that someone puts their thumb on the scale and excludes certain values, or that human values and their idealizations are so similar that no conflicts are expected, which I personally don’t think is true.
i don’t know where that might be true, but at least on lesswrong i imagine it’s an uncommon belief. a core premise of alignment being important is value orthogonality implying that an unaligned agent with max-level-intelligence would compete for the same resources whose configurations it values (the universe).
also on the “lose out” phrasing: even if someone “wants at least some people to have tormentful lives”, they don’t “lose out” overall if they also positively value other things / still negatively value any of the vast majority of beings having tormentful lives.
Agree with this, which handles some cases, but my worry is that there are still likely to be big value conflicts where one value set must ultimately win out over another.