The basic answer is the following:
The incentive problem still remains, such that it’s more effective to use the price system than to use a command economy to deal with incentive issues:
https://x.com/MatthewJBar/status/1871640396583030806
Related to this, perhaps the outer loss of markets isn’t nearly as dispensable as a lot of people on LW believe, and contact with reality is a necessary part of all future AIs.
More here:
https://gwern.net/backstop
A potentially large crux is that I don’t really think a utopia is possible, at least in the early years, even for superintelligences, because I expect preferences in the new environment to grow unboundedly, such that preferences are always dissatisfied, even charitably assuming the utopia concept is restricted to being relative to someone else’s values.
going by the linked tweet, does “incentive problem” mean “needing to incentivize individuals to share information about their preferences in some way, which is currently done through their economic behavior, in order for their preferences to be fulfilled”? and contrasted with a “command economy”, where everything is planned out long in advance, and possibly on less information about the preferences of individual moral patients?
if so, those sound like abstractions which were relevant to the world so far, but can you not imagine any better way a superintelligence could elicit this information? it does not need to use prices or trade. some examples:
it could have many copies of itself talk to them
it could let beings enter whatever they want into a computer in real time, or really let beings convey their preferences in whatever medium they prefer, and fulfill them[1]
it could mind-scan those who are okay with this.
(these are just examples selected for clarity; i personally would expect something more complex and less thing-oriented, around moral patients who are okay with/desire it, where superintelligence imbues itself as computation throughout the lowest level of physics upon which this is possible, and so it is as if physics itself is contextually aware and benevolent)
(i think these also sufficiently address your point 2, about SI needing ‘contact with reality’)
there is also a second (but non-cruxy) assumption here, that preference information would need to be dispersed across some production ecosystem, which would not be true given general-purpose superintelligent nanofactories. this though is not a crux as long as whatever is required for production can fit on, e.g., a planet (which the information derived in, e.g., one of those listed ways, can be communicated across at light-speed, as we partially do now).
i interpret this to mean “some entities’ values will want to use as much matter as they can for things, so not all values can be unboundedly fulfilled”. this is true and not a crux. if a moral patient who wants to make unboundedly much of something actually making unboundedly much of it would be less good than other ways the world could be, then an (altruistically-)aligned agent would choose one of the other ways.
superintelligence is context-aware in this way, it is not {a rigid system which fails on outliers it doesn’t expect (e.g.: “tries to create utopia, but instead gives all the lightcone to whichever maximizer requests it all first”), and so which needs a somewhat less rigid but not-superintelligent system (an economy) to avoid this}. i suspect this (superintelligence being context-aware) is effectively the crux here.
(if morally acceptable, e.g. no creating hells)
The other issue is value conflicts, which I expect to be mostly irresolvable in a satisfying way by default, due to moral subjectivism combined with my belief that lots of value conflicts today are mostly suppressed because people can’t make their own nation-states; with AI, they can, and superintelligence makes the problem worse.
That’s why you can’t have utopia for everyone.
i think this would not happen, for the same fundamental reason that an aligned superintelligence can foresee whatever you can, and prevent / not cause such outcomes if it agrees they’d be worse than other possibilities. (more generally, “an aligned superintelligence would cause some bad-to-it thing” is contradictory, usually[1].)
(i wonder if you’re using the term ‘superintelligence’ in a different way though, e.g. to mean “merely super-human”? to be clear i definitionally mean it in the sense of optimal)
(tangentially: the ‘nations’ framing confuses me)[2]
i think i wrote before that i agree (trivially) that not all possible values can be maximally satisfied; still, you can have the best possible world, which i think on this axis would look like “there being very many possible environments suited to different beings’ preferences (as long as those preferences are not to cause suffering to others)” instead of “beings with different preferences going to war with each other” (note there is no coordination problem which must be solved for that to happen. a benevolent superintelligence would itself not allow war (and on that, i’ll also hedge that if there is some tragedy which would be worth the cost of war to stop, an aligned superintelligence would just stop it directly instead.))
some exceptions like “it is aligned, but has the wrong decision theory, and gets acausally blackmailed”
in the world of your premise (with people using superintelligence to then war over value differences), superintelligence, not nations, would be the most powerful thing with which to wage conflict
I think the main point is that what’s worse than other possibilities partially depends on your value system at the start, and there is no non-circular way of resolving deep enough value conflicts such that you can always prevent conflict, so with differing enough values, conflict can arise on its own.
(Note when I focus on superintelligence, I don’t focus on the AI literally doing optimal actions, because that leads to likely being wrong about what AIs can actually do, which is actually important.)
On the nations point, my point here is that people will program their superintelligences with quite different values, and the superintelligences will disagree about what counts as optimal by their lights, and if the disagreements are severe enough (which I predict is plausible if AI development cannot be controlled at all), conflict can definitely happen between the superintelligences, even if humans no longer are the main players.
Also, it’s worth reading these posts and comments, because I perceive some mistakes that are common amongst rationalists:
https://www.lesswrong.com/posts/895Qmhyud2PjDhte6/responses-to-apparent-rationalist-confusions-about-game
https://www.lesswrong.com/posts/HFYivcm6WS4fuqtsc/dath-ilan-vs-sid-meier-s-alpha-centauri-pareto-improvements#jpCmhofRBXAW55jZv
I agree you can have a best possible world (though that gets very tricky in infinite realms due to utility theory breaking at that point), but my point here is that the best possible world is relative to a given value set, and also quite unconstrained, and your vision definitely requires other real-life value sets to lose out on a lot, here.
Are you assuming that superintelligences will have common enough values for some reason? To be clear, I think this can happen, assuming AI is controlled by a specific group that has enough of a monopoly on violence to prevent others from making their own AI, but I don’t have nearly the confidence that you do that conflict is always avoidable by ASIs by default.
you didn’t write “yes, i use ‘superintelligent’ to mean super-human”, so i’ll write as if you also mean optimal[1]. though i suspect we may have different ideas of where optimal is, which could become an unnoticed crux, so i’m noting it.
i am expecting the first superintelligent agent to capture the future, and for there to be no time for others to arise to compete with it.
in a hypothetical setup where multiple superintelligences are instantiated at close to the same time within a world, it’s plausible to me that they would fight in some way, though also plausible that they’d find a way not to. as an easy reason they might fight: maybe one knows it will win (e.g., it has a slight head start and physics is such that that is pivotal).
in my model of reality: it takes ~2/15ths of a second for light to travel the length of the earth’s circumference. maybe there are other bottlenecks that would push the time required for an agentic superintelligence to take over the earth to minutes-to-hours. as long as the first superintelligent (world-valuing-)agent is created at least <that time period’s duration> before the next one would have been created, it will prevent that next one’s creation. i assign very low likelihood to multiple superintelligences being independently created within the same hour.
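(a quick sanity check of that ~2/15ths figure, as a minimal sketch using the standard equatorial circumference of the earth and the defined speed of light:)

```python
# time for light to travel once around the earth's circumference
EARTH_CIRCUMFERENCE_M = 40_075_000       # equatorial circumference, ~40,075 km
SPEED_OF_LIGHT_M_PER_S = 299_792_458     # exact, by definition of the metre

t = EARTH_CIRCUMFERENCE_M / SPEED_OF_LIGHT_M_PER_S
print(f"{t:.3f} s, i.e. ~{t * 15:.1f}/15ths of a second")  # ~0.134 s, close to 2/15ths
```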
this seems like a crux, and i don’t yet know why you expect otherwise, unless you mean something else by superintelligence.
actually, i can see room for disagreement about whether ‘slow, gradual buildup of spiky capabilities profiles’ would change this. i don’t think it would because … if i try to put it into words, we are in an unstable equilibrium, which will at some point be disrupted, and there are not ‘new equilibriums, just with less balance’ for the world to fall on. however, gradual takeoff plus a strong defensive advantage inherent in physics could lead to it, for intuitive reasons[2]. in terms of current tech like nukes there’s an offensive advantage, but we don’t actually know what the limit looks like. although it’s hard for me to conceive of a true defensive advantage in fundamental physics that can’t be used offensively by macroscopic beings. would be interested in seeing made up examples.
i’ll probably read the linked posts anyways, but it looks like you thought i also expected multiple superintelligences to arise at almost the same time, and inferred i was making implicit claims about game theory between them.
you wrote:
“Nitpick that doesn’t matter, but when I focus on superintelligence, I don’t focus on the AI literally doing optimal actions, because that leads to likely being wrong about what AIs can actually do”
i mean something with the optimal process (of cognition (learning, problem solving, creativity)), not something that always takes the strictly best action.
(i’m guessing this is about how the ‘optimal action’ could sometimes be impractical to compute. for example, the action i could take that technically has the best outcomes might be to send off a really alien email that sets off some unknowable-from-my-position butterfly effect.)
e.g., toy game setup: if you can counter a level 100 attack at level 10, and all the players start within 5 levels of each other and progress at 1 per turn, then it doesn’t matter who will reach level 100 first.
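(a minimal sketch of that toy game; the “an attack of level L is countered by any defense of level L/10 or above” threshold is my own illustrative reading of “counter a level 100 attack at level 10”:)

```python
# toy game: an attack at level L can be countered by any defender at level L/10
# or above (so level 10 counters level 100). all players start within 5 levels
# of each other and gain 1 level per turn.

def attack_succeeds(attacker_level: int, defender_level: int) -> bool:
    # the attack only lands if the defender is more than 10x weaker
    return attacker_level > 10 * defender_level

def simulate(start_levels, turns=1000):
    levels = list(start_levels)
    for turn in range(turns):
        leader, laggard = max(levels), min(levels)
        if attack_succeeds(leader, laggard):
            return f"leader wins at turn {turn} with levels {levels}"
        levels = [lvl + 1 for lvl in levels]  # everyone progresses equally
    return "no attack ever succeeds: the head start never becomes decisive"

print(simulate([10, 12, 15]))  # all players start within 5 levels of each other
```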
“i am expecting the first superintelligent agent to capture the future, and for there to be no time for others to arise to compete with it.”
I think I understand your position better, and a crux for real-world decision making is that in practice, I don’t really think this assumption is correct by default, especially if there’s a transition period.
i do not understand your position from this, so you’re welcome to write more. also, i’m not sure if i added the paragraph about slow takeoff before or after you loaded the comment.
an easy way to convey your position to me might be to describe a practical rollout of the future where all the things in it seem individually plausible to you.
One example of such a future: in 2028, OpenAI manages to scale up enough to make an AI that, while not as good as a human worker in general (at least without heavy inference costs), is good enough to act as a notable accelerant to AI research, such that by 2030-2031, AI research has been more or less automated away by OpenAI, with competitors having such systems by 2031-2032. AI progress then becomes notably faster, such that by 2033 we are on the brink of AI that can do a lot of job work, but the best models at this point are instead reinvested in AI R&D, such that by 2035 superhuman AI is broadly achieved, and this is when the economy starts getting seriously disrupted.
The key features of this future are that intent alignment works well enough that AI generally takes instructions from specific humans, and that it’s easy for others to get their own superintelligences with different values, such that conflict doesn’t go away.
“The key features here in this future is that the superhuman equals optimal assumption is false [...]”
oh, well to clarify then, i was trying to say that i didn’t mean ‘superhuman’ at all, i directly meant optimal. i don’t believe that superhuman = optimal, and when reading this story one of the first things that stood out was that the 2035 point is still before the first long-term-decisive entity.
Edited my comment.
but it still says “it’s easy for others to get their own superintelligences with different values”, with ‘superintelligence’ referring to the ‘superhuman’ AI of 2035?
my response is the same, the story ends before what i meant by superintelligence has occurred.
(it’s okay if this discussion was secretly a definition difference till now!)
Yeah, the crux is I don’t think the story ends before superintelligence, for a combination of reasons
what i meant by “the story ends before what i meant by superintelligence has occurred” is that the written one ends there in 2035, but at that point there’s still time to effect what the first long-term-decisive thing will be.
still confused about this btw. in my second reply to you i wrote:
“(i wonder if you’re using the term ‘superintelligence’ in a different way though, e.g. to mean “merely super-human”?)”
and you did not say you were, but it looks like you are here?
I was assuming very strongly superhumanly intelligent AI, but yeah no promises of optimality were made here.
That said, I suspect a crux is that optimality ends up with multipolarity, assuming a one world government hasn’t happened by then, because I think the offense-defense balance moderately favors defense even at optimality, assuming optimal defense and offense.
oh okay, i’ll have to reinterpret then. edit: i just tried, but i still don’t get it; if it’s “very strongly superhuman”, why is it merely “when the economy starts getting seriously disrupted”? (<- this feels like it’s back at where this thread started)
“I think the offense-defense balance moderately favors defense even at optimality”
why?
I should probably edit that at some point, but I’m on my phone, so I’ll do it tomorrow.
A big reason for this is logistics: how you are getting to the fight can actually hamper you a lot, and this bites especially hard on offense, because it’s easier to get supplies to your own area than it is to get supplies to an offensive unit.
This especially matters if physical goods need to be transported from one place to another place.
ah. for ‘at optimality’ which you wrote, i don’t imagine it to take place on that high of a macroscopic level (the one on which ‘supplies’ could be transported), i think the limit is more things that look to us like the category of ‘angling rays of light just right to cause distant matter to interact in such a way as to create an atomic explosion, or some even more destructive reaction we don’t yet know about, or to suddenly carve out a copy of itself there to start doing things locally’, and also i’m not imagining the competitors being ‘solid’ macroscopic entities anymore, but rather being patterns imbued (and dispersed) in a relatively ‘lower’ level of physics (which also do not need ‘supplies’). (edit: maybe this picture is wrong, at optimality you can maybe absorb the energy of such explosions / not be damaged by them, if you’re not a macroscopic thing. which does actually defeat the main way macroscopic physics has an offense advantage?)
(i’m just exploring what it would be like to be clear, i don’t think such conflicts will happen because i still expect just one optimal-level-agent to come from earth)
I am willing to concede that here, the assumption of non-optimal agents was more necessary than I thought for my argument, and I think you are right on the necessity of the assumption in order to guarantee anything like a normal future (though it still might be multipolar), so I changed a comment.
My new point is that I don’t think optimal agents will exist when we lose all control, but yes I didn’t realize an assumption was more load-bearing than I thought.
(btw I also realized I didn’t strictly mean ‘optimal’ by ‘superintelligent’, but at least close enough to it / ‘strongly superhuman enough’ for us to not be able to tell the difference. I originally used the ‘optimal’ wording trying to find some other definition apart from ‘super-human’)
it is also plausible to me that life-caring beings first lose control to much narrower programs[1] or moderately superhuman unaligned agents totally outcompeting them economically (if it turns out that making better agents is hard enough that they can’t just directly do that instead), or something.
also, a ‘multipolar AI-driven but still normal-ish’ scenario seems to continue at most until a strong enough agent is created. (e.g. that could be what a race is towards).
(maybe after ‘loss of control to weaker AI’ scenarios, those weaker AIs also keep making better agents afterwards, but i’m not sure about that, because they could be myopic and in some stable pattern/equilibrium)
(e.g. the ‘going out with a whimper’ part of this post)
i missed this part:
“your vision definitely requires other real-life value sets to lose out on a lot, here.”
i’m not sure what this means. my values basically refer to other beings having not-tormentful (and next in order of priority, happy/good) existences. (tried to formalize this more but it’s hard)
in particular, i’m not sure if you’re saying something which would seem trivially true to me or not. (example trivially true thing: someone who wants to tile literally the entire lightcone with happy humans not being able to do that is losing out under ‘cosmopolitan’ values relative to if their values controlled the entire lightcone. example trivially true thing 2: “the best possible world is relative to a given value set”)
That would immediately exclude quite a few people, from both the far left and far right, because I predict a lot of people definitely want at least some people to have tormentful lives.
I was trying to say something trivially true in your ontology, but far too many people tend to deny that you do in fact have to make other values lose out. People usually think the best possible world is absolute, not relative, and in particular I think a lot of people use the idea of value-aligned superintelligence as though it were a magic wand that could solve all conflict.
i don’t know where that might be true, but at least on lesswrong i imagine it’s an uncommon belief. a core premise of alignment being important is value orthogonality implying that an unaligned agent with max-level-intelligence would compete for the same resources whose configurations it values (the universe). most of the reason for collaborating on alignment despite orthogonality is that our values tend to overlap to a large degree, e.g. most people (and maybe especially most alignment researchers?) think hells are bad.
also on the “lose out” phrasing: even if someone “wants at least some people to have tormentful lives”, they don’t “lose out” overall if they also positively value other things / still negatively value any of the vast majority of beings having tormentful lives.
I think a crux I have with the entire alignment community may ultimately come down to me not believing that human values overlap strongly enough to make alignment the most positive thing, compared to other AI safety things.
In particular, I’d expect a surprising amount of disagreement on whether making a hell is good, if you managed to sell it as eternally punishing a favored enemy.
I agree LWers tend to at least admit that severe enough value conflicts can exist, though I think that people like Eliezer don’t realize that human value conflicts sort of break collective CEV-type solutions, and a lot of collective alignment solutions tend to assume either that someone puts their thumb on the scale and excludes certain values, or that human values and their idealizations are so similar that no conflicts are expected, which I personally don’t think is true.
“also on the ‘lose out’ phrasing: even if someone ‘wants at least some people to have tormentful lives’, they don’t ‘lose out’ overall if they also positively value other things / still negatively value any of the vast majority of beings having tormentful lives.”
Agree with this, which handles some cases, but my worry is that there are still likely to be big value conflicts where one value set must ultimately win out over another.