(edit 3: i’m not sure, but this text might be net-harmful to discourse)
i continue to feel so confused at what continuity led to some users of this forum asking questions like, “what effect will superintelligence have on the economy?” or otherwise expecting an economic ecosystem of superintelligences (e.g. 1[1], 2 (edit 2: I misinterpreted this question)).
it actually reminds me of this short story by davidad, in which one researcher on an alignment team has been offline for 3 months, and comes back to find the others on the team saying things like “[Coherent Extrapolated Volition?] Yeah, exactly! Our latest model is constantly talking about how coherent he is. And how coherent his volitions are!”, in that it’s something i thought this forum would have seen as ‘confused about the basics’ just a year ago, and i don’t yet understand what led to it.
(edit: i’m feeling conflicted about this shortform after seeing it upvoted this much. the above paragraph would be unsubstantive/bad discourse if read as an argument by analogy, which i’m worried it was (?). i was mainly trying to express confusion.)
from the power of intelligence (actually, i want to quote the entire post, it’s short):
I keep trying to explain to people that the archetype of intelligence is not Dustin Hoffman in Rain Man. It is a human being, period. It is squishy things that explode in a vacuum, leaving footprints on their moon. Within that gray wet lump is the power to search paths through the great web of causality, and find a road to the seemingly impossible—the power sometimes called creativity.
People—venture capitalists in particular—sometimes ask how, if the Machine Intelligence Research Institute successfully builds a true AI, the results will be commercialized. This is what we call a framing problem. [...]
a value-aligned superintelligence directly creates utopia. an “intent-aligned” or otherwise non-agentic truthful superintelligence, if that were to happen, is most usefully used to directly tell you how to create a value-aligned agentic superintelligence. if the thing in question cannot do one of these things it is not superintelligence, but something else.
comment thread between me and the post’s author
People are confused about the basics because the basics are insufficiently justified.
As far as I know, my post started the recent trend you complain about.
Several commenters on this thread (e.g. @Lucius Bushnaq here and @MondSemmel here) mention LessWrong’s growth and the resulting influx of uninformed new users as the likely cause. Any such new users may benefit from reading my recently-curated review of Planecrash, the bulk of which is about summarising Yudkowsky’s worldview.
i continue to feel so confused at what continuity led to some users of this forum asking questions like, “what effect will superintelligence have on the economy?” or otherwise expecting an economic ecosystem of superintelligences
If there’s decision-making about scarce resources, you will have an economy. Even superintelligence does not necessarily imply infinite abundance of everything, starting with the reason that our universe only has so many atoms. Multipolar outcomes seem plausible under continuous takeoff, which the consensus view in AI safety (as I understand it) sees as more likely than fast takeoff. I admit that there are strong reasons for thinking that the aggregate of a bunch of sufficiently smart things is agentic, but this isn’t directly relevant for the concerns about humans within the system in my post.
a value-aligned superintelligence directly creates utopia
In his review of Peter Singer’s commentary on Marx, Scott Alexander writes:
[...] Marx was philosophically opposed, as a matter of principle, to any planning about the structure of communist governments or economies. He would come out and say it was irresponsible to talk about how communist governments and economies will work. He believed it was a scientific law, analogous to the laws of physics, that once capitalism was removed, a perfect communist government would form of its own accord. There might be some very light planning, a couple of discussions, but these would just be epiphenomena of the governing historical laws working themselves out.
Peter Thiel might call this “indefinite optimism”: delay all planning or visualisation because there’s some later point where it’s trusted things will all sort themselves out. Now, if you think that takeoff will definitely be extremely hard and the resulting superintelligence will effortlessly take over the world, then obviously it makes sense to focus on what that superintelligence will want to do. But what if takeoff lasts months or years or decades? (Note that there can be lots of change even within months if the stakes look extreme to powerful actors!) Aren’t you curious about what an aligned superintelligence will end up deciding about society and humans? Are you so sure about the transition period being so short and the superintelligence being so unitary and multipolar outcomes being so unlikely that we’ll never have to worry about problems downstream of the incentive issues and competitive pressures that I discuss (which Beren recently had an excellent post on)? Are you so sure that there is not a single interesting, a priori deducible fact about the superintelligent economy beyond “a singleton is in charge and everything is utopia”?
The default outcome is an unaligned superintelligence singleton destroying the world and not caring about human concepts like property rights. Whereas an aligned superintelligence can create a far more utopian future than a human could come up with, and cares about capitalism and property rights only to the extent that that’s what it was designed to care about.
So I indeed don’t get your perspective. Why are humans still appearing as agents or decision-makers in your post-superintelligence scenario at all? If the superintelligence for some unlikely reason wants a human to stick around and to do something, then it doesn’t need to pay them. And if a superintelligence wants a resource, it can just take it, no need to pay for anything.
@L Rudolf L can talk on his own, but for me, a crux probably is that I don’t expect either an unaligned superintelligence singleton or a value-aligned superintelligence creating utopia to be among the likely outcomes within the next few decades.
For the unaligned superintelligence point, my basic reasons are that I now believe the alignment problem has gotten significantly easier compared to 15 years ago, that I’ve become more bullish on AI control working out since o3, and that I’ve come to think instrumental convergence is probably correct for some AIs we build in practice, but that instrumental drives are more constrainable on the likely paths to AGI and ASI.
For the alignment point, a big reason for this is that I now think a lot of what makes an AI aligned is primarily the data, rather than the inductive biases, and one of my biggest divergences from the LW community comes down to me thinking that inductive bias is far less necessary for alignment than people usually assume, especially compared to 15 years ago.
For AI control, one update I’ve made from o3 is that I believe OpenAI managed to get the RL loop working in domains where outcomes are easily verifiable, but not in domains where verification is hard. Programming and mathematics are domains where verification is easy, and the tie-in is that capabilities will be more spiky/narrow than you might think. This matters because I believe narrow/tool AI has a relevant role to play in an intelligence explosion, so you can actually affect the outcome by building narrow-capabilities AI for a few years. And the fact that AI capabilities are spiky in domains where outcomes are easy to verify is good for eliciting AI capabilities, which is part of AI control.
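(As a concrete illustration of what “easily verifiable” means here, a toy sketch of my own, not anything any lab has published: an outcome reward for RL on code generation can literally just be “run the candidate against tests”, whereas nothing comparably cheap exists for, say, essay quality. The `solve` convention and the test cases below are invented for illustration.)

```python
# Toy sketch of an outcome-verifiable reward for RL on code generation.
# The task format ("define a function called `solve`") and the tests are invented here.

def reward(candidate_source: str, tests: list) -> float:
    """Return 1.0 if the candidate program passes every test, else 0.0."""
    namespace: dict = {}
    try:
        exec(candidate_source, namespace)  # define the candidate's function
        solve = namespace["solve"]
        return float(all(solve(*args) == expected for args, expected in tests))
    except Exception:
        return 0.0                         # crashes or a missing `solve` count as failure

# Example task: "return the sum of a list of numbers"
tests = [(([1, 2, 3],), 6), (([],), 0), (([-1, 1],), 0)]
print(reward("def solve(xs): return sum(xs)", tests))  # 1.0
print(reward("def solve(xs): return max(xs)", tests))  # 0.0

# There is no comparably cheap and reliable checker for "was this essay persuasive?",
# which is the sense in which such domains are hard to verify.
```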
For the singleton point, it’s probably because I believe takeoff is both slow enough and distributed enough that multiple superintelligent AIs can arise.
For the value-aligned superintelligence creating a utopia for everyone, my basic reason for not believing in this is that I think value conflicts are effectively irresolvable due to moral subjectivism, which forces any utopia to be a utopia only for some people, and I expect the set of people who share an individual utopia to be small in practice (because value conflicts become more relevant once AIs can create nation-states all by themselves).
For why humans are decision-makers, this is probably because AI is either controlled, or because certain companies have chosen to give their AIs instruction-following drives and that actually succeeded.
And why must alignment be binary? (aligned, or misaligned, where misaligned necessarily means it destroys the world and does not care about property rights)
Why can you not have a superintelligence that is only misaligned when it comes to issues of wealth distribution?
Relatedly, are we sure that CEV is computable?
I guess we could in theory fail and only achieve partial alignment, but that seems like a weird scenario to imagine. Like shooting for a 1 in big_number target (= an aligned mind design in the space of all potential mind designs) and then only grazing it. How would that happen in practice?
And what does it even mean for a superintelligence to be “only misaligned when it comes to issues of wealth distribution”? Can’t you then just ask your pretty-much-perfectly-aligned entity to align itself on that remaining question?
I guess we could in theory fail and only achieve partial alignment, but that seems like a weird scenario to imagine. Like shooting for a 1 in big_number target (= an aligned mind design in the space of all potential mind designs) and then only grazing it. How would that happen in practice?
Are you saying that the 1 aligned mind design in the space of all potential mind designs is an easier target than the subspace composed of mind designs that does not destroy the world? If so, why? is it a bigger target? is it more stable?
Can’t you then just ask your pretty-much-perfectly-aligned entity to align itself on that remaining question?
No, because the you who can ask (the persons in power) is themselves misaligned with the 1 alignment target that perfectly captures all our preferences.
Are you saying that the 1 aligned mind design in the space of all potential mind designs is an easier target than the subspace composed of mind designs that does not destroy the world?
I didn’t mean that there’s only one aligned mind design, merely that almost all (99.999999...%) conceivable mind designs are unaligned by default, so the only way to survive is if the first AGI is designed to be aligned, there’s no hope that a random AGI just happens to be aligned. And since we’re heading for the latter scenario, it would be very surprising to me if we managed to design a partially aligned AGI and lose that way.
No, because the you who can ask (the persons in power) is themselves misaligned with the 1 alignment target that perfectly captures all our preferences.
I expect the people in power are worrying about this way more than they worry about the overwhelming difficulty of building an aligned AGI in the first place. (Case in point: the manufactured AI race with China.) As a result I expect they’ll succeed at building a by-default-unaligned AGI and driving themselves and us to extinction. So I’m not worried about instead ending up in a dystopia ruled by some government or AI lab owner.
Are you so sure that there is not a single interesting, a priori deducible fact about the superintelligent economy beyond “a singleton is in charge and everything is utopia”?
End points are easier to infer than trajectories, so sure, I think there’s some reasonable guesses you can try to make about how the world might look after aligned superintelligence, should we get it somehow.
For example, I think it’s a decent bet that basically all minds would exist solely as uploads almost all of the time, because living directly in physical reality is astronomically wasteful and incredibly inconvenient. Turning on a physical lamp every time you want things to be brighter means wiggling about vast numbers of particles and wasting an ungodly amount of negentropy just for the sake of the teeny tiny number of bits about these vast numbers of particles that actually make it to your eyeballs, and the even smaller number of bits that actually end up influencing your mind state and making any difference to your perception of the world. All of the particles[1] in the lamp in my bedroom, the air its light shines through, and the walls it bounces off, could be so much more useful arranged in an ordered dance of logic gates where every single movement and spin flip is actually doing something of value. If we’re not being so incredibly wasteful about it, maybe we can run whole civilisations for aeons on the energy and negentropy that currently make up my bedroom. What we’re doing right now is like building an abacus out of supercomputers. I can’t imagine any mature civilisation would stick with this.
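(For concreteness, here is a rough back-of-envelope sketch of the kind of waste being described; every number in it is a loose assumption chosen for illustration, and only the orders of magnitude matter.)

```python
import math

# Back-of-envelope: how much of a lamp's physical activity ends up as useful
# information in a viewer's head? All numbers are rough, illustrative assumptions.

k_B = 1.380649e-23                    # Boltzmann constant, J/K
T = 300.0                             # room temperature, K
landauer = k_B * T * math.log(2)      # minimum energy per bit erased, ~2.9e-21 J

lamp_power = 10.0                     # W, a small LED lamp (assumed)
photon_energy = 3.6e-19               # J, a ~550 nm visible photon
photons_per_s = lamp_power / photon_energy            # ~3e19 photons emitted per second

pupil_area = math.pi * (2e-3) ** 2    # ~4 mm pupil diameter
sphere_area = 4 * math.pi * 2.0 ** 2  # lamp assumed 2 m away
photons_into_eye = photons_per_s * pupil_area / sphere_area   # ~1e13 per second

useful_bits_per_s = 1e7               # generous guess at visual information actually used

# The same 10 W spent at the Landauer limit could instead drive this many bit operations:
bit_ops_per_s = lamp_power / landauer

print(f"photons emitted per second:   {photons_per_s:.1e}")
print(f"photons entering the eye:     {photons_into_eye:.1e}")
print(f"Landauer-limited bit ops/s:   {bit_ops_per_s:.1e}")
print(f"ratio vs. bits actually used: {bit_ops_per_s / useful_bits_per_s:.1e}")
```

(On these made-up numbers the lamp is roughly fourteen orders of magnitude away from a Landauer-limited use of the same power, which is the flavour of the “abacus built out of supercomputers” comparison above.)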
It’s not that I refuse to speculate about how a world post aligned superintelligence might look. I just didn’t think that your guess was very plausible. I don’t think pre-existing property rights or state structures would matter very much in such a world, even if we don’t get what is effectively a singleton, which I doubt. If a group of superintelligent AGIs is effectively much more powerful and productive than the entire pre-existing economy, your legal share of that pre-existing economy is not a very relevant factor in your ability to steer the future and get what you want. The same goes for pre-existing military or legal power.
Well, the conserved quantum numbers of my room, really.
Assuming that which end point you get to doesn’t depend on the intermediate trajectories at least.
Something like a crux here is that I believe the trajectories non-trivially matter for which end-points we get, and I don’t think it’s like entropy, where we can easily determine the end-point without considering the intermediate trajectory, because I do genuinely think some path-dependence is present in history, which is why, even if I were far more charitable towards communism, I don’t think this was ever defensible:
[...] Marx was philosophically opposed, as a matter of principle, to any planning about the structure of communist governments or economies. He would come out and say it was irresponsible to talk about how communist governments and economies will work. He believed it was a scientific law, analogous to the laws of physics, that once capitalism was removed, a perfect communist government would form of its own accord. There might be some very light planning, a couple of discussions, but these would just be epiphenomena of the governing historical laws working themselves out.
Another issue is the Eternal September issue where LW membership has grown a ton due to the AI boom (see the LW site metrics in the recent fundraiser post), so as one might expect, most new users haven’t read the old stuff on the site. There are various ways in which the LW team tries to encourage them to read those, but nevertheless.
The basic answer is the following:
The incentive problem still remains, such that it’s more effective to use the price system than to use a command economy to deal with incentive issues:
https://x.com/MatthewJBar/status/1871640396583030806
Related to this, perhaps the outer loss of the markets isn’t nearly as dispensable as a lot of people on LW believe, and contact with reality is a necessary part of all future AIs.
More here:
https://gwern.net/backstop
A potentially large crux is I don’t really think a utopia is possible, at least in the early years even by superintelligences, because I expect preferences in the new environment to grow unboundedly such that preferences are always dissatisfied, even charitably assuming a restriction on the utopia concept to be relative to someone else’s values.
The incentive problem still remains, such that it’s more effective to use the price system than to use a command economy to deal with incentive issues:
going by the linked tweet, does “incentive problem” mean “needing to incentivize individuals to share information about their preferences in some way, which is currently done through their economic behavior, in order for their preferences to be fulfilled”? and contrasted with a “command economy”, where everything is planned out long in advance, and possibly on less information about the preferences of individual moral patients?
if so, those sound like abstractions which were relevant to the world so far, but can you not imagine any better way a superintelligence could elicit this information? it does not need to use prices or trade. some examples:
it could have many copies of itself talk to them
it could let beings enter whatever they want into a computer in real time, or really let beings convey their preferences in whatever medium they prefer, and fulfill them[1]
it could mind-scan those who are okay with this.
(these are just examples selected for clarity; i personally would expect something more complex and less thing-oriented, around moral patients who are okay with/desire it, where superintelligence imbues itself as computation throughout the lowest level of physics upon which this is possible, and so it is as if physics itself is contextually aware and benevolent)
(i think these also sufficiently address your point 2, about SI needing ‘contact with reality’)
there is also a second (but non-cruxy) assumption here, that preference information would need to be dispersed across some production ecosystem, which would not be true given general-purpose superintelligent nanofactories. this though is not a crux as long as whatever is required for production can fit on, e.g., a planet (which the information derived in, e.g., one of those listed ways, can be communicated across at light-speed, as we partially do now).
A potentially large crux is I don’t really think a utopia is possible, at least in the early years even by superintelligences, because I expect preferences in the new environment to grow unboundedly such that preferences are always dissatisfied
i interpret this to mean “some entities’ values will want to use as much matter as they can for things, so not all values can be unboundedly fulfilled”. this is true and not a crux. if a moral patient who wants to make unboundedly much of something actually making unboundedly much of it would be less good than other ways the world could be, then an (altruistically-)aligned agent would choose one of the other ways.
superintelligence is context-aware in this way, it is not {a rigid system which fails to outliers it doesn’t expect (e.g.: “tries to create utopia, but instead gives all the lightcone to whichever maximizer requests it all first”), and so which needs a somewhat less rigid but not-superintelligent system (an economy) to avoid this}. i suspect this (superintelligence being context-aware) is effectively the crux here.
(if morally acceptable, e.g. no creating hells)
i interpret this to mean “some entities’ values will want to use as much matter as they can for things, so not all values can be unboundedly fulfilled”. this is true and not a crux. if a moral patient who wants to make unboundedly much of something actually making unboundedly much of it would be less good than other ways the world could be, then an (altruistically-)aligned agent would choose one of the other ways.
superintelligence is context-aware in this way, it is not {a rigid system which fails to outliers it doesn’t expect (e.g.: “tries to create utopia, but instead gives all the lightcone to whichever maximizer requests it all first”), and so which needs a somewhat less rigid but not-superintelligent system (an economy) to avoid this}. i suspect this (superintelligence being context-aware) is effectively the crux here.
The other issue is value conflicts, which I expect to be mostly irresolvable in a satisfying way by default due to moral subjectivism combined with me believing that lots of value conflicts today are mostly suppressed because people can’t make their own nation-states, but with AI, they can, and superintelligence makes the problem worse.
That’s why you can’t have utopia for everyone.
lots of value conflicts today are mostly suppressed because people can’t make their own nation-states, but with AI, they can, and superintelligence makes the problem worse.
i think this would not happen for the same fundamental reason that an aligned superintelligence can foresee whatever you can, and prevent / not cause them if it agrees they’d be worse than other possibilities. (more generally, “an aligned superintelligence would cause some bad-to-it thing” is contradictory, usually[1].)
(i wonder if you’re using the term ‘superintelligence’ in a different way though, e.g. to mean “merely super-human”? to be clear i definitionally mean it in the sense of optimal)
(tangentially: the ‘nations’ framing confuses me)[2]
That’s why you can’t have utopia for everyone
i think i wrote before that i agree (trivially) that not all possible values can be maximally satisfied; still, you can have the best possible world, which i think on this axis would look like “there being very many possible environments suited to different beings preferences (as long as those preferences are not to cause suffering to others)” instead of “beings with different preferences going to war with each other” (note there is no coordination problem which must be solved for that to happen. a benevolent superintelligence would itself not allow war (and on that, i’ll also hedge that if there is some tragedy which would be worth the cost of war to stop, an aligned superintelligence would just stop it directly instead.))
in the world of your premise (with people using superintelligence to then war over value differences), superintelligence, not nations, would be the most powerful thing (with which) to do conflict
i think this would not happen for the same fundamental reason that an aligned superintelligence can foresee whatever you can, and prevent / not cause them if it agrees they’d be worse than other possibilities. (more generally, “an aligned superintelligence would cause some bad-to-it thing” is contradictory, usually[1].)
(i wonder if you’re using the term ‘superintelligence’ in a different way though, e.g. to mean “merely super-human”? to be clear i definitionally mean it in the sense of optimal)
(tangentially: the ‘nations’ framing confuses me)[2]
I think the main point is that what’s worse than other possibilities partially depends on your value system at the start, and there is no non-circular way of resolving deep enough value conflicts such that you can always prevent conflict, so sufficiently different values can generate conflict on their own.
(Note when I focus on superintelligence, I don’t focus on the AI literally doing optimal actions, because that leads to likely being wrong about what AIs can actually do, which is actually important.)
On the nations point, my point here is that people will program their superintelligences with quite different values, and the superintelligences will disagree about what counts as optimal by their lights, and if the disagreements are severe enough (which I predict is plausible if AI development cannot be controlled at all), conflict can definitely happen between the superintelligences, even if humans are no longer the main players.
Also, it’s worth it to read these posts and comments, because I perceive some mistakes that are common amongst rationalists:
i think i wrote before that i agree (trivially) that not all possible values can be maximally satisfied; still, you can have the best possible world, which i think on this axis would look like “there being very many possible environments suited to different beings preferences (as long as those preferences are not to cause suffering to others)” instead of “beings with different preferences going to war with each other” (note there is no coordination problem which must be solved for that to happen. a benevolent superintelligence would itself not allow war (and on that, i’ll also hedge that if there is some tragedy which would be worth the cost of war to stop, an aligned superintelligence would just stop it directly instead.))
I agree you can have a best possible world (though that gets very tricky in infinite realms due to utility theory breaking at that point), but my point here is that the best possible world is relative to a given value set, and also quite unconstrained, and your vision definitely requires other real-life value sets to lose out on a lot, here.
Are you assuming that superintelligences will have common enough values for some reason? To be clear, I think this can happen, assuming AI is controlled by a specific group that has enough of a monopoly on violence to prevent others from making their own AI, but I don’t have nearly the confidence that you do that conflict is always avoidable by ASIs by default.
you didn’t write “yes, i use ‘superintelligent’ to mean super-human”, so i’ll write as if you also mean optimal[1]. though i suspect we may have different ideas of where optimal is, which could become an unnoticed crux, so i’m noting it.
people will program their superintelligences
i am expecting the first superintelligent agent to capture the future, and for there to be no time for others to arise to compete with it.
in a hypothetical setup where multiple superintelligences are instantiated at close to the same time within a world, it’s plausible to me that they would fight in some way, though also plausible that they’d find a way not to. as an easy reason they might fight: maybe one knows it will win (e.g., it has a slight head start and physics is such that that is pivotal).
in my model of reality: it takes ~2/15ths of a second for light to travel the length of the earth’s circumference. maybe there are other bottlenecks that would push the time required for an agentic superintelligence to take over the earth to minutes-to-hours. as long as the first superintelligent (world-valuing-)agent is created at least <that time period’s duration> before the next one would have been created, it will prevent that next one’s creation. i assign very low likelihood to multiple superintelligences being independently created within the same hour.
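(that figure is just the equatorial circumference divided by the speed of light: $\frac{40{,}075\ \text{km}}{299{,}792\ \text{km/s}} \approx 0.134\ \text{s} \approx \tfrac{2}{15}\ \text{s}$)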
this seems like a crux, and i don’t yet know why you expect otherwise, failing meaning something else by superintelligence.
actually, i can see room for disagreement about whether ‘slow, gradual buildup of spiky capabilities profiles’ would change this. i don’t think it would because … if i try to put it into words, we are in an unstable equilibrium, which will at some point be disrupted, and there are not ‘new equilibriums, just with less balance’ for the world to fall on. however, gradual takeoff plus a strong defensive advantage inherent in physics could lead to it, for intuitive reasons[2]. in terms of current tech like nukes there’s an offensive advantage, but we don’t actually know what the limit looks like. although it’s hard for me to conceive of a true defensive advantage in fundamental physics that can’t be used offensively by macroscopic beings. would be interested in seeing made up examples.
i’ll probably read the linked posts anyways, but it looks like you thought i also expected multiple superintelligences to arise at almost the same time, and inferred i was making implicit claims about game theory between them.
Nitpick that doesn’t matter, but when I focus on superintelligence, I don’t focus on the AI literally doing optimal actions, because that leads to likely being wrong about what AIs can actually do
i mean something with the optimal process (of cognition (learning, problem solving, creativity)), not something that always takes the strictly best action.
(i’m guessing this is about how the ‘optimal action’ could sometimes be impractical to compute. for example, the action i could technically take that has the best outcomes might technically be to send off a really alien email that sets off some unknowable-from-my-position butterfly effect.)
e.g., toy game setup: if you can counter a level 100 attack at level 10, and all the players start within 5 levels of each other and progress at 1 per turn, then it doesn’t matter who will reach level 100 first.
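(a minimal simulation of that toy setup; the attack rule below is a made-up formalization of “counter a level 100 attack at level 10”, and the starting levels are arbitrary:)

```python
# Toy model of the footnoted game: defense holds whenever the defender has at least
# one tenth of the attacker's level. Starting levels and the rule are illustrative.

def attack_succeeds(attacker_level: int, defender_level: int) -> bool:
    return defender_level * 10 < attacker_level   # attacker needs a >10x level advantage

levels = [10, 12, 15]              # players start within 5 levels of each other
any_conquest = False
for turn in range(200):            # everyone progresses 1 level per turn
    for i, attacker in enumerate(levels):
        for j, defender in enumerate(levels):
            if i != j and attack_succeeds(attacker, defender):
                any_conquest = True
                print(f"turn {turn}: player {i} conquers player {j}")
    levels = [lvl + 1 for lvl in levels]

print("some attack succeeded" if any_conquest
      else "no attack ever succeeds; reaching level 100 first never mattered")
# The gap stays fixed at <=5 levels while a successful attack needs a >10x ratio,
# so with a strong enough defensive advantage the race to level 100 is irrelevant.
```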
i am expecting the first superintelligent agent to capture the future, and for there to be no time for others to arise to compete with it.
I think I understand your position better, and a crux for real-world decision making is that in practice, I don’t really think this assumption is correct by default, especially if there’s a transition period.
i do not understand your position from this, so you’re welcome to write more. also, i’m not sure if i added the paragraph about slow takeoff before or after you loaded the comment.
an easy way to convey your position to me might be to describe a practical rollout of the future where all the things in it seem individually plausible to you.
One example of such a future is a case where, in 2028, OpenAI manages to scale up enough to make an AI that, while not as good as a human worker in general (at least without heavy inference costs), is good enough to act as a notable accelerant to AI research, such that by 2030-2031 AI research has been more or less automated away by OpenAI, with competitors having such systems by 2031-2032. AI progress then becomes notably faster, such that by 2033 we are on the brink of AI that can do a lot of job work, but the best models at this point are instead reinvested in AI R&D, such that by 2035 superhuman AI is broadly achieved, and this is when the economy starts getting seriously disrupted.
The key features in this future are that intent alignment works well enough that AI generally takes instructions from specific humans, and that it’s easy for others to get their own superintelligences with different values, such that conflict doesn’t go away.
The key features here in this future is that the superhuman equals optimal assumption is false [...]
oh, well to clarify then, i was trying to say that i didn’t mean ‘superhuman’ at all, i directly meant optimal. i don’t believe that superhuman = optimal, and when reading this story one of the first things that stood out was that the 2035 point is still before the first long-term-decisive entity.
but it still says “it’s easy for others to get their own superintelligences with different values”, with ‘superintelligence’ referring to the ‘superhuman’ AI of 2035?
my response is the same, the story ends before what i meant by superintelligence has occurred.
(it’s okay if this discussion was secretly a definition difference till now!)
Yeah, the crux is I don’t think the story ends before superintelligence
what i meant by “the story ends before what i meant by superintelligence has occurred” is that the written one ends there in 2035, but at that point there’s still time to affect what the first long-term-decisive thing will be.
but it still says “it’s easy for others to get their own superintelligences with different values”, with ‘superintelligence’ referring to the ‘superhuman’ AI of 2035?
still confused about this btw. in my second reply to you i wrote:
(i wonder if you’re using the term ‘superintelligence’ in a different way though, e.g. to mean “merely super-human”?)
and you did not say you were, but it looks like you are here?
I was assuming very strongly superhumanly intelligent AI, but yeah no promises of optimality were made here.
That said, I suspect a crux is that optimality ends up with multipolarity, assuming a one world government hasn’t happened by then, because I think the offense-defense balance moderately favors defense even at optimality, assuming optimal defense and offense.
I was assuming very strongly superhumanly intelligent AI
oh okay, i’ll have to reinterpret then. edit: i just tried, but i still don’t get it; if it’s “very strongly superhuman”, why is it merely “when the economy starts getting seriously disrupted”? (<- this feels like it’s back at where this thread started)
I think the offense-defense balance moderately favors defense even at optimality
oh okay, i’ll have to reinterpret then. edit: i just tried, but i still don’t get it; if it’s “very strongly superhuman”, why is it merely “when the economy starts getting seriously disrupted”? (<- this feels like it’s back at where this thread started)
I should probably edit that at some point, but I’m on my phone, so I’ll do it tomorrow.
why?
A big reason for this is logistics, as how you are getting to the fight can actually hamper you a lot, and this especially bites hard on offense, because it’s easier to get supplies to your area than it is to get supplies to an offensive unit.
This especially matters if physical goods need to be transported from one place to another place.
A big reason for this is logistics, as how you are getting to the fight can actually hamper you a lot, and this especially bites hard on offense, because it’s easier to get supplies to your area than it is to get supplies to an offensive unit.
ah. for ‘at optimality’ which you wrote, i don’t imagine it to take place on that high of a macroscopic level (the one on which ‘supplies’ could be transported), i think the limit is more things that look to us like the category of ‘angling rays of light just right to cause distant matter to interact in such a way as to create an atomic explosion, or some even more destructive reaction we don’t yet know about, or to suddenly carve out a copy of itself there to start doing things locally’, and also i’m not imagining the competitors being ‘solid’ macroscopic entities anymore, but rather being patterns imbued (and dispersed) in a relatively ‘lower’ level of physics (which also do not need ‘supplies’). (edit: maybe this picture is wrong, at optimality you can maybe absorb the energy of such explosions / not be damaged by them, if you’re not a macroscopic thing. which does actually defeat the main way macroscopic physics has an offense advantage?)
(i’m just exploring what it would be like to be clear, i don’t think such conflicts will happen because i still expect just one optimal-level-agent to come from earth)
(i’m just exploring what it would be like to be clear, i don’t think such conflicts will happen because i still expect just one optimal-level-agent to come from earth)
I am willing to concede that here the assumption of non-optimal agents was more necessary for my argument than I thought, and I think you are right about the necessity of that assumption in order to guarantee anything like a normal future (though it still might be multipolar), so I changed a comment.
My new point is that I don’t think optimal agents will exist when we lose all control, but yes I didn’t realize an assumption was more load-bearing than I thought.
My new point is that I don’t think optimal agents will exist when we lose all control
(btw I also realized I didn’t strictly mean ‘optimal’ by ‘superintelligent’, but at least close enough to it / ‘strongly superhuman enough’ for us to not be able to tell the difference. I originally used the ‘optimal’ wording trying to find some other definition apart from ‘super-human’)
it is also plausible to me that life-caring beings first lose control to much narrower programs[1] or moderately superhuman unaligned agents totally outcompeting them economically (if it turns out that making better agents is hard enough that they can’t just directly do that instead), or something.
also, a ‘multipolar AI-driven but still normal-ish’ scenario seems to continue at most until a strong enough agent is created. (e.g. that could be what a race is towards).
(maybe after ‘loss of control to weaker AI’ scenarios, those weaker AIs also keep making better agents afterwards, but i’m not sure about that, because they could be myopic and in some stable pattern/equilibrium)
your vision definitely requires other real-life value sets to lose out on a lot, here.
i’m not sure what this means. my values basically refer to other beings having not-tormentful (and next in order of priority, happy/good) existences. (tried to formalize this more but it’s hard)
in particular, i’m not sure if you’re saying something which would seem trivially true to me or not. (example trivially true thing: someone who wants to tile literally the entire lightcone with happy humans not being able to do that is losing out under ‘cosmopolitan’ values relative to if their values controlled the entire lightcone. example trivially true thing 2: “the best possible world is relative to a given value set”)
i’m not sure what this means. my values basically refer to other beings having not-tormentful (and next in order of priority, happy/good) existences. (tried to formalize this more but it’s hard)
That would immediately exclude quite a few people, from both the far left and the far right, because I predict a lot of people definitely want at least some people to have tormentful lives.
in particular, i’m not sure if you’re saying something which would seem trivially true to me or not. (example trivially true thing: someone who wants to tile literally the entire lightcone with happy humans not being able to do that is losing out under ‘cosmopolitan’ values relative to if their values controlled the entire lightcone. example trivially true thing 2: “the best possible world is relative to a given value set”)
I was trying to say something trivially true in your ontology, but far too many people tend to deny that you do in fact have to make other values lose out, and people usually think the best possible world is absolute, not relative, and in particular I think a lot of people use the idea of value-aligned superintelligence as though it was a magic wand that could solve all conflict.
far too many people tend to deny that you do in fact have to make other values lose out
i don’t know where that might be true, but at least on lesswrong i imagine it’s an uncommon belief. a core premise of alignment being important is value orthogonality implying that an unaligned agent with max-level-intelligence would compete for the same resources whose configurations it values (the universe). most of the reason for collaborating on alignment despite orthogonality is that our values tend to overlap to a large degree, e.g. most people (and maybe especially most alignment researchers?) think hells are bad.
also on the “lose out” phrasing: even if someone “wants at least some people to have tormentful lives”, they don’t “lose out” overall if they also positively value other things / still negatively value any of the vast majority of beings having tormentful lives.
I think a crux I have with the entire alignment community may ultimately come down to me not believing that human values overlap strongly enough to make alignment the most positive thing, compared to other AI safety things.
In particular, I’d expect a surprising amount of disagreement on whether making a hell is good, if you managed to sell it as eternally punishing a favored enemy.
most of the reason for collaborating on alignment despite orthogonality is that our values tend to overlap to a large degree, e.g. most people (and maybe especially most alignment researchers?) think hells are bad.
I agree LWers tend to at least admit that severe enough value conflicts can exist, though I think that people like Eliezer don’t realize that human value conflicts sort of break collective CEV-type solutions, and a lot of collective alignment solutions tend to assume either that someone puts their thumb on the scale and excludes certain values, or that human values and their idealizations are so similar that no conflicts are expected, which I personally don’t think is true.
i don’t know where that might be true, but at least on lesswrong i imagine it’s an uncommon belief. a core premise of alignment being important is value orthogonality implying that an unaligned agent with max-level-intelligence would compete for the same resources whose configurations it values (the universe).
also on the “lose out” phrasing: even if someone “wants at least some people to have tormentful lives”, they don’t “lose out” overall if they also positively value other things / still negatively value any of the vast majority of beings having tormentful lives.
Agree with this, which handles some cases, but my worry is that there are still likely to be big values conflicts where one value set must ultimately win out over another.
My guess is that it’s just an effect of field growth. A lot of people coming in now weren’t around when the consensus formed and don’t agree with it or don’t even know much about it.
Also, the consensus wasn’t exactly uncontroversial on LW even way back in the day. Hanson’s Ems inhabit a somewhat more recognisable world and economy that doesn’t have superintelligence in it, and lots of skeptics used to be skeptical in the sense of thinking all of this AI stuff was way too speculative and wouldn’t happen for hundreds of years if ever, so they made critiques of that form or just didn’t engage in AI discussions at all. LW wasn’t anywhere near this AI-centric when I started reading it around 2010.
My question specifically asks about the transition to ASI, which, while I think it’s really hard to predict, seems likely to take years, during which time we have intelligences just a bit above human level, before they’re truly world-changingly superintelligent. I understand this isn’t everyone’s model, and it’s not necessarily mine, but I think it is plausible.
Asking “how could someone ask such a dumb question?” is a great way to ensure they leave the community. (Maybe you think that’s a good thing?)
I’m fine. Don’t worry too much about this. It just made me think, what am I doing here? For someone to single out my question and say “it’s dumb to even ask such a thing” (and the community apparently agrees)… I just think I’ll be better off not spending time here.
I’d guess that most just skimmed what was visible from the hoverover, while under the impression it was what my text said. The engagement on your post itself is probably more representative.
I guess part of the issue is that in any discussion, people don’t use the same terms in the same way. Some people call present-day AI capabilities by terms like “superintelligent” in a specific domain. Which is not how I understand the term, but I understand where the idea to call it that comes from. But of course such mismatched definitions make discussions really hard. Seeing stuff like that makes it very understandable why Yudkowsky wrote the LW Sequences...
Anyway, here is an example of a recent shortform post which grapples with the same issue that vague terms are confusing.
I feel like this is a bit incorrect. There are imaginable things that are smarter than humans at some tasks, smart as average humans at others, thus overall superhuman, yet controllable and therefore possible to integrate in an economy without immediately exploding into a utopian (or dystopian) singularity. The question is whether we are liable to build such things before we build the exploding singularity kind, or if the latter is in some sense easier to build and thus stumble upon first. Most AI optimists think these limited and controllable intelligences are the default natural outcome of our current trajectory and thus expect mere boosts in productivity.
There are imaginable things that are smarter than humans at some tasks, smart as average humans at others, thus overall superhuman, yet controllable and therefore possible to integrate in an economy
sure, e.g. i think (<- i may be wrong about what the average human can do) that GPT-4 meets this definition (far superhuman at predicting author characteristics, above-average-human at most other abstract things). that’s a totally different meaning.
Most AI optimists think these limited and controllable intelligences are the default natural outcome of our current trajectory and thus expect mere boosts in productivity.
do you mean they believe superintelligence (the singularity-creating kind) is impossible, and so don’t also expect it to come after? it’s not sufficient for less capable AIs to defaultly come before superintelligence.
I think some believe it’s downright impossible and others that we’ll just never create it because we have no use for something so smart it overrides our orders and wishes. That at most we’ll make a sort of magical genie still bound by us expressing our wishes.
(edit 3: i’m not sure, but this text might be net-harmful to discourse)
i continue to feel so confused at what continuity led to some users of this forum asking questions like, “what effect will superintelligence have on the economy?” or otherwise expecting an economic ecosystem of superintelligences (e.g. 1[1],
2(edit 2: I misinterpreted this question)).it actually reminds me of this short story by davidad, in which one researcher on an alignment team has been offline for 3 months, and comes back to find the others on the team saying things like “[Coherent Extrapolated Volition?] Yeah, exactly! Our latest model is constantly talking about how coherent he is. And how coherent his volitions are!”, in that it’s something i thought this forum would have seen as ‘confused about the basics’ just a year ago, and i don’t yet understand what led to it.
(edit: i’m feeling conflicted about this shortform after seeing it upvoted this much. the above paragraph would be unsubstantive/bad discourse if read as an argument by analogy, which i’m worried it was (?). i was mainly trying to express confusion.)
from the power of intelligence (actually, i want to quote the entire post, it’s short):
a value-aligned superintelligence directly creates utopia. an “intent-aligned” or otherwise non-agentic truthful superintelligence, if that were to happen, is most usefully used to directly tell you how to create a value-aligned agentic superintelligence. if the thing in question cannot do one of these things it is not superintelligence, but something else.
comment thread between me and the post’s author
People are confused about the basics because the basics are insufficiently justified.
As far as I know, my post started the recent trend you complain about.
Several commenters on this thread (e.g. @Lucius Bushnaq here and @MondSemmel here) mention LessWrong’s growth and the resulting influx of uninformed new users as the likely cause. Any such new users may benefit from reading my recently-curated review of Planecrash, the bulk of which is about summarising Yudkowsky’s worldview.
If there’s decision-making about scarce resources, you will have an economy. Even superintelligence does not necessarily imply infinite abundance of everything, starting with the reason that our universe only has so many atoms. Multipolar outcomes seem plausible under continuous takeoff, which the consensus view in AI safety (as I understand it) sees as more likely than fast takeoff. I admit that there are strong reasons for thinking that the aggregate of a bunch of sufficiently smart things is agentic, but this isn’t directly relevant for the concerns about humans within the system in my post.
In his review of Peter Singer’s commentary on Marx, Scott Alexander writes:
Peter Thiel might call this “indefinite optimism”: delay all planning or visualisation because there’s some later point where it’s trusted things will all sort themselves out. Now, if you think that takeoff will definitely be extremely hard and the resulting superintelligence will effortlessly take over the world, then obviously it makes sense to focus on what that superintelligence will want to do. But what if takeoff lasts months or years or decades? (Note that there can be lots of change even within months if the stakes look extreme to powerful actors!) Aren’t you curious about what an aligned superintelligence will end up deciding about society and humans? Are you so sure about the transition period being so short and the superintelligence being so unitary and multipolar outcomes being so unlikely that we’ll never have to worry about problems downstream of the incentive issues and competitive pressures that I discuss (which Beren recently had an excellent post on)? Are you so sure that there is not a single interesting, a priori deducible fact about the superintelligent economy beyond “a singleton is in charge and everything is utopia”?
The default outcome is an unaligned superintelligence singleton destroying the world and not caring about human concepts like property rights. Whereas an aligned superintelligence can create a far more utopian future than a human could come up with, and cares about capitalism and property rights only to the extent that that’s what it was designed to care about.
So I indeed don’t get your perspective. Why are humans still appearing as agents or decision-makers in your post-superintelligence scenario at all? If the superintelligence for some unlikely reason wants a human to stick around and to do something, then it doesn’t need to pay them. And if a superintelligence wants a resource, it can just take it, no need to pay for anything.
@L Rudolf L can talk on his own, but for me, a crux probably is I don’t expect either unaligned superintelligence singleton or a value aligned superintelligence creating utopia as the space of likely outcomes within the next few decades.
For the unaligned superintelligence point, my basic reasons is I now believe the alignment problem got significantly easier compared to 15 years ago, I’ve become more bullish on AI control working out since o3, and I’ve come to think instrumental convergence is probably correct for some AIs we build in practice, but that instrumental drives are more constrainable on the likely paths to AGI and ASI.
For the alignment point, a big reason for this is I now think a lot of what makes an AI aligned is primarily data, compared to inductive biases, and one of my biggest divergences with the LW community comes down to me thinking that inductive bias is way less necessary for alignment than people usually think, especially compared to 15 years ago.
For AI control, one update I’ve made for o3 is that I believe OpenAI managed to get the RL loop working in domains where outcomes are easily verifiable, but not in domains where verifying is hard, and programming/mathematics are such domains where verifying is easy, but the tie-in is that capabilities will be more spikey/narrow than you may think, and this matters since I believe narrow/tool AI has a relevant role to play in an intelligence explosion, so you can actually affect the outcome by building narrow capabilities AI for a few years, and the fact that AI capabilities are spikey in domains where we can easily verify outcomes is good for eliciting AI capabilities, which is a part of AI control.
For the singleton point, it’s probably because I believe takeoff is both slow and somewhat distributed enough such that multiple superintelligent AIs can arise.
For the value-aligned superintelligence creating a utopia for everyone, my basic reason for why I don’t really believe in this is because I believe value conflicts are effectively irresolvable due to moral subjectivism, which forces the utopia to be a utopia for some people, and I expect the set of people that are in an individual utopia to be small in practice (because value conflicts become more relevant for AIs that can create nation-states all by themselves.)
For why humans are decision makers, this is probably because AI is either controlled or certain companies have chosen to make AIs follow instruction-following drives, and that actually succeeding.
And why must alignment be binary? (aligned, or misaligned, where misaligned necessarily means it destroys the world and does not care about property rights)
Why can you not have an a superintelligence that is only misaligned when it comes to issues of wealth distribution?
Relatedly, are we sure that CEV is computable?
I guess we could in theory fail and only achieve partial alignment, but that seems like a weird scenario to imagine. Like shooting for a 1 in big_number target (= an aligned mind design in the space of all potential mind designs) and then only grazing it. How would that happen in practice?
And what does it even mean for a superintelligence to be “only misaligned when it comes to issues of wealth distribution”? Can’t you then just ask your pretty-much-perfectly-aligned entity to align itself on that remaining question?
Are you saying that the 1 aligned mind design in the space of all potential mind designs is an easier target than the subspace composed of mind designs that does not destroy the world? If so, why? is it a bigger target? is it more stable?
No, because the you who can ask (the persons in power) is themselves misaligned with the 1 alignment target that perfectly captures all our preferences.
I didn’t mean that there’s only one aligned mind design, merely that almost all (99.999999...%) conceivable mind designs are unaligned by default, so the only way to survive is if the first AGI is designed to be aligned, there’s no hope that a random AGI just happens to be aligned. And since we’re heading for the latter scenario, it would be very surprising to me if we managed to design a partially aligned AGI and lose that way.
I expect the people in power are worrying about this way more than they worry about the overwhelming difficulty of building an aligned AGI in the first place. (Case in point: the manufactured AI race with China.) As a result I expect they’ll succeed at building a by-default-unaligned AGI and driving themselves and us to extinction. So I’m not worried about instead ending up in a dystopia ruled by some government or AI lab owner.
End points are easier to infer than trajectories, so sure, I think there’s some reasonable guesses you can try to make about how the world might look after aligned superintelligence, should we get it somehow.
For example, I think it’s a decent bet that basically all minds would exist solely as uploads almost all of the time, because living directly in physical reality is astronomically wasteful and incredibly inconvenient. Turning on a physical lamp every time you want things to be brighter means wiggling about vast numbers of particles and wasting an ungodly amount of negentropy just for the sake of the teeny tiny number of bits about these vast numbers of particles that actually make it to your eyeballs, and the even smaller number of bits that actually end up influencing your mind state and making any difference to your perception of the world. All of the particles[1] in the lamp in my bedroom, the air its light shines through, and the walls it bounces off, could be so much more useful arranged in an ordered dance of logic gates where every single movement and spin flip is actually doing something of value. If we’re not being so incredibly wasteful about it, maybe we can run whole civilisations for aeons on the energy and negentropy that currently make up my bedroom. What we’re doing right now is like building an abacus out of supercomputers. I can’t imagine any mature civilisation would stick with this.
It’s not that I refuse to speculate about how a world post aligned superintelligence might look. I just didn’t think that your guess was very plausible. I don’t think pre-existing property rights or state structures would matter very much in such a world, even if we don’t get what is effectively a singleton, which I doubt. If a group of superintelligent AGIs is effectively much more powerful and productive than the entire pre-existing economy, your legal share of that pre-existing economy is not a very relevant factor in your ability to steer the future and get what you want. The same goes for pre-existing military or legal power.
Well, the conserved quantum numbers of my room, really.
Assuming that which end point you get to doesn’t depend on the intermediate trajectories at least.
Something like a crux here is I believe the trajectories non-trivially matter for which end-points we get, and I don’t think it’s like entropy where we can easily determine the end-point without considering the intermediate trajectory, because I do genuinely think some path-dependentness is present in history, which is why even if I were way more charitable towards communism I don’t think this was ever defensible:
Another issue is the Eternal September issue where LW membership has grown a ton due to the AI boom (see the LW site metrics in the recent fundraiser post), so as one might expect, most new users haven’t read the old stuff on the site. There are various ways in which the LW team tries to encourage them to read those, but nevertheless.
The basic answer is the following:
The incentive problem still remains, such that it’s more effective to use the price system than to use a command economy to deal with incentive issues:
https://x.com/MatthewJBar/status/1871640396583030806
Related to this, perhaps the outer loss of the markets isn’t nearly as dispensable as a lot of people on LW believe, and contact with reality is a necessary part of all future AIs.
More here:
https://gwern.net/backstop
A potentially large crux is I don’t really think a utopia is possible, at least in the early years even by superintelligences, because I expect preferences in the new environment to grow unboundedly such that preferences are always dissatisfied, even charitably assuming a restriction on the utopia concept to be relative to someone else’s values.
going by the linked tweet, does “incentive problem” mean “needing to incentivize individuals to share information about their preferences in some way (currently done through their economic behavior) in order for their preferences to be fulfilled”? and is it contrasted with a “command economy”, where everything is planned out long in advance, possibly with less information about the preferences of individual moral patients?
if so, those sound like abstractions which were relevant to the world so far, but can you not imagine any better way a superintelligence could elicit this information? it does not need to use prices or trade. some examples:
- it could have many copies of itself talk to them
- it could let beings enter whatever they want into a computer in real time, or really let beings convey their preferences in whatever medium they prefer, and fulfill them[1]
- it could mind-scan those who are okay with this
(these are just examples selected for clarity; i personally would expect something more complex and less thing-oriented, around moral patients who are okay with/desire it, where superintelligence imbues itself as computation throughout the lowest level of physics upon which this is possible, and so it is as if physics itself is contextually aware and benevolent)
(i think these also sufficiently address your point 2, about SI needing ‘contact with reality’)
there is also a second (but non-cruxy) assumption here: that preference information would need to be dispersed across some production ecosystem. this would not be true given general-purpose superintelligent nanofactories, and even otherwise it is not a crux as long as whatever is required for production can fit on, e.g., a planet (across which the information derived in, e.g., one of those listed ways can be communicated at light-speed, as we partially do now).
i interpret this to mean “some entities’ values will want to use as much matter as they can for things, so not all values can be unboundedly fulfilled”. this is true and not a crux. if a moral patient who wants to make unboundedly much of something actually making unboundedly much of it would be less good than other ways the world could be, then an (altruistically-)aligned agent would choose one of the other ways.
superintelligence is context-aware in this way; it is not {a rigid system which fails on outliers it doesn’t expect (e.g.: “tries to create utopia, but instead gives all the lightcone to whichever maximizer requests it all first”), and so which needs a somewhat less rigid but not-superintelligent system (an economy) to avoid this}. i suspect this (superintelligence being context-aware) is effectively the crux here.
(if morally acceptable, e.g. no creating hells)
The other issue is value conflicts, which I expect to be mostly irresolvable in a satisfying way by default. This is due to moral subjectivism, combined with my belief that lots of value conflicts today are suppressed mostly because people can’t make their own nation-states; with AI, they can, and superintelligence makes the problem worse.
That’s why you can’t have utopia for everyone.
i think this would not happen, for the same fundamental reason: an aligned superintelligence can foresee whatever problems you can, and prevent / not cause them if it agrees they’d be worse than other possibilities. (more generally, “an aligned superintelligence would cause some bad-to-it thing” is contradictory, usually[1].)
(i wonder if you’re using the term ‘superintelligence’ in a different way though, e.g. to mean “merely super-human”? to be clear i definitionally mean it in the sense of optimal)
(tangentially: the ‘nations’ framing confuses me)[2]
i think i wrote before that i agree (trivially) that not all possible values can be maximally satisfied; still, you can have the best possible world, which i think on this axis would look like “there being very many possible environments suited to different beings’ preferences (as long as those preferences are not to cause suffering to others)” rather than “beings with different preferences going to war with each other”. (note there is no coordination problem which must be solved for that to happen: a benevolent superintelligence would itself not allow war. and on that, i’ll also hedge that if there is some tragedy which would be worth the cost of war to stop, an aligned superintelligence would just stop it directly instead.)
some exceptions like “it is aligned, but has the wrong decision theory, and gets acausally blackmailed”
in the world of your premise (with people using superintelligence to then war over value differences), superintelligence, not nations, would be the most powerful thing with which to wage conflict
I think the main point is that what counts as worse than other possibilities partially depends on your value system at the start, and there is no non-circular way of resolving deep enough value conflicts such that you can always prevent conflict; with differing enough values, conflict can arise on its own.
(Note when I focus on superintelligence, I don’t focus on the AI literally doing optimal actions, because that leads to likely being wrong about what AIs can actually do, which is actually important.)
On the nations point, my point here is that people will program their superintelligences with quite different values, and the superintelligences will disagree about what counts as optimal by their lights. If the disagreements are severe enough (which I predict is plausible if AI development cannot be controlled at all), conflict can definitely happen between the superintelligences, even if humans are no longer the main players.
Also, it’s worth reading these posts and comments, because I perceive some mistakes that are common amongst rationalists:
https://www.lesswrong.com/posts/895Qmhyud2PjDhte6/responses-to-apparent-rationalist-confusions-about-game
https://www.lesswrong.com/posts/HFYivcm6WS4fuqtsc/dath-ilan-vs-sid-meier-s-alpha-centauri-pareto-improvements#jpCmhofRBXAW55jZv
I agree you can have a best possible world (though that gets very tricky in infinite realms, since utility theory breaks down at that point), but my point here is that the best possible world is relative to a given value set, and also quite unconstrained, and your vision definitely requires other real-life value sets to lose out on a lot here.
Are you assuming that superintelligences will have common enough values for some reason? To be clear, I think this can happen, assuming AI is controlled by a specific group that has enough of a monopoly on violence to prevent others from making their own AI, but I don’t have nearly the confidence that you do that conflict is always avoidable by ASIs by default.
you didn’t write “yes, i use ‘superintelligent’ to mean super-human”, so i’ll write as if you also mean optimal[1]. though i suspect we may have different ideas of where optimal is, which could become an unnoticed crux, so i’m noting it.
i am expecting the first superintelligent agent to capture the future, and for there to be no time for others to arise to compete with it.
in a hypothetical setup where multiple superintelligences are instantiated at close to the same time within a world, it’s plausible to me that they would fight in some way, though also plausible that they’d find a way not to. as an easy reason they might fight: maybe one knows it will win (e.g., it has a slight head start and physics is such that that is pivotal).
in my model of reality: it takes ~2/15ths of a second for light to travel the length of the earth’s circumference. maybe there are other bottlenecks that would push the time required for an agentic superintelligence to take over the earth to minutes-to-hours. as long as the first superintelligent (world-valuing-)agent is created at least <that time period’s duration> before the next one would have been created, it will prevent that next one’s creation. i assign very low likelihood to multiple superintelligences being independently created within the same hour.
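(for reference, a quick arithmetic check of that figure, using the standard equatorial circumference and speed of light; just a sanity check, not a new claim:)

```python
earth_circumference_km = 40_075   # equatorial circumference of Earth
speed_of_light_km_s = 299_792     # speed of light in vacuum

travel_time_s = earth_circumference_km / speed_of_light_km_s
print(round(travel_time_s, 3))    # ~0.134 s
print(round(2 / 15, 3))           # 0.133, i.e. roughly 2/15ths of a second
```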
this seems like a crux, and i don’t yet know why you expect otherwise, failing meaning something else by superintelligence.
actually, i can see room for disagreement about whether ‘slow, gradual buildup of spiky capabilities profiles’ would change this. i don’t think it would because … if i try to put it into words, we are in an unstable equilibrium, which will at some point be disrupted, and there are not ‘new equilibriums, just with less balance’ for the world to fall on. however, gradual takeoff plus a strong defensive advantage inherent in physics could lead to it, for intuitive reasons[2]. in terms of current tech like nukes there’s an offensive advantage, but we don’t actually know what the limit looks like. although it’s hard for me to conceive of a true defensive advantage in fundamental physics that can’t be used offensively by macroscopic beings. would be interested in seeing made up examples.
i’ll probably read the linked posts anyways, but it looks like you thought i also expected multiple superintelligences to arise at almost the same time, and inferred i was making implicit claims about game theory between them.
you wrote:
i mean something with the optimal process (of cognition (learning, problem solving, creativity)), not something that always takes the strictly best action.
(i’m guessing this is about how the ‘optimal action’ could sometimes be impractical to compute. for example, the action i could technically take that has the best outcomes might technically be to send off a really alien email that sets off some unknowable-from-my-position butterfly effect.)
e.g., toy game setup: if you can counter a level 100 attack at level 10, and all the players start within 5 levels of each other and progress at 1 per turn, then it doesn’t matter who will reach level 100 first.
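(a tiny sketch of that toy setup in code, under one reading of it: attacks only become decisive at level 100, and any defender at level 10 or above can counter them. the numbers are just the made-up ones above, nothing more:)

```python
# toy model: players gain 1 level per turn; an attack is decisive only at
# level >= 100, and any defender at level >= 10 can counter such an attack.
DEFENSE_THRESHOLD = 10
ATTACK_UNLOCK = 100

def first_successful_attack_turn(start_levels, max_turns=1000):
    levels = list(start_levels)
    for turn in range(max_turns):
        leader, laggard = max(levels), min(levels)
        # a decisive attack requires the leader to reach level 100 while
        # some defender is still below the defense threshold
        if leader >= ATTACK_UNLOCK and laggard < DEFENSE_THRESHOLD:
            return turn
        levels = [lvl + 1 for lvl in levels]
    return None

# players start within 5 levels of each other
print(first_successful_attack_turn([5, 7, 9, 10]))
# None: by the time anyone reaches level 100, every laggard is far past
# level 10, so it never matters who reached level 100 first
```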
I think I understand your position better, and a crux for real-world decision making is that in practice, I don’t really think this assumption is correct by default, especially if there’s a transition period.
i do not understand your position from this, so you’re welcome to write more. also, i’m not sure if i added the paragraph about slow takeoff before or after you loaded the comment.
an easy way to convey your position to me might be to describe a practical rollout of the future where all the things in it seem individually plausible to you.
One example of such a future: in 2028, OpenAI manages to scale up enough to make an AI that, while not as good as a human worker in general (at least without heavy inference costs), is good enough to act as a notable accelerant to AI research. By 2030-2031, AI research has been more or less automated away by OpenAI, with competitors having such systems by 2031-2032, so AI progress becomes notably faster. By 2033 we are on the brink of AI that can do a lot of job work, but the best models at this point are instead reinvested in AI R&D, such that by 2035 superhuman AI is broadly achieved, and this is when the economy starts getting seriously disrupted.
The key features of this future are that intent alignment works well enough that AI generally takes instructions from specific humans, and that it’s easy for others to get their own superintelligences with different values, such that conflict doesn’t go away.
oh, well to clarify then, i was trying to say that i didn’t mean ‘superhuman’ at all, i directly meant optimal. i don’t believe that superhuman = optimal, and when reading this story one of the first things that stood out was that the 2035 point is still before the first long-term-decisive entity.
Edited my comment.
but it still says “it’s easy for others to get their own superintelligences with different values”, with ‘superintelligence’ referring to the ‘superhuman’ AI of 2035?
my response is the same, the story ends before what i meant by superintelligence has occurred.
(it’s okay if this discussion was secretly a definition difference till now!)
Yeah, the crux is I don’t think the story ends before superintelligence, for a combination of reasons
what i meant by “the story ends before what i meant by superintelligence has occurred” is that the written one ends there in 2035, but at that point there’s still time to affect what the first long-term-decisive thing will be.
still confused about this btw. in my second reply to you i wrote:
and you did not say you were, but it looks like you are here?
I was assuming very strongly superhumanly intelligent AI, but yeah no promises of optimality were made here.
That said, I suspect a crux is that optimality ends up with multipolarity, assuming a one world government hasn’t happened by then, because I think the offense-defense balance moderately favors defense even at optimality, assuming optimal defense and offense.
oh okay, i’ll have to reinterpret then. edit: i just tried, but i still don’t get it; if it’s “very strongly superhuman”, why is it merely “when the economy starts getting seriously disrupted”? (<- this feels like it’s back at where this thread started)
why?
I should probably edit that at some point, but I’m on my phone, so I’ll do it tomorrow.
A big reason for this is logistics, as how you are getting to the fight can actually hamper you a lot, and this especially bites hard on offense, because it’s easier to get supplies to your area than it is to get supplies to an offensive unit.
This especially matters if physical goods need to be transported from one place to another place.
ah. for ‘at optimality’ which you wrote, i don’t imagine it to take place on that high of a macroscopic level (the one on which ‘supplies’ could be transported), i think the limit is more things that look to us like the category of ‘angling rays of light just right to cause distant matter to interact in such a way as to create an atomic explosion, or some even more destructive reaction we don’t yet know about, or to suddenly carve out a copy of itself there to start doing things locally’, and also i’m not imagining the competitors being ‘solid’ macroscopic entities anymore, but rather being patterns imbued (and dispersed) in a relatively ‘lower’ level of physics (which also do not need ‘supplies’). (edit: maybe this picture is wrong, at optimality you can maybe absorb the energy of such explosions / not be damaged by them, if you’re not a macroscopic thing. which does actually defeat the main way macroscopic physics has an offense advantage?)
(i’m just exploring what it would be like to be clear, i don’t think such conflicts will happen because i still expect just one optimal-level-agent to come from earth)
I am willing to concede that here, the assumption of non-optimal agents was more necessary than I thought for my argument, and I think you are right about the necessity of that assumption in order to guarantee anything like a normal future (though it still might be multipolar), so I changed a comment.
My new point is that I don’t think optimal agents will exist when we lose all control, but yes I didn’t realize an assumption was more load-bearing than I thought.
(btw I also realized I didn’t strictly mean ‘optimal’ by ‘superintelligent’, but at least close enough to it / ‘strongly superhuman enough’ for us to not be able to tell the difference. I originally used the ‘optimal’ wording trying to find some other definition apart from ‘super-human’)
it is also plausible to me that life-caring beings first lose control to much narrower programs[1] or moderately superhuman unaligned agents totally outcompeting them economically (if it turns out that making better agents is hard enough that they can’t just directly do that instead), or something.
also, a ‘multipolar AI-driven but still normal-ish’ scenario seems to continue at most until a strong enough agent is created. (e.g. that could be what a race is towards).
(maybe after ‘loss of control to weaker AI’ scenarios, those weaker AIs also keep making better agents afterwards, but i’m not sure about that, because they could be myopic and in some stable pattern/equilibrium)
(e.g. the ‘going out with a whimper’ part of this post)
i missed this part:
i’m not sure what this means. my values basically refer to other beings having not-tormentful (and next in order of priority, happy/good) existences. (tried to formalize this more but it’s hard)
in particular, i’m not sure if you’re saying something which would seem trivially true to me or not. (example trivially true thing: someone who wants to tile literally the entire lightcone with happy humans not being able to do that is losing out under ‘cosmopolitan’ values relative to if their values controlled the entire lightcone. example trivially true thing 2: “the best possible world is relative to a given value set”)
That would immediately exclude quite a few people, from both the far left and far right, because I predict a lot of people definitely want at least some people to have tormentful lives.
I was trying to say something trivially true in your ontology, but far too many people deny that you do in fact have to make other values lose out. People usually think the best possible world is absolute, not relative, and in particular I think a lot of people use the idea of value-aligned superintelligence as though it were a magic wand that could solve all conflict.
i don’t know where that might be true, but at least on lesswrong i imagine it’s an uncommon belief. a core premise of alignment being important is value orthogonality implying that an unaligned agent with max-level-intelligence would compete for the same resources whose configurations it values (the universe). most of the reason for collaborating on alignment despite orthogonality is that our values tend to overlap to a large degree, e.g. most people (and maybe especially most alignment researchers?) think hells are bad.
also on the “lose out” phrasing: even if someone “wants at least some people to have tormentful lives”, they don’t “lose out” overall if they also positively value other things / still negatively value any of the vast majority of beings having tormentful lives.
I think a crux I have with the entire alignment community may ultimately come down to me not believing that human values overlap strongly enough to make alignment the most positive thing, compared to other AI safety things.
In particular, I’d expect a surprising amount of disagreement on whether making a hell is good, if you managed to sell it as eternally punishing a favored enemy.
I agree LWers tend to at least admit that severe enough value conflicts can exist, though I think people like Eliezer don’t realize that human value conflicts sort of break collective-CEV-type solutions. A lot of collective alignment solutions tend to assume either that someone puts their thumb on the scale and excludes certain values, or that human values and their idealizations are so similar that no conflicts are expected, which I personally don’t think is true.
Agree with this, which handles some cases, but my worry is that there are still likely to be big value conflicts where one value set must ultimately win out over another.
My guess is that it’s just an effect of field growth. A lot of people coming in now weren’t around when the consensus formed and don’t agree with it or don’t even know much about it.
Also, the consensus wasn’t exactly uncontroversial on LW even way back in the day. Hanson’s Ems inhabit a somewhat more recognisable world and economy that doesn’t have superintelligence in it, and lots of skeptics used to be skeptical in the sense of thinking all of this AI stuff was way too speculative and wouldn’t happen for hundreds of years if ever, so they made critiques of that form or just didn’t engage in AI discussions at all. LW wasn’t anywhere near this AI-centric when I started reading it around 2010.
My question specifically asks about the transition to ASI, which, while I think it’s really hard to predict, seems likely to take years, during which time we have intelligences just a bit above human level, before they’re truly world-changingly superintelligent. I understand this isn’t everyone’s model, and it’s not necessarily mine, but I think it is plausible.
Asking “how could someone ask such a dumb question?” is a great way to ensure they leave the community. (Maybe you think that’s a good thing?)
I don’t, sorry. (I’d encourage you not to leave just because of this, if it was just this. maybe LW mods can reactivate your account? @Habryka)
Yeah looks like I misinterpreted it. I agree that time period will be important.
I’ll try to be more careful.
Fwiw, I wasn’t expecting this shortform to get much engagement, but given that it did, I imagine it probably feels like public shaming.
(Happy to reactivate your account, though I think you can also do it yourself)
I hope you’re okay btw
I’m fine. Don’t worry too much about this. It just made me think, what am I doing here? For someone to single out my question and say “it’s dumb to even ask such a thing” (and the community apparently agrees)… I just think I’ll be better off not spending time here.
I’d guess that most just skimmed what was visible from the hoverover, while under the impression it was what my text said. The engagement on your post itself is probably more representative.
Did not mean to do that.
I guess part of the issue is that in any discussion, people don’t use the same terms in the same way. Some people describe present-day AI capabilities as “superintelligent” in a specific domain. That’s not how I understand the term, but I understand where the idea to call it that comes from. Of course, such mismatched definitions make discussions really hard. Seeing stuff like that makes it very understandable why Yudkowsky wrote the LW Sequences...
Anyway, here is an example of a recent shortform post which grapples with the same issue that vague terms are confusing.
I feel like this is a bit incorrect. There are imaginable things that are smarter than humans at some tasks and as smart as average humans at others, thus overall superhuman, yet controllable and therefore possible to integrate into an economy without immediately exploding into a utopian (or dystopian) singularity. The question is whether we are liable to build such things before we build the exploding-singularity kind, or whether the latter is in some sense easier to build and thus stumbled upon first. Most AI optimists think these limited and controllable intelligences are the default natural outcome of our current trajectory and thus expect mere boosts in productivity.
sure, e.g. i think (<- i may be wrong about what the average human can do) that GPT-4 meets this definition (far superhuman at predicting author characteristics, above-average-human at most other abstract things). that’s a totally different meaning.
do you mean they believe superintelligence (the singularity-creating kind) is impossible, and so don’t also expect it to come after? it’s not sufficient for less capable AIs to come before superintelligence by default.
I think some believe it’s downright impossible and others that we’ll just never create it because we have no use for something so smart it overrides our orders and wishes. That at most we’ll make a sort of magical genie still bound by us expressing our wishes.