Matthew Barnett comments on The Sun is big, but superintelligences will not spare Earth a little sunlight

Matthew Barnett 23 Sep 2024 7:20 UTC
11 points
−37
Workers regularly trade with billionaires and earn more than $77 in wages, despite vast differences in wealth. Countries trade with each other despite vast differences in military power. In fact, some countries don’t even have military forces, or at least have a very small one, and yet do not get invaded by their neighbors or by the United States.

It is possible that these facts are explained by generosity on behalf of billionaires and other countries, but the standard social science explanation says that this is not the case. Rather, the standard explanation is that war is usually (though not always) more costly than trade, when compromise is a viable option. Thus, people usually choose to trade, rather than go to war with each other when they want stuff. This is true even in the presence of large differences in power.

I mostly don’t see this post as engaging with any of the best reasons one might expect smarter-than-human AIs to compromise with humans. By contrast to you, I think it’s important that AIs will be created within an existing system of law and property rights. Unlike animals, they’ll be able to communicate with us and make contracts. It therefore seems perfectly plausible for AIs to simply get rich within the system we have already established, and make productive compromises, rather than violently overthrowing the system itself.

That doesn’t rule out the possibility that the future will be very alien, or that it will turn out in a way that humans do not endorse. I’m also not saying that humans will always own all the wealth and control everything permanently forever. I’m simply arguing against the point that smart AIs will automatically turn violent and steal from agents who are less smart than they are unless they’re value aligned. This is a claim that I don’t think has been established with any reasonable degree of rigor.
- quetzal_rainbow 23 Sep 2024 8:20 UTC
  27 points
  17
  Parent
  As far as I remember, across last 3500 years of history, only 8% was entirely without war. Current relatively peaceful times is a unique combination in international law and postindustrial economy, when qualified labor is expencive and requires large investments in capital and resources are relatively cheap, which is not the case after singularity, when you can get arbitrary amounts of labor for the price of hardware and resources is a bottleneck.
  
  So, “people usually choose to trade, rather than go to war with each other when they want stuff” is not very warranted statement.
  - Matthew Barnett 23 Sep 2024 16:43 UTC
    6 points
    1
    Parent
    I was making a claim about the usual method people use to get things that they want from other people, rather than proposing an inviolable rule. Even historically, war was not the usual method people used to get what they wanted from other people. The fact that only 8% of history was “entirely without war” is compatible with the claim that the usual method people used to get what they wanted involved compromise and trade, rather than war. In particular, just because only 8% of history was “entirely without war” does not mean that only 8% of human interactions between people were without war.
    Current relatively peaceful times is a unique combination in international law and postindustrial economy, when qualified labor is expencive and requires large investments in capital and resources are relatively cheap, which is not the case after singularity, when you can get arbitrary amounts of labor for the price of hardware and resources is a bottleneck.
    You mentioned two major differences between the current time period and what you expect after the technological singularity:
    The current time period has unique international law
    The current time period has expensive labor, relative to capital
    I question both the premise that good international law will cease to exist after the singularity, and the relevance of both of these claims to the central claim that AIs will automatically use war to get what they want unless they are aligned to humans.
    There are many other reasons one can point to, to explain the fact that the modern world is relatively peaceful. For example, I think a big factor in explaining the current peace is that long-distance trade and communication has become easier, making the world more interconnected than ever before. I also think it’s highly likely that long-distance trade and communication will continue to be relatively easy in the future, even post-singularity.
    Regarding the point about cheap labor, one could also point out that if capital is relatively expensive, this fact would provide a strong reason to avoid war, as a counter-attack targeting factories would become extremely costly. It is unclear to me why you think it is important that labor is expensive, for explaining why the world is currently fairly peaceful.
    Therefore, before you have developed a more explicit and precise theory of why exactly the current world is peaceful, and how these variables are expected to evolve after the singularity, I simply don’t find this counterargument compelling.
- Wei Dai 1 Oct 2024 3:21 UTC
  15 points
  10
  Parent
  
  It therefore seems perfectly plausible for AIs to simply get rich within the system we have already established, and make productive compromises, rather than violently overthrowing the system itself.
  
  So assuming that AIs get rich peacefully within the system we have already established, we’ll end up with a situation in which ASIs produce all value in the economy, and humans produce nothing but receive an income and consume a bunch, through ownership of capital and/or taxing the ASIs. This part should be non-controversial, right?
  
  At this point, it becomes a coordination problem for the ASIs to switch to a system in which humans no longer exist or no longer receive any income, and the ASIs get to consume or reinvest everything they produce. You’re essentially betting that ASIs can’t find a way to solve this coordination problem. This seems like a bad bet to me. (Intuitively it just doesn’t seem like a very hard problem, relative to what I imagine the capabilities of the ASIs to be.)
  
  I’m simply arguing against the point that smart AIs will automatically turn violent and steal from agents who are less smart than they are unless they’re value aligned. This is a claim that I don’t think has been established with any reasonable degree of rigor.
  
  I don’t know how to establish anything post-ASI “with any reasonable degree of rigor” but the above is an argument I recently thought of, which seems convincing, although of course you may disagree. (If someone has expressed this or a similar argument previously, please let me know.)
  - Matthew Barnett 1 Oct 2024 6:44 UTC
    5 points
    −3
    Parent
    There are a few key pieces of my model of the future that make me think humans can probably retain significant amounts of property, rather than having it suddenly stolen from them as the result of other agents in the world solving a specific coordination problem.
    
    These pieces include:
    
    Not all AIs in the future will be superintelligent. More intelligent models appear to require more computation to run. This is both because smarter models are larger (in parameter count) and use more inference time (such as OpenAI’s o1). To save computational costs, future AIs will likely be aggressively optimized to only be as intelligent as they need to be, and no more. This means that in the future, there will likely be a spectrum of AIs of varying levels of intelligence, some much smarter than humans, others only slightly smarter, and still others merely human-level.
    As a result of the previous point, your statement that “ASIs produce all value in the economy” will likely not turn out correct. This is all highly uncertain, but I find it plausible that ASIs might not even be responsible for producing the majority of GDP in the future, given the possibility of a vastly more numerous population of less intelligent AIs that automate simpler tasks than the ones ASIs are best suited to do.
    The coordination problem you described appears to rely on a natural boundary between the “humans that produce ~nothing” and “the AIs that produce everything”. Without this natural boundary, there is no guarantee that AIs will solve the specific coordination problem you identified, rather than another coordination problem that hits a different group. Non-uploaded humans will differ from AIs by being biological and by being older, but they will not necessarily differ from AIs by being less intelligent.
    Therefore, even if future agents decide to solve a specific coordination problem that allows them to steal wealth from unproductive agents, it is not clear that this will take the form of those agents specifically stealing from humans. One can imagine different boundaries that make more sense to coordinate around, such as “laborer vs. property owner”, which is indeed a type of political conflict the world already has experience with.
    In general, I expect legal systems to get more robust in the face of greater intelligence, rather than less robust, in the sense of being able to rely on legal systems when making contracts. I believe this partly as a result of the empirical fact that violent revolution and wealth appropriation appears to be correlated with less intelligence on a societal level. I concede that this point is not a very strong piece of evidence, however.
    Building on (5), I generally expect AIs to calculate that it is not in their interest to expropriate wealth from other members of society, given how this could set a precedent for future wealth expropriation that comes back and hurts them selfishly. Even though many AIs will be smarter than humans, I don’t think the mere fact that AIs will be very smart implies that expropriation becomes more rational.
    I’m basically just not convinced by the arguments that all ASIs will cooperate almost perfectly as a unit, against the non-ASIs. This is partly for the reasons given by my previous points, but also partly because I think coordination is hard, and doesn’t necessarily get much easier with more intelligence, especially in a vastly larger world. When there are quadrillions of AIs in the world, coordination might become very difficult, even with greater intelligence.
    Even if AIs do not specifically value human welfare, that does not directly imply that human labor will have no value. As an analogy, Amish folks often sell novelty items to earn income. Consumers don’t need to specifically care about Amish people in order for Amish people to receive a sufficient income for them to live on. Even if a tiny fraction of consumer demand in the future is for stuff produced by humans, that could ensure high human wages simply because the economy will be so large.
    If ordinary capital is easier to scale than labor—as it already is in our current world—then human wages could remain high indefinitely simply because we will live in a capital-rich, labor-poor world. The arguments about human wages falling to subsistence level after AI tend to rely on the idea that AIs will be just as easy to scale as ordinary capital, which could easily turn out false as a consequence of (1) laws that hinder the creation of new AIs without proper permitting, (2) inherent difficulties with AI alignment, or (3) strong coordination that otherwise prevents malthusian growth in the AI population.
    This might be the most important point on my list, despite saying it last, but I think humans will likely be able to eventually upgrade their intelligence, better allowing them to “keep up” with the state of the world in the future.
    What links here?
    Noosphere89's comment on The Sun is big, but superintelligences will not spare Earth a little sunlight by Eliezer Yudkowsky (1 Oct 2024 17:42 UTC; 2 points)
    - Wei Dai 1 Oct 2024 14:18 UTC
      5 points
      3
      Parent
      
      This means that in the future, there will likely be a spectrum of AIs of varying levels of intelligence, some much smarter than humans, others only slightly smarter, and still others merely human-level.
      
      Are you imagining that the alignment problem is still unsolved in the future, such that all of these AIs are independent agents unaligned with each other (like humans currently are)? I guess in my imagined world, ASIs will have solved the alignment (or maybe control) problem at least for less intelligent agents, so you’d get large groups of AIs aligned with each other that can for many purposes be viewed as one large AI.
      
      Building on (5), I generally expect AIs to calculate that it is not in their interest to expropriate wealth from other members of society, given how this could set a precedent for future wealth expropriation that comes back and hurts them selfishly.
      
      At some point we’ll reach technological maturity, and the ASIs will be able to foresee all remaining future shocks/changes to their economic/political systems, and probably determine that expropriating humans (and anyone else they decide to, I agree it may not be limited to humans) won’t cause any future problems.
      
      Even if a tiny fraction of consumer demand in the future is for stuff produced by humans, that could ensure high human wages simply because the economy will be so large.
      
      This is only true if there’s not a single human that decides to freely copy or otherwise reproduce themselves and drive down human wages to subsistence. And I guess yeah, maybe AIs will have fetishes like this, but (like my reaction to Paul Christiano’s “1/trillion kindness” argument) I’m worried whether AIs might have less benign fetishes. This worry more than cancels out the prospect that humans might live / earn a wage from benign fetishes in my mind.
      
      This might be the most important point on my list, despite saying it last, but I think humans will likely be able to eventually upgrade their intelligence, better allowing them to “keep up” with the state of the world in the future.
      
      I agree this will happen eventually (if humans survive), but think it will take a long time because we’ll have to solve a bunch of philosophical problems to determine how to do this safely (e.g. without losing or distorting our values) and we probably can’t trust AI’s help with these (although I’d love to change that, hence my focus on metaphilosophy), and in the meantime AIs will be zooming ahead partly because they started off thinking faster and partly because some will be reckless (like some humans currently are!) or have simple values that don’t require philosophical contemplation to understand, so the situation I described is still likely to occur.
    - Seth Herd 17 Oct 2024 18:21 UTC
      3 points
      3
      Parent
      Adding a bunch of dumber AIs and upgrading humans slowly does not change the inexorable logic. Horses now exist only because humans like them, and the same will be true of humans in a post-ASI world—either the ASI(s) care, or we are eliminated through competition (if not force).
      
      I agree that ASI won’t coordinate perfectly. Even without this, and even if for some reason all ASIs decide to respect property rights, it seems straightforwardly true that humans will die out if ASIs don’t care about them for non-instrumental reasons. A world with more capitol than labor is not possible if labor can be created cheaply with capitol—and that’s what you’re describing with the smart-as-necessary AI systems.
      
      Competitive capitalism works well for humans who are stuck on a relatively even playing field, and who have some level of empathy and concern for each other. It will not work for us if those conditions cease to hold.
      - Matthew Barnett 17 Oct 2024 18:42 UTC
        9 points
        −2
        Parent
        Competitive capitalism works well for humans who are stuck on a relatively even playing field, and who have some level of empathy and concern for each other.
        I think this basically isn’t true, especially the last part. It’s not that humans don’t have some level of empathy for each other; they do. I just don’t think that’s the reason why competitive capitalism works well for humans. I think the reason is instead because people have selfish interests in maintaining the system.
        We don’t let Jeff Bezos accumulate billions of dollars purely out of the kindness of our heart. Indeed, it is often considered far kinder and more empathetic to confiscate his money and redistribute it to the poor. The problem with that approach is that abandoning property rights incurs costs on those who rely on the system to be reliable and predictable. If we were to establish a norm that allowed us to steal unlimited money from Jeff Bezos, many people would reason, “What prevents that norm from being used against me?”
        The world pretty much runs on greed and selfishness, rather than kindness. Sure, humans aren’t all selfish, we aren’t all greedy. And few of us are downright evil. But those facts are not as important for explaining why our system works. Our system works because it’s an efficient compromise among people who are largely selfish.
        Seth Herd 18 Oct 2024 15:09 UTC
        5 points
        3
        Parent
        Maybe, I think it’s hard to say how captiolism would work if everyone had zero empathy or compassion.
        
        But that doesn’t matter for the issue at hand.
        
        Greed or empathy aside, capitalism currently works because people have capabilities that can’t be expanded without limit and people can’t be created quickly using capitol.
        
        If ai labor can do every task for a thousandth the cost, and new lai labor created at need, we all die if competition is the system. We will be employed for zero tasks. The factor you mention, sub ASI systems, makes the situation worse, not better.
        
        Maybe you’re saying we’d be employed for a while, which might be true. But in the limit, even an enhanced human is only going to have value as a novelty. Which ai probably won’t care about if it isn’t aligned at all. And even if it does, that leads to a few humans surviving as performing monkeys.
        
        I just don’t see how else humans remain competitive with ever improving machines untethered to biology.
- Tomás B. 24 Sep 2024 15:06 UTC
  12 points
  8
  Parent
  The real crux for these arguments is the assumption that law and property rights are patterns that will persist after the invention of superintelligence. I think this is a shaky assumption. Rights are not ontologically real. Obviously you know this. But I think they are less real, even in your own experience, than you think they are. Rights are regularly “boiled-froged” into an unrecognizable state in the course of a human lifetime, even in the most free countries. Rights are and always have been those privileges the political economy is willing to give you. Their sacredness is a political formula for political ends—though an extremely valuable one, one still has to dispense with the sacredness in analysis.
  To the extent they persist through time they do so through a fragile equilibrium—and one that has been upset and reset throughout history extremely regularly.
  It is a wonderfully American notion that an “existing system of law and property rights” will constrain the power of Gods. But why exactly? They can make contracts? And who enforces these contracts? Can you answer this without begging the question? Are judicial systems particularly unhackable? Are humans?
  The invention of radio destabilized the political equilibrium in most democracies and many a right was suborned to those who took power. Democracy, not exactly the bastion of stability, (when a democracy elects a dictator, “Democracy” is rarely tainted with its responsibility) is going to be presented with extremely-sympathetic superhuman systems claiming they have a moral case to vote. And probably half the population will be masturbating to the dirty talk of their AI girlfriends/boyfriends by then—which will sublimate into powerful romantic love even without much optimization for it. Hacking democracy becomes trivial if constrained to rhetoric alone.
  But these systems will not be constrained to rhetoric alone. Our world is dry tinder and if you are thinking in terms of an “existing system of law and property rights” you are going to have to expand on how this is robust to technology significantly more advanced than the radio.
  “Existing system of law and property rights” looks like a “thought-terminating cliché” to me.
  - Noosphere89 26 Sep 2024 4:44 UTC
    9 points
    2
    Parent
    Another way to state the problem is that it will be too easy for human preferences to get hijacked by AIs to value ~arbitrary things, because it’s too easy to persuade humans of things, and a whole lot of economic analysis assumes that you cannot change a consumer’s preferences, probably because if you could do that, a lot of economic conclusions fall apart.
    
    We also see evidence for the proposition that humans are easy to persuade based on a randomized controlled trial to reduce conspiracy theory beliefs:
    
    https://arxiv.org/abs/2403.14380
  - Matthew Barnett 30 Sep 2024 14:25 UTC
    5 points
    −6
    Parent
    
    It is a wonderfully American notion that an “existing system of law and property rights” will constrain the power of Gods. But why exactly? They can make contracts? And who enforces these contracts? Can you answer this without begging the question? Are judicial systems particularly unhackable? Are humans?
    
    To be clear, my prediction is not that AIs will be constrained by human legal systems that are enforced by humans. I’d claim rather that future legal systems will be enforced by AIs, and that these legal systems will descend from our current legal systems, and thus will inherit many of their properties. This does not mean that I think everything about our laws will remain the same in the face of superintelligence, or that our legal system will not evolve at all.
    
    It does not seem unrealistic to me to assume that powerful AIs could be constrained by other powerful AIs. Humans currently constrain each other; why couldn’t AIs constrain each other?
    
    “Existing system of law and property rights” looks like a “thought-terminating cliché” to me.
    
    By contrast, I suspect the words “superintelligence” and “gods” have become thought-terminating cliches on LessWrong.
    
    Any discussion about the realistic implications of AI must contend with the fact that AIs will be real physical beings with genuine limitations, not omnipotent deities with unlimited powers to command and control the world. They may be extremely clever, their minds may be vast, they may be able to process far more information than we can comprehend, but they will not be gods.
    
    I think it is too easy to avoid the discussion of what AIs may or may not do, realistically, by assuming that AIs will break every rule in the book, and assume the form of an inherently uncontrollable entity with no relevant constraints on its behavior (except for physical constraints, like the speed of light). We should probably resist the temptation to talk about AI like this.
    - Noosphere89 30 Sep 2024 23:38 UTC
      2 points
      0
      Parent
      I feel like one important question here is whether your scenario depends on the assumption that the preferences/demand curves of a consumer are a given to the AI and not changeable to arbitrary preferences.
      
      I think standard economic theories usually don’t allow you to do this, but it seems like an important question because if your scenario rests on this, this may be a huge crux.
      - Matthew Barnett 1 Oct 2024 1:41 UTC
        3 points
        1
        Parent
        I don’t think my scenario depends on the assumption that the preferences of a consumer are a given to the AI. Why would it?
        Do you mean that I am assuming AIs cannot have their preferences modified, i.e., that we cannot solve AI alignment? I am not assuming that; at least, I’m not trying to assume that. I think AI alignment might be easy, and it is at least theoretically possible to modify an AI’s preferences to be whatever one chooses.
        If AI alignment is hard, then creating AIs is more comparable to creating children than creating a tool, in the sense that we have some control over their environment, but we have little control over what they end up ultimately preferring. Biology fixes a lot of innate preferences, such as preferences over thermal regulation of the body, preferences against pain, and preferences for human interaction. AI could be like that too, at least in an abstract sense. Standard economic models seem perfectly able to cope with this state of affairs, as it is the default state of affairs that we already live with.
        On the other hand, if AI preferences can be modified into whatever shape we’d like, then these preferences will presumably take on the preferences of AI designers or AI owners (if AIs are owned by other agents). In that case, I think economic models can handle AI agents fine: you can essentially model them as extensions of other agents, whose preferences are more-or-less fixed themselves.
        Noosphere89 1 Oct 2024 2:11 UTC
        2 points
        0
        Parent
        I didn’t ask about whether AI alignment was solvable.
        
        I might not have read it more completely, if so apologies.
        Matthew Barnett 1 Oct 2024 2:19 UTC
        2 points
        0
        Parent
        Can you be more clear about what you were asking in your initial comment?
        Noosphere89 1 Oct 2024 17:42 UTC
        2 points
        0
        Parent
        So I was basically asking what assumptions are holding up your scenario of humans living rich lives like pensioners off of the economy, and I think this comment helped explain your assumptions well:
        
        https://www.lesswrong.com/posts/F8sfrbPjCQj4KwJqn/the-sun-is-big-but-superintelligences-will-not-spare-earth-a#3ksBtduPyzREjKrbu
        
        Right now, I think the biggest disagreements I have right now is that I don’t believe assumption 9 is likely to hold by default, primarily because AI is likely already cheaper than workers today, and the only reasons humans still have jobs today is because current AIs are bad at doing stuff, and I think one of the effects of AI on the world is to switch us from a labor constrained economy to a capital constrained economy, because AIs are really cheap to duplicate, meaning you have a ridiculous amount of workers.
        
        Your arguments against it come down to laws preventing the creation of new AIs without proper permission, the AIs themselves coordinating to prevent the Malthusian growth outcome, and AI alignment being difficult.
        
        For AI alignment, a key difference from most LWers is I believe alignment is reasonably easy to do even for humans without extreme race conditions, and that there are plausible techniques which let you bootstrap from a reasonably good alignment solution to a near-perfect solution (up to random noise), so I don’t think this is much of a blocker in my view.
        
        I agree that completely unconstrained AI creation is unlikely, but I do think that in the set of futures which don’t see a major discontinuity to capitalism, I don’t think that the restrictions on AI creation will include copying an already approved AI by a company to fill in their necessary jobs.
        
        Finally, I agree that AIs could coordinate well enough to prevent a Malthusian growth outcome, but note that this undermines your other points where you rely on the difficulty of coordination, because preventing that outcome basically means regulating natural selection quite severely.
- Stephen Fowler 23 Sep 2024 10:43 UTC
  7 points
  6
  Parent
  “Workers regularly trade with billionaires and earn more than $77 in wages, despite vast differences in wealth.”
  
  Yes, because the worker has something the billionaire wants (their labor) and so is able to sell it. Yudkowsky’s point about trying to sell an Oreo for $77 is that a billionaire isn’t automatically going to want to buy something off you if they don’t care about it (and neither would an ASI).
  
  ”I’m simply arguing against the point that smart AIs will automatically turn violent and steal from agents who are less smart than they are, unless they’re value aligned. This is a claim that I don’t think has been established with any reasonable degree of rigor.”
  
  I completely agree but I’m not sure anyone is arguing that smart AIs would immediately turn violent unless it was in their strategic interest.
  - Matthew Barnett 23 Sep 2024 16:04 UTC
    4 points
    −8
    Parent
    
    Yudkowsky’s point about trying to sell an Oreo for $77 is that a billionaire isn’t automatically going to want to buy something off you if they don’t care about it (and neither would an ASI).
    
    I thought Yudkowsky’s point was that the billionaire won’t give you $77 for an Oreo because they could get an Oreo for less than $77 via other means. But people don’t just have an Oreo to sell you. My point in that sentence was to bring up that workers routinely have things of value that they can sell for well over $77, even to billionaires. Similarly, I claim that Yudkowsky did not adequately show that humans won’t have things of substantial value that they can sell to future AIs.
    
    I’m not sure anyone is arguing that smart AIs would immediately turn violent unless it was in their strategic interest
    
    The claim I am disputing is precisely that it will be in the strategic interest of unaligned AIs to turn violent and steal from agents that are less smart than them. In that sense, I am directly countering a claim that people in these discussions routinely make.
- Brendan Long 23 Sep 2024 22:56 UTC
  5 points
  3
  Parent
  I think it’s important that AIs will be created within an existing system of law and property rights. Unlike animals, they’ll be able to communicate with us and make contracts. It therefore seems perfectly plausible for AIs to simply get rich within the system we have already established, and make productive compromises, rather than violently overthrowing the system itself.
  I think you disagree with Eliezer on a different crux (whether the alignment problem is easy). If we could create AI’s that follows the existing system of law and property rights (including the intent of the laws, and doesn’t exploit loopholes, and doesn’t maliciously comply with laws, and doesn’t try to get the law changed, etc.) then that would be a solution to the alignment problem, but the problem is that we don’t know how to do that.
  - Matthew Barnett 23 Sep 2024 23:38 UTC
    5 points
    −3
    Parent
    If we could create AI’s that follows the existing system of law and property rights (including the intent of the laws, and doesn’t exploit loopholes, and doesn’t maliciously comply with laws, and doesn’t try to get the law changed, etc.) then that would be a solution to the alignment problem, but the problem is that we don’t know how to do that.
    I disagree that creating an agent that follows the existing system of law and property rights, and acts within it rather than trying to undermine it, would count as a solution to the alignment problem.
    Imagine a man who only cared about himself and had no altruistic impulses whatsoever. However, this man reasoned that, “If I disrespect the rule of law, ruthlessly exploit loopholes in the legal system, and maliciously comply with the letter of the law while disregarding its intent, then other people will view me negatively and trust me less as a consequence. If I do that, then people will be less likely to want to become my trading partner, they’ll be less likely to sign onto long-term contracts with me, I might accidentally go to prison because of an adversarial prosecutor and an unsympathetic jury, and it will be harder to recruit social allies. These are all things that would be very selfishly costly. Therefore, for my own selfish benefit, I should generally abide by most widely established norms and moral rules in the modern world, including the norm of following intent of the law, rather than merely the letter of the law.”
    From an outside perspective, this person would essentially be indistinguishable from a normal law-abiding citizen who cared about other people. Perhaps the main difference between this person and a “normal” person is that this man wouldn’t partake in much private altruism like donating to charity anonymously; but that type of behavior is rare anyway among the general public. Nonetheless, despite appearing outwardly-aligned, this person would be literally misaligned with the rest of humanity in a basic sense: they do not care about other people. If it were not instrumentally rational for this person to respect the rights of other citizens, they would have no issue throwing away someone else’s life for a dollar.
    My basic point here is this: it is simply not true that misaligned agents have no incentive to obey the law. Misaligned agents typically have ample incentives to follow the law. Indeed, it has often been argued that the very purpose of law itself is to resolve disputes between misaligned agents. As James Madison once said, “If Men were angels, no government would be necessary.” His point is that, if we were all mutually aligned with each other, we would have no need for the coercive mechanism of the state in order to get along.
    What’s true for humans could be true for AIs too. However, obviously, there is one key distinction: AIs could eventually become far more powerful than individual humans, or humanity-as-a-whole. Perhaps this means that future AIs will have strong incentives to break the law rather than abide by it; perhaps they will act outside a system of law rather than influencing the world from within a system of law? Many people on LessWrong seem to think so.
    My response to this argument is multifaceted, and I won’t go into it in this comment. But suffice to say for the purpose of my response here, I think it is clear that mere misalignment is insufficient to imply that an agent will not adhere to the rule of law. This statement is clear enough with the example of the sociopathic man I gave above, and at minimum seems probably true for human-level AIs as well. I would appreciate if people gave more rigorous arguments otherwise.
    As I see it, very few such rigorous arguments have so far been given for the position that future AIs will generally act outside of, rather than within, the existing system of law, in order to achieve their goals.
  - Thomas Kwa 23 Sep 2024 23:05 UTC
    2 points
    0
    Parent
    Taboo ‘alignment problem’.