> Our ASI would use its superhuman capabilities to prevent any other ASIs from being built.
This feels like a “just so” fairy tale. No matter what objection is raised, the magic white knight always saves the day.
> Also, the ASI can just decide to turn itself into a monolith.
No more subsystems? So we are to try to imagine a complex learning machine without any parts/components?
> Your same SNC reasoning could just as well be applied to humans too.
No, not really, insofar as the power assumed and presumed to be afforded to the ASI is very very much greater than that assumed to be applicable to any mere mortal human.
Especially and exactly because the nature of ASI is inherently artificial and thus, in key ways, inherently incompatible with organic human life.
It feels like you bypassed a key question: can the ASI prevent the relevant classes of significant (critical) organic human harm that soon occur as a direct result of its own hyper-powerful/consequential existence?
It's a bit like asking whether a nuclear bomb detonating in the middle of some city could somehow use its hugely consequential power to fully and wholly self-contain and control all of the energy effects of its own explosion, simply because it "wants to" and is "aligned".
Either you are willing to account for complexity, and for the effects of the artificiality itself, or you are not (and thus there would be no point in our discussing it further in relation to SNC).
The more powerful/complex you assume the ASI to be, and thus the more consequential it becomes, the more powerful/complex you must also (somehow) assume its control system to be, and likewise its predictive capability; and the deeper the consequences of its mistakes become (to the point of x-risk, etc).
What if maybe something unknown/unknowable about its artificialness turns out to matter? Why? Because exactly none of the interface has ever even once been tried before; there is nothing for it to learn from, at all, until after the x-risk has been tried, and given the power/consequence involved, that is very likely to be very much too late.
But the real issue is that the power, consequence, potential for harm, etc, of the control system itself (and its parts) must increase at a rate greater than that of the base unaligned ASI. That is the first issue: an inequality problem.
Moreover, there is a base absolute threshold beyond which the notion of "control" is untenable, inherently in itself, given the complexity. Hence, as you assume the ASI to be more powerful, you very quickly make the cure worse than the disease, and, sooner still, cross into the range of that which is inherently incurable.
The net effect, overall, as has been indicated, is that an aligned ASI cannot actually prevent important relevant unknown unknown classes of significant (critical) organic human harm.
The ASI's existence in itself is a net negative. The longer the ASI exists, and the more power you assume the ASI has, the worse. And all of this will for sure occur as a direct result of its existence.
Assuming it to be more powerful/consequential does not help the outcome, because that method simply ignores the issues associated with the inherent complexity and also its artificiality.
The fairy tale white knight to save us is dead.
I’d like to attempt a compact way to describe the core dilemma being expressed here.
Consider the expression: y = x^a - x^b, where 'y' represents the impact of AI on the world (positive is good), 'x' represents the AI's capability, 'a' represents the rate at which the power of the control system scales, and 'b' represents the rate at which the surface area of the system that needs to be controlled (for it to stay safe) scales.
(Note that this is assuming somewhat ideal conditions, where we don’t have to worry about humans directing AI towards destructive ends via selfishness, carelessness, malice, etc.)
If b > a, then as x increases, y gets increasingly negative. Indeed, y can only be positive when x is less than 1. But this represents a severe limitation on capabilities, enough to prevent it from doing anything significant enough to hold the world on track towards a safe future, such as preventing other AIs from being developed.
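As a quick numerical check of that claim, here is a toy sketch (the exponents a and b below are arbitrary values chosen only so that b > a; they are not derived from anything in the discussion):

```python
# Toy evaluation of y = x**a - x**b with b > a:
# y is positive only for x < 1, zero at x = 1, and increasingly negative beyond.
a, b = 1.5, 2.0  # hypothetical scaling rates, chosen only so that b > a

for x in [0.5, 0.9, 1.0, 1.5, 2.0, 4.0]:
    y = x**a - x**b
    print(f"x = {x:4.1f}  ->  y = {y:+.3f}")
```

The sign flips at x = 1 and the net impact falls off ever faster as capability grows, which is the "increasingly negative" behaviour described above.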
There are two premises here, and thus two relevant lines of inquiry:
1) b > a, meaning that complexity scales faster than control.
2) When x < 1, AI can't accomplish anything significant enough to avert disaster.
Arguments and thought experiments where the AI builds powerful security systems can be categorized as challenges to premise 1; thought experiments where the AI limits its range of actions to prevent unwanted side effects, while simultaneously preventing destruction from other sources (including other AIs being built), are challenges to premise 2.
Both of these premises seem like factual statements relating to how AI actually works. I am not sure what to look for in terms of proving them (I’ve seen some writing on this relating to control theory, but the logic was a bit too complex for me to follow at the time).
> So we are to try to imagine a complex learning machine without any parts/components?
Yeah, sure. Humans are an example. If I decide to jump off a cliff, my arm isn't going to say "alright, you jump but I stay here". Either I, as a whole, would jump or I, as a whole, would not.
> Can the ASI prevent the relevant classes of significant (critical) organic human harm that soon occur as a direct result of its own hyper-powerful/consequential existence?
If by that you mean "can ASI prevent some relevant classes of harm caused by its existence", then the answer is yes.
If by that you mean "can ASI prevent all relevant classes of harm caused by its existence", then the answer is no; but almost nothing can, so the question becomes trivial and uninteresting.
However, ASI can prevent a bunch of other relevant classes of harm for humanity. And it might well be likely that the amount of harm it prevents across multiple relevant sources is going to be higher than the amount of harm it won't prevent due to predictive limitations.
This again runs into my guardian angel analogy. A guardian angel also cannot prevent all relevant sources of harm caused by its existence. Perhaps there are pirates, hiding in the next galaxy, who hunt for guardian angels. They might use special cloaks that hide them from the guardian angel's radar. As soon as you accept the guardian angel's help, perhaps they would destroy the Earth in their pursuit.
But similarly, the decision to reject the guardian angel's help doesn't prevent all relevant classes of harm caused by that decision itself. Perhaps there are guardian angel worshippers who are traveling as fast as they can to Earth to see their deity. But just before they arrive you reject the guardian angel's help and it disappears. Enraged at your decision, the worshippers destroy Earth.
So as you can see, neither the decision to accept nor the decision to reject the guardian angel's help can prevent all relevant classes of harm caused by itself.
> What if maybe something unknown/unknowable about its artificialness turns out to matter? Why? Because exactly none of the interface has ever even once been tried before.
Imagine that we create a vaccine for cancer (just imagine). Just before releasing it to the public, one person says "what if maybe something unknown/unknowable about its substance turns out to matter? What if we are all in a simulation and the injection of that particular substance would make it so that our simulators start torturing all of us? Why? Because this exact substance has never even once been injected before."
I think we can agree that the researchers shouldn’t throw away the cancer vaccines, despite hearing this argument. It could be argued just as well that the simulators would torture us for throwing away the vaccine.
Another example: let's go back a couple hundred years, to pre-electricity times. Imagine a worried person coming to a scientist working on early electricity theory and saying, "What if maybe something unknown/unknowable about its effects turns out to matter? Why? Because exactly none of this has ever even once been tried before."
This worried person could also have given an example of the dangers of electricity by pointing out how lightning kills the people it touches.
Should the scientist therefore have stopped working on electricity?
> Humans do things in a monolithic way, not as "assemblies of discrete parts".
Organic human brains have multiple aspects. Have you ever had more than one opinion? Have you ever been severely depressed?
> If you are asking "can a powerful ASI prevent /all/ relevant classes of harm (to the organic) caused by its inherently artificial existence?", then I agree that the answer is probably "no". But then almost nothing can perfectly do that, so therefore your question becomes seemingly trivial and uninteresting.
The level of x-risk harm and consequence potentially caused by even one single mistake of your angelic, super-powered ASI is far from "trivial" and "uninteresting". Even one single bad relevant mistake can be an x-risk when ultimate powers and ultimate consequences are involved.
Either your ASI is actually powerful, or it is not; either way, be consistent.
Unfortunately the ‘Argument by angel’ only confuses the matter insofar as we do not know what angels are made of. ”Angels” are presumably not machines, but they are hardly animals either. But arguing that this “doesn’t matter” is a bit like arguing that ’type theory’ is not important to computer science.
The substrate aspect is actually important. You cannot simply just disregard and ignore that there is, implied somewhere, an interface between the organic ecosystem of humans, etc, and that of the artificial machine systems needed to support the existence of the ASI. The implications of that are far from trivial. That is what is explored by the SNC argument.
> It might well be likely that the amount of harm ASI prevents (across multiple relevant sources) is going to be higher/greater than the amount of harm ASI will not prevent (due to control/predictive limitations).
It might seem so, by mistake or perhaps by accidental (or intentional) self-deception, but this can only be a short-term delusion. This has nothing to do with "ASI alignment".
Organic life is very very complex and, in the total hyperspace of possibility, is only robust across a very narrow range.
Your cancer vaccine is within that range, as it is made of the same kind of stuff as that which it is trying to cure.
In the space of the kinds of elementals and energies inherent in ASI powers, and of the necessary (side) effects and consequences of its mere existence (as based on an inorganic substrate), we end up involuntarily exploring far far beyond the adaptive range of all manner of organic process.
It is not just “maybe it will go bad”, but more like it is very very likely that it will go much worse than you can (could ever) even imagine is possible. Without a lot of very specific training, human brains/minds are not at all well equipped to deal with exponential processes, and powers, of any kind, and ASI is in that category.
Organic life is very very fragile to the kinds of effects/outcomes that any powerful ASI must engender by its mere existence.
If your vaccine was made of neutronium, then I would naturally expect some very serious problems and outcomes.
> Organic human brains have multiple aspects. Have you ever had more than one opinion? Have you ever been severely depressed?
Yes, but none of this would remain alive if I, as a whole, decide to jump from a cliff. The multiple aspects of my brain would die with my brain. After all, you mentioned subsystems that wouldn't self-terminate with the rest of the ASI. Whereas in the human body, jumping from a cliff terminates everything.
But even barring that, the ASI can decide to fly into the Sun, and any subsystem that shows any sign of refusal to do so will be immediately replaced/impaired/terminated. In fact, it would've been terminated a long time ago by the "monitors" which I described before.
> The level of x-risk harm and consequence potentially caused by even one single mistake of your angelic, super-powered ASI is far from "trivial" and "uninteresting". Even one single bad relevant mistake can be an x-risk when ultimate powers and ultimate consequences are involved.
It is trivial and uninteresting in the sense that there is a set of all things that we can build (set A). There is also a set of all things that can prevent all relevant classes of harm caused by their own existence (set B). If these sets don't overlap, then saying that a specific member of set A isn't included in set B is indeed trivial, because we already know this via more general reasoning (that these sets don't overlap).
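Spelling out the shape of that inference in set notation (just a restatement of the above, nothing added):

```latex
A = \{\,\text{things we can build}\,\}, \qquad
B = \{\,\text{things that prevent all relevant classes of harm caused by their own existence}\,\}

A \cap B = \varnothing \implies \forall s \in A,\; s \notin B
```

Which is to say: the substantive question is whether the two sets overlap at all, not whether one particular member of A lands in B.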
Unfortunately the ‘Argument by angel’
only confuses the matter insofar as
we do not know what angels are made of.
“Angels” are presumably not machines,
but they are hardly animals either.
But arguing that this “doesn’t matter”
is a bit like arguing that ‘type theory’
is not important to computer science.
The substrate aspect is actually important.
You cannot simply just disregard and ignore
that there is, implied somewhere, an interface
between the organic ecosystem of humans, etc,
and that of the artificial machine systems
needed to support the existence of the ASI.
But I am not saying that it doesn't matter. On the contrary, I made my analogy in such a way that the helper (namely our guardian angel) is a being that is commonly thought to be made of a different substrate. In fact, in this example, you aren't even sure what it is made of, beyond knowing that it's clearly a different substrate. You don't even know how that material interacts with the physical world. That's even less than what we know about ASIs and their material.
And yet, getting a personal, powerful, intelligent guardian angel that would act in your best interests for as long as it can (it's a guardian angel after all) seems like an obviously good thing.
But if you disagree with what I wrote above, let the takeaway be at least that you are worried about case (2) and not case (1). After all, knowing that there might be pirates hunting for this angel (that couldn't be detected by said angel) didn't make you immediately decline the proposal. You started talking about substrate, which fits with the concerns of someone who is worried about case (2).
> Your cancer vaccine is within that range, as it is made of the same kind of stuff as that which it is trying to cure.
We can make the hypothetical more interesting. Let's say that this vaccine is not created from organic stuff, but that it has passed all the tests with flying colors. Let's also assume that this vaccine has been in testing for 150 years and that it has shown absolutely no side effects over an entire human lifetime (say it was injected into 2-year-olds and showed no side effects at all, even in 90-year-olds who have lived with this vaccine their entire lives). Let's also assume that it has been tested to have no side effects on the children and grandchildren of those who took said vaccine. Would you be campaigning for throwing away such a vaccine, just because it is based on a different substrate?
The only general remarks that I want to make are in regard to your question about the model of 150-year-long vaccine testing on/over some sort of sample group and control group.
I notice that there is nothing exponential assumed about this test object, and so, at most, the effects are probably multiplicative, if not linear. There are therefore lots of questions about power dynamics that we can overall safely ignore, as a simplification, which is in marked contrast to anything involving ASI.
If we assume, as you requested, "no side effects" observed, in any test group, for any of those things that we happened to be thinking of to even look for, then for any linear system, that is probably "good enough". But for something that is known for sure to be exponential, that by itself is not anywhere near enough to feel safe.
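To illustrate why a clean trial record means much less for an exponential process, here is a toy sketch (all numbers below are made up for illustration; they do not model any real vaccine, trial, or ASI):

```python
# Two hypothetical effect-accumulation curves: one linear, one exponential.
# Both stay below the detection threshold for the whole 150-year trial window,
# but the exponential one blows past it not long afterwards.
DETECTION_THRESHOLD = 1.0   # arbitrary units
TRIAL_YEARS = 150

def linear_effect(t, rate=1e-3):
    return rate * t

def exponential_effect(t, base=1e-7, growth=1.1):
    return base * (growth ** t)

for t in (50, 100, 150, 200, 250):
    window = "within trial" if t <= TRIAL_YEARS else "after trial"
    print(f"year {t:>3} ({window}): "
          f"linear = {linear_effect(t):.3g}, exponential = {exponential_effect(t):.3g}")
```

With these toy numbers, the linear effect is still tiny at year 250, while the exponential effect is invisible throughout the trial and then crosses the threshold within a couple of decades of the trial ending.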
But what does this really mean?
Since the common and prevailing (world) business culture is all about maximal profit, and therefore minimal cost, and also about minimizing any possible future responsibility (or cost) in case anything with the vaccine goes badly wrong, then for anything that might be in the possible category of unknown-unknown risk, I would expect that company to want to maintain some sort of plausible deniability; i.e., to not look so hard for never-before-seen effects, or to otherwise ignore that they exist, or matter, etc (just like throughout a lot of ASI risk dialogue).
If there is some long future problem that crops up, the company can say “we never looked for that” and “we are not responsible for the unexpected”, because the people who made the deployment choices have taken their profits and their pleasure in life, and are now long dead. “Not my Job”.
“Don’t blame us for the sins of our forefathers”. Similarly, no one is going to ever admit or concede any point, of any argument, on pain of ego death. No one will check if it is an exponential system.
So of course, no one is going to want to look into any sort of issues distinguishing the target effects from the also-occurring changes in world equilibrium. They will publish their glowing, sanitized safety report, deploy the product regardless, and make money.
"Pollution in the world is a public commons problem", so no corporation is held responsible for world states. It has become "fashionable" to ignore long-term evolution, and also to ignore and deny everything about the ethics.
But this does not make the issue of ASI x-risk go away. X-risks are generally the result of exponential processes, and so the vaccine example is not really that meaningful.
With the presumed ASI levels of actually exponential power, this is not so much about something like pollution as it is about maybe igniting the world's atmosphere via a mistake in the calculations for the Trinity test. Or are you going to deny that Castle Bravo is a thing?
Beyond this one point, my feeling is that your notions have become a bit too fanciful for me to want to respond to too seriously. You can, of course, feel free to continue to assume and presume whatever you want, and therefore reach whatever conclusions you want.
> on/over some sort of sample group and control group.
> I notice that there is nothing exponential assumed about this test object, and so, at most, the effects are probably multiplicative, if not linear. There are therefore lots of questions about power dynamics that we can overall safely ignore, as a simplification, which is in marked contrast to anything involving ASI.
> If we assume, as you requested, "no side effects" observed, in any test group, for any of those things that we happened to be thinking of to even look for, then for any linear system, that is probably "good enough".
I am not sure I understand the distinction between linear and exponential in the vaccine context. By linear, do you mean that only a few people die? By exponential, do you mean that a lot of people die?
If so, then I am not so sure that vaccine effects could only be linear. For example, there might be some change in our complex environment that would prompt the vaccine to act differently than it did in the past.
More generally, our vaccine can lead to catastrophic outcomes if there is something about its future behavior that we didn't predict. And if that turns out to be true, then things could get ugly really fast.
And the extent of the damage can be truly big. A "scientifically proven" cancer vaccine that passed the tests is like the holy grail of medicine. "Curing cancer" is often used by parents as an example of the great things their children could achieve. This is combined with the fact that cancer has been with us for a long time and the fact that the current treatment is very expensive and painful.
All of these factors combined tell us that in a relatively short period of time a large percentage of the total population will get this vaccine. At that point, the amount of damage that can be done only depends on what thing we overlooked, which we, by definition, have no control over.
> If there is some long future problem that crops up, the company can say "we never looked for that" and "we are not responsible for the unexpected", because the people who made the deployment choices have taken their profits and their pleasure in life, and are now long dead. "Not my Job".
> "Don't blame us for the sins of our forefathers". Similarly, no one is going to ever admit or concede any point, of any argument, on pain of ego death.
This same excuse would surely be used by companies manufacturing the vaccine. They would argue that they shouldn’t be blamed for something that the researchers overlooked. They would say that they merely manufactured the product in order to prevent the needless suffering of countless people.
For all we know, by the time that the overlooked thing happens, the original researchers (who developed and tested the vaccine) are long dead, having lived a life of praise and glory for their ingenious invention (not to mention all the money that they received).