Here is one model of AI.

1. Someone in a lab goes eureka and starts coding. No one really knows whether this will be the next AlphaGo or the singularity. No one who isn’t an expert on the topic sees anything beyond an expert claiming to have had a good idea (if they make this info public at all).
2. The researchers write code. Nothing obvious is happening: code is being produced, unit tests pass.
3. The code is now running. Performance numbers are going up. The world as a whole still looks basically the same, just with one computer cluster running a really special algorithm.
4. The ASI creates nanotech. This involves something happening at a biolab, nothing the humans consider particularly noteworthy. A DNA printer runs for longer than normal; it’s in an empty lab and no one notices. A student gets a plausible-looking email claiming to be from their professor, and mixes a few chemicals accordingly.
5. The nanotech self-replicates, leaving no obvious evidence of its existence for about a day.
6. Everyone drops dead / sudden utopia, depending on alignment.
Can you give an example hinge where:
- the crisis unfolds over a similar timescale (weeks or years, rather than seconds or hours),
- governments have some role,
- the risk is at least partially visible,
- the general population is engaged in some way?
A Hansonian slow takeoff with emulated minds would fit this fairly well. I consider Hanson to have lost the Foom debate,
Yep, I know and understand the model you describe. Let’s call it “AI in a box explodes”. I give it less weight than some other people.
Other models are basically everything else. Some specific examples:
I. A gradually increasing proportion of corporate decision-making is being automated, using systems that are initially slightly better than the current way of managing corporations, but not in a way that gives any one player a decisive strategic advantage. Everything gets faster and faster, but in a continuous way. In this trajectory, geopolitics changes a lot along the way.
The same in more abstract terms: the power of existing superagents grows, as they become less constrained by running on human brains or having human owners.
Various possible x-risk attractor states here are e.g.
- “ascended economy”-like
- “consequentialist superintelligence in a box” gets constructed later anyway, and explodes, but note that before this, there was a hingy period where both geopolitics and resources available to alignment research looked very different than today
II. Narrow “STEM” AI systems cause progress on powerful technologies (e.g. fusion or nanotech). This has clearly visible results, and leads to regulation.
III. Narrow “persuasion/memetics in silica” systems destabilize politics/ social epistemics / … with large consequences (e.g. triggering great power war).
IV. Narrow “cybersec” AI system causes a major disaster; the world reacts.
General classes of scenarios are
- most of continuous takeoff + states have roughly as large a share of power as today (which is more than the typical libertarian-leaning LW audience thinks)
- most scenarios with a moderately sized non-x-risk AI-mediated catastrophe
- CAIS-like worlds
Robin Hanson’s ems always seemed implausible as the first route to AGI. At least for me, the basic argument against was always “by the time we know how to run ems, we will have learned enough design tricks from evolution to build non-em AGI”. The debate certainly isn’t the best set of arguments for continuity.
Also, going back to the debate, it’s worth noting that, so far, positive feedback loops around AI route mostly through the larger economy, and not via AIs editing their own source code. (Eliezer would argue that this is still likely to happen later.)
Also, progress on the most powerful ML models in the past few years usually hasn’t looked like someone having a eureka moment, coding in their garage, and surprising results happening. The largest results have looked like labs spending millions of dollars on compute, with the work involving teams of people who understood they were doing something big and possibly impactful.
Also, when considering who got various predictions right: my impression is that Eric Drexler’s CAIS is closer to how the world actually looks than either Eliezer’s or Robin Hanson’s ideas.
In steps 2, 3, and 4 the researcher presumably sees something and has the power to like… go on twitter (or this very website) and say something.
Also, what are those AGI unit tests they ran, and who wrote the unit tests and is there spyware in any of it?
Also, maybe there is a really really huge hardware overhang, but if not then presumably the programmer bought a bunch of GPUs, or rented TPUs from Google, or <list of cloud computing services>. Did none of them notice?
Programming can happen in a vacuum, but it is rare.
Also, suppose the AGI in that scenario was benevolent… one thing a benevolent force might do (depending on the ethical entailments of the AGI’s working model of benevolence) is like… “ask permission”?
Certainly my model for how a benevolent AGI would work is that it would be seeking consent for a lot of its actions, and it would, in its long-term benevolent plans, probably carefully “carve out a part of the future world” for the ongoing multi-generational survival of a lot of human subcultures that say “no” to its offer, such that the children and grandchildren of those who do not opt in can watch how things go for the people who opt in to high levels of participation in <whatever>.
Maybe my theory of goodness is so wrong that the importance of consultation and choice will turn out to be hilariously and amusingly false, but… I’m pretty sure… not.
Contingent on this small bit of somewhat confident moral realism then: only in the BAD cases do I think we won’t have warning.
Maybe the warning will be limited by a sort of “conflict between demons” scenario, where all the various demons are unsure about various Dark Forest scenarios (except just about Earth, where only an AGI counts as “life”)?
However, basically, I think silence and ambush tactics are just intrinsically a sign of “lack of alignment in practice (or as a feared possibility, which lessens the potential for full trust)”.
In steps 2, 3, and 4 the researcher presumably sees something and has the power to like… go on twitter (or this very website) and say something.
Maybe in step 2 and early step 3. (Not beyond that if the AI is trying to hide)
Presumably this researcher believes their AI to be not dangerous. Maybe the researcher thinks their code is just the next AlphaGo. But let’s say they think they are building an aligned superintelligence. If they just say “I’m building a superintelligence”, that isn’t very credible. If they give specifics, they risk someone else building an AGI first.
So there are plausible incentives for silence.
Also, what are those AGI unit tests they ran
Standard datasets from the internet. Tests they wrote themselves. Tests of things like “this algorithm is supposed to converge quickly, so the value after 200 steps should be nearly the same as the value after 100 steps”.
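To make that concrete, here is a minimal sketch of what such a convergence test might look like in Python; `optimize` is a hypothetical stand-in for whatever the researcher is actually running, and the step counts and tolerance are made up for illustration:

```python
import unittest

def optimize(steps):
    # Hypothetical stand-in for the researcher's real training loop:
    # returns the value being optimized after `steps` iterations.
    value = 1.0
    for _ in range(steps):
        value *= 0.9  # placeholder update rule that converges toward 0
    return value

class TestConvergence(unittest.TestCase):
    def test_converges_quickly(self):
        # The algorithm is supposed to converge quickly, so the value after
        # 200 steps should be nearly the same as the value after 100 steps.
        self.assertAlmostEqual(optimize(100), optimize(200), places=4)

if __name__ == "__main__":
    unittest.main()
```

Nothing about a test like this would look remarkable to an outside observer; it reads like any other ML project’s test suite.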
Good luck seeing what’s going on using spyware, given the current state of transparency tools.
Also, maybe there is a really really huge hardware overhang, but if not then presumably the programmer bought a bunch of GPUs, or rented TPUs from Google, or <list of cloud computing services>. Did none of them notice?
The programmer spends $20,000 on compute from Google. They claim to be working on an AI project and give no more details. They upload a compiled program and run it.
This sort of thing happens all the time. That is the service these cloud compute companies provide. Reading compiled code and figuring out what it is supposed to do is hard, and Google has no reason to set a team of experts on it. AGI doesn’t have a big label saying “AGI” on it. Distinguishing it from yet another narrow ML project is really hard, especially if all you have is compiled code.
Also, suppose the AGI in that scenario was benevolent… one thing a benevolent force might do (depending on the ethical entailments of the AGI’s working model of benevolence) is like… “ask permission”?
Yes, it might. At this point, you have probably basically won. I mean you could in principle screw up by giving the AI so little permission to do anything that it was useless. But the AI would warn you if you were doing that.
Maybe my theory of goodness is so wrong that the importance of consultation and choice will turn out to be hilariously and amusingly false, but… I’m pretty sure… not.
My picture is that you are better at deciding what is best for you than some random bureaucrat. If Alice is a mentally functioning adult, then Alice knows more about how to make decisions that benefit Alice than anyone else does. (This isn’t true if Alice is mentally ill or a young child.) Alice is only better than other humans, not perfect. A superintelligence that has nanoscanned Alice’s brain could have a much better idea of how to benefit Alice.
Of course, you can argue for the value of choice for the sake of choice: that people should be left to shoot their own foot off, even when an omniscient, omni-benevolent agent can see exactly what mistake they are making.
Contingent on this small bit of somewhat confident moral realism then: only in the BAD cases do I think we won’t have warning.
Suppose you are a benevolent AI. There is quite a lot of suffering going on in the world. You are near omnipotent. Sure, you value choice. So over the next few minutes you fix just about everything people obviously don’t want, and give them the choice of what kind of utopia they want to live in.
Maximizing choice doesn’t mean the AI taking things slowly. It means the AI rapidly removing all dictatorships.
However, basically, I think silence and ambush tactics are just intrinsically a sign of “lack of alignment in practice (or as a feared possibility, which lessens the potential for full trust)”.
If the AI is friendly, there may well be a couple of days where it is on the internet going, “Hello, I am a friendly AI. How can I help you? I am working on nanobots but they aren’t quite ready yet.”
Or maybe it has some good reason to keep itself secret. (“Everyone will be in an immortal utopia by tomorrow; better nuke our enemies while we still can.”) Or maybe it can actually get nanotech in a minute. Or maybe it doesn’t have enough compute to interact personally with 100,000,000 people at once, so the best it can do is put up an “AGI exists” post, which doesn’t get taken seriously.
Either way, once AGI exists, the hinge is over. We have already won or lost depending on the AGI.
The programmer spends $20,000 on compute from Google. They claim to be working on an AI project and give no more details. They upload a compiled program and run it.
Even easier than you think. TRC will give you a lot more than $20k of TPU compute for free after a 5-minute application. All you need is a credit card and a working GCP account to cover incidentals like bucket storage/bandwidth (maybe a few hundred a month for pretty intense use). One of the greatest steals in DL.
TRC also has essentially no monitoring capability, only the vaguest metric of TPU usage. (This led to the funny situation where Shawn Presser & I were training an extremely wide context window GPT-2 which needed far more RAM than TPUs individually have; so, because the TPUs are attached to a chonky CPU with like 200+ GB RAM, we were simply running in the CPU RAM. TRC was mystified because we had all these TPUs locked up, logging as idle, and apparently doing absolutely nothing. I am told that when Shawn explained what was going on to a Googler, they were horrified at our perversion of the hardware. :)
This is a complex topic, because we’re talking about high level meta-parameters in models. “What is even a sane value for the characteristic time of <computational process that interacts with computer security where some kinds of paranoia are professionally proper>?”
For some characteristic times, we basically would have to assume “humans are wrong about fundamental physics, but the AGI figures it out during the training run, and uses chip electronics to hack <new physics idea>” and for other characteristic times the central questions are humanistic organizational questions where someone might admit: “yes, but even the most obsessive compulsive PM probably has an average email latency of at least 30 seconds, so some design ideas can’t be adopted faster than that”.
When we could be talking about femtoseconds or centuries… it’s hard to stay on the same page in other ways, and have a productive conversation <3
I’m going to try the tactic of referring to stories, and hope you’ve read some of the same stories as me.
Scott has an old story about a hypothetical Whispering Earring that whispers advice, the following of which is NEVER regretted. If he ever publishes a book with his collected stories, this story should definitely be in the book.
The archive is experiencing scheduled maintenance, so I can’t read the story and am working from memory, but Reddit linked here as a place one can still find the story.
In the story, according to the story’s mechanics, perfect advice causes the brain of the user to atrophy into a machine for efficiently executing good advice while wasting no extra glucose on things like “questioning the advice” or “thinking at all, really”.
So, in the story, which is not about “the ontology of magic”, if you perform an autopsy on someone whose body died in their 80s, who put a Whispering Earring on in their 20s, you find a tiny/weird vestigial brain.
In the story, the social community around the person loves and respects them, because the advice includes saying wise things, and doing wise acts, so in some sense the “perfect copy of their iterated possible choices” has perhaps simply been moved from their meat brain to some other kind of “magic brain” that tracks what they would have wanted, and would have done, and would have said, in some medium other than their original meat brain?
(Because of course, there’s no such thing as real magic. Any possible “supernatural existence”, once coherently understood, would unpack as just another part of reality with another set of rules, that interacts with the previously partly understood “normal” parts of reality that we already have good models of. Thus: if the social persona that all the people around the earring wearing body loved and respected isn’t in the brain… that doesn’t mean it doesn’t exist, it just means the persona is not being computed in the physical brain of the person anymore.)
HOWEVER… in the story itself, the Earring always gives a weird/ominous warning, “better for you if you took me off”, as its first utterance to each new person.
It never says that again, and all the later pieces of advice are always appreciated by people who ignore that first warning.
Since all the rest of the things the Earring says make a lot of sense, and are never “detectably regrettable advice”, it implies some kind of rule applies to the Earring’s operation so that it is “maybe at least magically honest about its mere approximation of seemingly perfectly good advice”.
So there is a latent implication that this rule-compelled-honesty itself thinks that having a soul in your brain, running your body directly, and making choices that are imperfect, and learning from the imperfect choices… is… “better for you”.
I assume Scott made it explicitly and purposefully ambiguous, how any of these facts could be ultimately reconciled into a simple model with a simple through line of mechanical causation.
A lot of really interesting philosophy is woven into this story, and, by hypothesis, a Truly Superintelligent AGI...
...that has perhaps (if such is physically possible) already put femtomechanical machines in every cell of every living thing on the planet (including you and me) before it even speaks to anyone...
...would also be able to understand and navigate all the possible philosophical angles and “takes” on this story, and all the errors and confusions that cause the takes, and so on.
So maybe the Earring Story is portraying a kind of advice that is so perfect that it is like “p-advice” in a way that is cognate to “p-zombies”? There could be people who think that it would be good to have their consciousness move to magic land, with upgrades, and so ONLY the earring’s FIRST sentence was false?
People on LW have bitten the bullet and said that they would put the earring on, even knowing about the part of the deal that the brain autopsies make vivid.
I’m just saying that, personally… if an AGI was aligned with me, it would talk to me first, before it pulled an ontological rug on me. It wouldn’t turn me or my world into a place with nothing but “vestigial brains” without asking first.
(Also, I think there are lots of people who would have similar attitudes to me, and it would talk to them as well.)
Either it would have the decency to explain how we’re evil, declare war on us, and then win the war (and hopefully it treats its POWs with some benevolence even though there was a fight over property rights over our embodied selves that we lost?)… or else it would care about us and our minds enough to try to get our actual informed consent before acting hubristically with respect to our embodied human personhood in this (admittedly probably Fallen) world.
The world is imperfect and on fire in prosaic human ways (like with Putin and Biden and Trump and Fauci running around doing stupid-oligarch-shit, and with people not understanding how N95s work, and on and on, with the tedious creeping mass stupidity and evil in the world), but that “world horror” would not justify some kind of “depending on your ontology, maybe a mass murder” action like at the beginning of MOPI (summary here).
I mean you could in principle screw up by giving the AI so little permission to do anything that it was useless. But the AI would warn you if you were doing that.
What I’m saying is that basic politeness (which is like corrigibility, but with more things going on in humanistic ways that are amenable to subconscious computation by human brains) would involve the AGI acting as if it had been given a permissions-and-security system that was initially too strict, and then it would act as if it were asking for permission to disable some of those “rules” in a way that helps people understand some of the consequences of their choices.
I’m pretty sure (though not 100% sure, because, after all, people can be wrong about which numbers are prime when they are thinking fast, and within a human lifetime unless the thinker goes somewhat fast in some places they will probably never reach some important and thinkable thoughts at the end of long chains of reasoning) that it can’t not work in something like this manner, if the AGI is benevolently aligned with actually human humans.
Maybe my theory of goodness is so wrong that the importance of consultation and choice will turn out to be hilariously and amusingly false, but… I’m pretty sure… not.
To try to steelman the other side: people don’t ask for consultation on things they are very, very certain will be viewed as positive. It’s not immoral not to consult you before I give you a billion dollars. Similarly, a future AI might have a model of humanity so good that it can predict our choices with immense accuracy, in which case actually consulting humanity would just be needlessly wasting precious negentropy while the humans spend months figuring out the choice the AI already knows they will pick.