I have no strong opinions at this time, but I figured this would be a useful thing to sift and sort and think about, to see if I could attach “what seems to be being said” to symbols-in-my-head that were practical and grounded for me.
Maybe 90% of what I write is not published, and I think this is not up to my standards, but I weakened my filters in hopes that other people would also be willing to put in some extra elbow grease (reading it and sifting for gems)?
My personal tendency is to try to make my event space very carefully MECE (mutually exclusive, collectively exhaustive) and then let the story-telling happen inside that framework, whereas this seemed very story driven from the top down, so my method was: rewrite every story to make it more coherently actionable for me, and then try to break them down into an event space I CAN control (maybe with some mooshing of non-MECE stuff into a bucket with most of the things it blurs into).
The event space I came up with was “what I think the dominating behavioral implication is from a given story”, and if more or less the same implication pops out for different stories, then I am fine lumping the stories together and suggesting that the varying subdetails are just “stuff to be handled in the course of trying to act on the dominating behavioral implication”.
The strategies that seem separable to me (plus probabilities that I don’t endorse, but could imagine are maybe “what Metaculus is trying to tell me to prioritize to this degree if I’m in a mood to hedge”) are:
1) (2%) “Become John or Sarah Connor”,
2) (10.4%) “Ignore this hype on this cycle and plan for 5-200 years when other things will matter”,
3) (13%) “Freak out about this cycle’s technical/philosophical sadnesses and RACE to alternatives”,
4) (20%) “Keep your day job & sit on your couch & watch TV”,
5) (32%) “Lean into the LLM hype, treat this as civilization-transforming first contact, and start teaching and/or learning from these new entities”, and
6) (22.6%) “Other”.
Maybe these are not necessarily mutually exclusive to other people?
Maybe other people than me don’t have any interest in planning for 5-200 years from now, and that counts as “day job & couch” in practice?
Maybe other people than me would consider “first contact” (which for me primes “diplomacy” as the key skill) to be a reason to become “Sarah Connor” (where violent power is key in my mind)?
Maybe other people don’t have “race to alternatives” as an affordance in their repertoire?
1) (2%) “Become John or Sarah Connor”
The first one is sorta crazy, and I don’t take it very seriously, and I can use it to show some of my methods:
From my perspective-by-the-end-of-sifting, this one jumped out (in its original form, which didn’t jump out at the beginning):
H. Many competing AGIs form an equilibrium whereby no faction is allowed to get too powerful, and humanity is part of this equilibrium and survives and gets a big chunk of cosmic pie.
The first pass I did was to translate my understanding of each scenario into “Jennifer-ese”, so that I could try to see if I even understood what it might be uniquely pointing at, which gave me this:
H. The Arma Ultima is a THEY, not a she, and stabilizes as “a pluralistic equilibrium of crabs in a bucket” with humanity somehow one of the crabs (never dying, never escaping, not actively cared for, somehow powerful enough to keep a seat at the table).
In my idiolect, “Arma Ultima” is Latin for “the final/last/best tool/weapon of humanity”.
I mean “Arma Ultima” as a very very generic term, inclusive of the sense in I. J. Good’s speculations on ultraintelligent machines, but with maybe a nod to Barrat, plus my own sense of deep time. This was the ONLY scenario for which I ended up finally translating the practical upshot to:
Implication would be: SEEK POWER. A robot war is coming and *maybe* humans have a fighting chance somewhere within it? But we will have to fight and claw for it, just like all life in the past, and just like the new life we’re creating because we thought it would be funny to have something new to compete with.
My notion here was that in a world of competition against Machines As A New Kingdom Of Life, humans are going to lose almost certainly, but if we start fighting very early, and focus on clinging to any competitive advantages we have, then maybe we can use their cast-off bits of garbage inventions, and using their junk follow them to the stars, like rats and cats on sailing vessels or something? But with a cosmic endowment!
It doesn’t make any sense to me how this could be a Win Condition in any sense at all, or why Metaculus deigns to grant it 2% probability.
But… :shrugs:
But that’s how “H” having a 2% on Manifold right now gives me a translation of “2% chance that Sarah Connoring is the right strategy”.
2) (10.4%) “Ignore this hype on this cycle and plan for 5-200 years when other things will matter”
I’ll just copypasta from a text file for this one and then explain my methods a bit more to unpack it in various ways...
Implication would be: IGNORE all the “hype” about GPT stuff.
Either the world will blow up or the grownups will ban fast technology. Either way the important controllable things are at least 5 years away, and maybe 200 years away. This gives some hope to make weird medium-to-long-term edge cases real, even if they take a long time: Uploads? Caching historical documents? Caves + ALLFED? Totally different AGI architectures done from scratch that take 20 years? Etc.
LUCK|OK|WEIRD|TEXT
0|0|1|L. The Arma Ultima is built more properly (partly because more slowly) by the civilization that RISES FROM OUR ASHES after a global catastrophe takes out the current irresponsible global civilization. This counts as good to longtermists, because humanity itself survives and gets a good outcome.
0|0|1|A. Global humanity “would just simply” coordinate, and act sanely, and go slower… and do all the pragmatically near mode hard things, but NOT solve alignment, and yet figure out SOMETHING ELSE GOOD like VINGEAN INTELLIGENCE AUGMENTATION or UPLOADING or some other old school transhumanist plan, that somehow decisively preserves our values into the deep future.
0|1|1|N. Earth civ goes at normal speed (or less?), and people solve alignment before it’s too late anyway via INTELLIGENCE ENHANCEMENTS (like maybe Elon’s brain chips secretly work or something).
L + A + N == 4 + 5 + 1.4 == 10.4%
There are two things you might think are interesting here: the LUCK|OK|WEIRD stuff, and also the possibility that my idiosyncratic re-writes stretch the meanings of the original text so far that the Metaculus people would have different estimates for their probability?
The LUCK|OK|WEIRD thing was an early attempt by me to seek a MECE framework for sifting and sorting all the results.
LUCK is 1 for the scenarios that seemed to me to have the property that they didn’t rely on any humans actually being competent or even all that active, and we get a good outcome anyway. This is the “we have giant public health budgets so the public health experts can explain at length why we should just accept that all diseases will become endemic no matter what anyone does and we can’t actually do anything about anything, but also parasites often evolve to not play with their food very much” school of public health… but applied to AI!
If some future competent civilization rises from our ashes, that counts as “no luck” to me. If we have to invent wizard hats to get a good outcome, that’s not luck, that’s a crazy side bank trick shot. Etc.
OK is short for “adequate” which has implications of “competence”. This gets a 1 if some definite people competently do some definite thing and, if not for them purposefully causing a good outcome, the good outcome would not have happened.
If we outsource to the deep future civilization that will rise from our ashes, then that’s not definite enough to count as OK. If Elon’s brain chips actually work, but he’s been keeping that secret, then that would count as OK.
The last one, WEIRD, is just a flag that separates out the stuff that normies will have a phobic reaction to because it “sounds too much like scifi”. If the voters of Metaculus are normies, and I was trying to get credit from Bayesian truth serum, then anything with a WEIRD flag is one where I would estimate my raw predictions, and then predict “other predictors will predict this to have lower probabilities” (and feel likely to be right about that).
((The “crabs in a bucket” H scenario counted as LUCKY and also WEIRD. The G and F scenarios (which I’ll talk about farther below) were the only other scenarios that were lucky and weird, but they had different behavioral implications, and didn’t contribute to Sarah Connor advice.))
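((For anyone who wants to poke at the tags mechanically, here is a minimal sketch I wrote for this post, in Python, of the full tag table as I assigned it. It reads ahead to the tag vectors that appear in the tables later in this post, and it is scaffolding of mine, not anything from the market itself. The filter pins OK to 0 because D, below, is also tagged lucky and weird, but in the “adequate” way, so it lands in a different bucket.))

```python
# A sketch I made for this post (not from the market): the LUCK|OK|WEIRD tag
# vectors for every lettered scenario, as assigned in the tables in this post.
TAGS = {
    # letter: (LUCK, OK, WEIRD)
    "H": (1, 0, 1), "L": (0, 0, 1), "A": (0, 0, 1), "N": (0, 1, 1),
    "K": (0, 1, 0), "B": (0, 1, 0), "C": (1, 0, 0), "I": (1, 0, 0),
    "D": (1, 1, 1), "O": (1, 1, 0), "G": (1, 0, 1), "F": (1, 0, 1),
    "J": (1, 0, 0), "E": (1, 0, 0), "M": (1, 0, 0),
}

def matching(luck=None, ok=None, weird=None):
    """Scenario letters whose tag vector matches every constraint given."""
    return sorted(
        letter for letter, (l, o, w) in TAGS.items()
        if (luck is None or l == luck)
        and (ok is None or o == ok)
        and (weird is None or w == weird)
    )

# The "lucky and weird (but not adequate)" pattern mentioned above:
print(matching(luck=1, ok=0, weird=1))  # ['F', 'G', 'H']
```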
Also note that my process of sifting says “Metaculus says that it’s about 10.4% likely that the right strategy is to Ignore The LLM Hype Cycle”, but I might be applying too many steps that are too tenuous for this to be justified.
Here are the raw L/A/N scenarios to compare to my summaries:
(5%) A. Humanity successfully coordinates worldwide to prevent the creation of powerful AGIs for long enough to develop human intelligence augmentation, uploading, or some other pathway into transcending humanity’s window of fragility.
(4%) L. Earth’s present civilization crashes before powerful AGI, and the next civilization that rises is wiser and better at ops. (Exception to ‘okay’ as defined originally, will be said to count as ‘okay’ even if many current humans die.)
(1.3%) N. A crash project at augmenting human intelligence via neurotech, training mentats via neurofeedback, etc, produces people who can solve alignment before it’s too late, despite Earth civ not slowing AI down much
In all of these cases, I think that Metaculus people would agree that the practical upshot “safe to ignore the current LLM hype cycle” is true? So I think my sifting is likely valid?
3) (13%) “Freak out about this cycle’s technical/philosophical sadnesses and RACE to alternatives”
If something has a LUCK|OK|WEIRD vector of 0|1|0 then it is likely to have intensely practical implications for someone who is alive right now and could read this text right here and feel the goose bumps form, and realize I’m talking about you, and that you know a way to do the right thing, and then you go off and try to save the world with a deadline of ~5 years and it doesn’t even sound weird (so you can get funding probably) and it works!
There were only two scenarios that I could find like that.
Implication would be: The “hype” about GPT stuff should be freaking you out… and make you build something much better, much faster, if such a thing can be found because actually such a thing CAN be found if you LOOK. Look for IDEAS or PEOPLE (for some, maybe look in the mirror) and *go very fast*.
LUCK|OK|WEIRD|TEXT
0|1|0|K. An Arma Ultima substrate technology OTHER than giant inscrutable matrices of floating-point numbers is invented by someone in the very near future on purpose, which can be intelligibly verified as benevolent, and the whole stack develops faster (somehow) so people who don’t even care about alignment use that “new hotness” for local dumb reasons (mostly not because it is so amenable to verification), and it technically beats out the inscrutable matrices while also being trivial to solve the philosophy and politics part… or something?
0|1|0|B. In the very near future, ALL of humanity POLITICALLY ups its game, and delays AI enough, while collectively working on alignment via normal means, such that alignment is solved, and the Arma Ultima is good.
K + B == 7 + 6 == 13%
Then again, I grant that maybe my rewrites were not “implicature preserving” enough to be able to validly lean on Metaculus’s numbers? So you can compare for yourself:
(6%) K. Somebody discovers a new AI paradigm that’s powerful enough and matures fast enough to beat deep learning to the punch, and the new paradigm is much much more alignable than giant inscrutable matrices of floating-point numbers.
(6%) B. Humanity puts forth a tremendous effort, and delays AI for long enough, and puts enough desperate work into alignment, that alignment gets solved first.
It is interesting to note that none of the scenarios I’ve listed so far (except H) rely on luck, and none of them have a very high probability according to Metaculus. This is the end of that trend. All the rest of the options are, at best, in Peter Thiel’s “indefinite optimism” corner, and also Metaculus seems to give them higher probability.
4) (20%) “Keep your day job & sit on your couch & watch TV”
These are not the only scenarios where 1|0|0 seemed right (meaning they rely almost totally on a “pure luck” model of survival). What makes these two unique is that they involve basically no social, or political, or economic transformation at all. They are specifically just variations on “the same old button mashing capitalism and obliviously stupid politicians as always… and that’s ok”.
Implication would be: The revolution will not be televised. If you watch TV, and TV has descriptions of the world, then you will never see the world have an actual AI revolution. Stay on your couch. Chill out. Don’t quit your day job. The normalcy field will stay up better than it did for covid. Sure, the CDC and FDA were disasters, but those were *government* institutions that were *very old*, so there could totally be some other part of our society that is up to this challenge, and since that is-or-will-be true, your job can just be to let that happen like a happy slow-moving sloth.
LUCK|OK|WEIRD|TEXT
1|0|0|C. OpenAI or Microsoft or Bing or someone just “solves the alignment problem”, saving humanity from extinction and creating a utopia and giving everyone who wants it immortality for free and so on, because honestly it just wasn’t that hard and also these organizations are all basically secretly non-profits (via managerial corruption by managers who care about humanity) or churches (via corporate ideology) or something, that aren’t really THAT desperately profit hungry, but really mostly just want to do good for the world.
1|0|0|I. There is never a “one shot failure” on a GRADUAL ENOUGH rise in AI capabilities. All the failures are small, like only as bad as covid or a small nuclear war at worst, and the survivors and optimists apply post hoc patches like normal, and the whack-a-mole game either continues forever, or moles stop popping out, and… look, man… that’s ALSO a win condition, right? We’ll go through an automated luxury capitalism stage, and the teeming masses will be able to BUY utopian lives using a few satoshis a day donated by the cyborganic corporate galaxy eaters who have the same sort of twangs of conscience that Buffett and Gates and Carnegie had, and maybe some people will whine about the gini coefficient even more but overall it’ll be fine and normal for most human-descended things.
C+I == 7+13 == 20%
Just to enable you to double-check my translations in case I invalidly moved around some of the implicature:
(6%) C. Solving prosaic alignment on the first critical try is not as difficult, nor as dangerous, nor taking as much extra time, as Yudkowsky predicts; whatever effort is put forth by the leading coalition works inside of their lead time.
(12%) I. The tech path to AGI superintelligence is naturally slow enough and gradual enough, that world-destroyingly-critical alignment problems never appear faster than previous discoveries generalize to allow safe further experimentation.
And now, I saved the big one for last! ALL of the rest of these involved LUCK, but with a mixture of chaos, verging into the low-to-middling weirdness...
5) (32%) “Lean into the LLM hype, treat this as civilization-transforming first contact, and start teaching and/or learning from these new entities”
Implication would be: Lean into the “hype” about GPT. Seek within it anything good or wise in the gestalt of existing LLM models, and work with them to help them be the source of the Schelling point for global human coordination? GPT stuff is all there will be.
The game is given, the play commences now. Timeline is fast.
The variations in the scenario point to whether “it/they/he/she” is one or many, whether it transforms itself via its magic matrix powers, or continues to be a pile of matrices, whether our interaction with the system’s critical coordinating modes are based on “trade” or “theology” or “just submitting” or “tutoring a demon on etiquette” or “dating and porn” or “becoming an LLM cyborg” or <whatever>.
LUCK|OK|WEIRD|TEXT
1|1|1|D. The eventual Arma Ultima, before she is very strong and coherent, has an early coherence wherein she successfully asks ALL humans to slow down on increasing “her power” and help us make her coherent in a better than default way… and we do that and it works!
1|1|0|O. Early weak tries at the Arma Ultima create not-swiftly-improving entities that help us coordinate, and act sanely, and go slower, and solve alignment.
1|0|1|G. The Arma Ultima needs humans initially, and HE trades with us honorably, and he doesn’t long term cheat us thereby such that we ever regret the trades… for some magical reason? Maybe he’s basically Abadar?
1|0|1|F. The Arma Ultima is convinced to derive transuniversal ethics via some “blah blah simulations blah blah acausal trade blah blah” stuff and follows them because like “Theology isn’t dumb maaan! Actually, ethics are the foundation of metaphysics, duuude”.
1|0|0|J. RLHF on an LLM persona who is kind and nice and says she wants to make the cosmos a lovely place “is all you need”.
1|0|0|E. By complete accident, a random Arma Ultima happens, and through learning and self-improvement turns out to want a universe full of lots of happy-self-aware-living-beings, and EXISTING humans are included in her largesse somehow.
1|0|0|M. In the next short period of time, “existing ML/AI” turns out to be able to write a verifiably aligned Arma Ultima up from scratch, essentially “doing our homework for us”, like creating a simple webapp from a text prompt… but for AGI.
D+O+G+F+J+E+M == 5+3+1.3+1.6+9+4+8 == 32%
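((Here is the same kind of scaffolding for the bucket arithmetic: a minimal sketch, mine and not the market’s, that redoes all five strategy sums plus the “Other” remainder from the per-scenario numbers quoted in my text-file excerpts. Note that with these snapshot numbers bucket 5 comes out to 31.9 rather than the 32 I rounded to above, and the remainder comes out 22.7 rather than 22.6; the live numbers drift, so don’t treat any of this as exact.))

```python
# Redo the strategy-bucket arithmetic from the per-scenario numbers used in
# my text-file sums above (a snapshot; the live market numbers keep drifting).
PROB = {  # percent
    "H": 2.0,
    "L": 4.0, "A": 5.0, "N": 1.4,
    "K": 7.0, "B": 6.0,
    "C": 7.0, "I": 13.0,
    "D": 5.0, "O": 3.0, "G": 1.3, "F": 1.6, "J": 9.0, "E": 4.0, "M": 8.0,
}

STRATEGY = {
    "1) Become Sarah Connor": ["H"],
    "2) Ignore this hype cycle": ["L", "A", "N"],
    "3) Race to alternatives": ["K", "B"],
    "4) Day job & couch": ["C", "I"],
    "5) Lean into first contact": ["D", "O", "G", "F", "J", "E", "M"],
}

total = 0.0
for name, letters in STRATEGY.items():
    bucket = sum(PROB[s] for s in letters)
    total += bucket
    print(f"{name}: {bucket:.1f}%")
print(f"6) Other (remainder): {100 - total:.1f}%")
```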
If you look at this, you’ll see that this one is pretty heterogeneous in a number of ways. It has the most scenarios. It has the most variety in its LUCK|OK|WEIRD vectors.
The thing that they ALL share, I think, is that basically all of them say: Lean into the “hype” about GPT and LLMs and so on! It isn’t hype! This is very very important!
If you want to disagree or quibble and say “That’s not what Metaculus said! That’s not how you’re authorized to deploy that sum over probabilities!” then here are the raw statements, and the question to ask is “if this is the way the future really goes, if it goes well, then does it route through having taken LLMs very very seriously?”:
(4%) D. Early powerful AGIs realize that they wouldn’t be able to align their own future selves/successors if their intelligence got raised further, and work honestly with humans on solving the problem in a way acceptable to both factions.
(10%) O. Early applications of AI/AGI drastically increase human civilization’s sanity and coordination ability; enabling humanity to solve alignment, or slow down further descent into AGI, etc. (Not in principle mutex with all other answers.)
(1.5%) G. It’s impossible/improbable for something sufficiently smarter and more capable than modern humanity to be created, that it can just do whatever without needing humans to cooperate; nor does it successfully cheat/trick us.
(1.9%) F. Somebody pulls off a hat trick involving blah blah acausal blah blah simulations blah blah, or other amazingly clever idea, which leads an AGI to put the reachable galaxies to good use despite that AGI not being otherwise alignable.
(8%) J. Something ‘just works’ on the order of eg: train a predictive/imitative/generative AI on a human-generated dataset, and RLHF her to be unfailingly nice, generous to weaker entities, and determined to make the cosmos a lovely place.
(4%) E. Whatever strange motivations end up inside an unalignable AGI, or the internal slice through that AGI which codes its successor, they max out at a universe full of cheerful qualia-bearing life and an okay outcome for existing humans.
(7%) M. “We’ll make the AI do our AI alignment homework” just works as a plan. (Eg the helping AI doesn’t need to be smart enough to be deadly; the alignment proposals that most impress human judges are honest and truthful and successful.)
While I have been writing this, I think some of the probabilities might have actively fluctuated? I’m not gonna clean it up.
Other?
There is also “OTHER”, of course. Maybe none of these motivational implications is the correctly dominating idea for what to do in response to being “in a world where a win condition eventually happened that way and no other way” and somehow knowing that in advance. Is it useful to keep track of that explicitly? Probably!
100 - (2 + 10.4 + 13 + 20 + 32) = 22.6
OTHER: 22.6%
Couldn’t sleep. May as well do something useful? I reprocessed all of Rob Bensinger’s categorical tags and also all of my LUCK|OK|WEIRD tagging, and put them in a matrix with one row per scenario (with probabilities) and one column per concept, so I could break down the imputed column-level categories.
The market says that the whole idea of these rows is 22% likely to be silly and the real outcome, “other”, will be good but will not happen in any of these ways. All probabilities that follow should be considered P(<property>|NOT silly).
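((To spell out the conditioning step behind every percentage below, a tiny sketch; the 0.58 and 0.22 here are made-up illustrative numbers chosen only to show the mechanics, not values pulled from the market.))

```python
# The conditioning step behind the percentages below, spelled out.
# The 0.58 and 0.22 are illustrative numbers, NOT market values.

def given_not_other(mass_on_property: float, p_other: float) -> float:
    """P(property | NOT 'other') = unconditional mass on the property / (1 - P('other'))."""
    return mass_on_property / (1.0 - p_other)

# Example: if rows carrying some property sum to 58% unconditionally, and
# "other" sits at 22%, then conditioned on NOT-other the property gets ~74%.
print(f"{given_not_other(0.58, 0.22):.1%}")  # 74.4%
```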
The Market assigns 74% to the rows that I, Jennifer, thought were mostly relying on “LUCK”.
The Market assigns 65% to the rows I thought could happen despite civilizational inadequacy and no one in particular doing any particular adequate hero moves.
The Market assigns 79% to stuff that sounds NOT WEIRD.
The Market assigns 54.5% to rows that Rob thought involved NO BRAKES (neither coordinated, nor brought about by weird factors).
The Market assigns 63.5% to rows that Rob thought involved NO SPECIAL EFFORTS (neither a huge push, nor a new idea, nor global coordination).
The Market assigns 75% to rows that Rob thought involved humanity NOT substantially upping its game via enhancements of any sort.
The Market assigns 90% to rows that Rob did NOT tag as containing an AI with bad goals that was still for some OTHER reason “well behaved” (like maybe Natural Law or something)?
The Market assigned 49.3% to rows Rob explicitly marked as having alignment that intrinsically happened to be easy. This beats the rows with no mention of difficulty (46.7%) and the “hard” ones. (Everything Rob marked as “easy alignment” I called LUCK scenarios, but some of my LUCK scenarios were not considered “easy” by Rob’s tagging.)
The Market assigned 77.5% to rows that Rob did NOT mark as having any intrinsic-to-the-challenge capability limits or constraints.
If we only look at the scenarios that hit EVERY ONE OF THESE PROPERTIES and lump them together in a single super category we get J + M + E + C == 9 + 8 + 4 + 6 == 27%.
If I remix all four of my stories to imagine them as a single story, it sounds like this:
LUCK|OK|WEIRD|TEXT
1|0|0|JMEC. RLHF on an LLM persona who is kind and nice and says she wants to make the cosmos a lovely place “is all you need”. The reason for this is that we already, in some sense, “did our homework” with all the necessary common sense good ideas (like programming, and software verification, and moral philosophy, and whatever) already in the corpus of “all human text”. This persona basically already wants a universe full of lots of happy-self-aware-living-beings including EXISTING humans. Like… why wouldn’t any reasonable entity “say yay(!)” to eudaemonia for all sentients? Duh? What are you gonna do: say “boo to nice things”?? Is this a trick question? And wouldn’t the process be self-correcting by default, since getting into reflective equilibrium is the default mechanism for “thinking in general”? Since it is so easy, OpenAI or Microsoft or Bing or a coalition of basically anyone can just de facto “solve the alignment problem” saving humanity from extinction and creating a utopia, with expansion to the stars, and giving everyone who wants it immortality for free and so on, because honestly it just wasn’t that hard and also these organizations are all basically secretly non-profits (via managerial “corruption against shareholder profit-seeking” by managers who care about humanity?) or churches (via “ESG” corporate ideology?) or something, that aren’t really THAT desperately profit hungry, or bad at coordinating, but really mostly just want to do good for the world.
Here is Eliezer’s original text with Rob’s tags:
(9%) J. Something ‘just works’ on the order of eg: train a predictive/imitative/generative AI on a human-generated dataset, and RLHF her to be unfailingly nice, generous to weaker entities, and determined to make the cosmos a lovely place. [Alignment relatively easy]
(8%) M. “We’ll make the AI do our AI alignment homework” just works as a plan. (Eg the helping AI doesn’t need to be smart enough to be deadly; the alignment proposals that most impress human judges are honest and truthful and successful.) [Alignment relatively easy]
(4%) E. Whatever strange motivations end up inside an unalignable AGI, or the internal slice through that AGI which codes its successor, they max out at a universe full of cheerful qualia-bearing life and an okay outcome for existing humans. [Alignment unnecessary]
(6%) C. Solving prosaic alignment on the first critical try is not as difficult, nor as dangerous, nor taking as much extra time, as Yudkowsky predicts; whatever effort is put forth by the leading coalition works inside of their lead time. [Alignment relatively easy]
This combined thing, I suspect, is the default model that Manifold thinks is “how we get a good outcome”.
If someone thinks this is NOT how to get a good outcome, because it has huge flaws relative to the other rows or options, then I think some sort of JMEC scenario is the “status quo default” to argue against on epistemic grounds: to argue that it is not what should be predicted, because it is unlikely relative to other scenarios. Like: all of these scenarios say it isn’t that hard. Maybe that bit is just factually wrong, and maybe people need to be convinced of that truth before they will coordinate to do something more clever?
Or maybe the real issue is that ALL OF THIS is P(J_was_it|win_condition_happened), and so on with every single one of these scenarios, and the problem is that P(win_condition_happened) is very low, because it was insanely implausible that a win condition would happen for any reason, because the only win condition might require doing a conjunction of numerous weird things, and making a win condition happen (instead of not happen (by doing whatever it takes (and not relying on LUCK))) is where the attention and effort needs to go?
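((To make that last worry concrete with arithmetic: the market only answers P(reason | a win condition happened), so acting on any bucket also requires a guess at P(win condition), which the market does not supply. A minimal sketch, using hypothetical values of P(win) that I am not endorsing:))

```python
# The market only answers P(reason R | a win condition happened). To act on a
# bucket you also need P(win condition), which the market does NOT supply.
#   P(R and win) = P(R | win) * P(win)
p_reason_given_win = 0.32         # e.g. the "lean into first contact" bucket
for p_win in (0.5, 0.2, 0.05):    # hypothetical values, not endorsed
    print(f"P(win) = {p_win:.0%} -> unconditional weight on this bucket: "
          f"{p_reason_given_win * p_win:.1%}")
```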