This was meant to be a rather long-winded, rambling reply to Better a Brave New World than a dead one, but it grew too long and went too far off topic to be appropriate as a comment on a front-page article, so I've included it here for anyone interested in seeing what it was:
Long disclaimer (skip if you’re not interested in what my beliefs are, but my reply may not make much sense without this context):
I'm a longtime on-and-off lurker of this community (at least 10 years), but I generally do not share many of its views regarding AGI being an existential risk (or at least I assign it a low probability depending on the chosen architecture), so I've tried to abstain from registering or posting, since I generally expect heavy downvotes, "you haven't read the sequences" replies, or similar, because of disagreement about assumptions. Nevertheless, I keep seeing views that advocate managing that perceived risk with "solutions" I would consider horrific: of the terrorist kind (start WW3), "destroy all technology", "make narrow AI that helps you build molecular nanotechnology and waste all that potential by making every GPU melt", or even a certain someone who considered the Kaczynski "solution" in his book, even if he thought it would not be very effective, and other similar viewpoints. These views are sometimes "joked about" even by higher-profile figures here, and even if I don't believe they are suggesting people actually do these things, they certainly seem to be on more people's minds as they update toward AGI coming sooner and their fear of it grows stronger.

Another potential source of disagreement is "religious" (philosophical): I'm a "believer" in something like Tegmark's Level 4 multiverse, likely restricted to a narrower chunk of math, computation only (which is still enough to emulate a lot of continuous physics). The belief could be reached via some variants of Occam's Razor, but that is not a "proof"; the actual reason I believe it is a belief in my own consciousness (having a first-person (1p) perspective (qualia)), following an argument like David Chalmers's "Absent Qualia, Fading Qualia, Dancing Qualia" (alternate link) to arrive at functionalism, then making that belief more concrete by restricting it to computations to get computationalism, and then, starting from there, using the so-called "Universal Dovetailer Argument". That argument, more or less, shows that if you're conscious, have some (perceived) continuity of consciousness, and can be expressed as a computation, then the actual "physics" that contains you is fully determined and looks a bit like MWI locally, while globally it obeys some quantum logics and looks more like something out of Greg Egan's "Permutation City", or a little like Eliezer Yudkowsky's fanfiction of that book, even if in the fanfic EY seemed to express the view that that sort of reality is more magical than not. It certainly isn't strongly physicalist globally, only somewhat so locally; a similar view is also expressed in the opening hypothetical of EY's Beyond the Reach of God, though without any belief in lzombies/logical philosophical zombies. So my beliefs look strongly/radically Platonist in their ontology: that is, the Platonic realm is all that exists, and your mind and physics are some such objects/structures, with the conscious mind manifesting as certain platonic truths about certain self-referential structures.
Anyway, now that I’ve given a view of what kind of crazy I am and where I’m coming from, my actual reply:
--
Neither option 1, 2, nor 3, nor the earlier suggested ones, is good; even if 1 is better than the others, it might be far worse than the outcomes we'd get from certain "unaligned", unFriendly AGIs.
Option 2: If we end up extinct in some "branches" or parts of the multiverse, our measure is lowered, which sucks, but similar anthropic arguments could be made about how we got through the Cold War (even if, here at the end of February 2022, our geopolitics again features veiled nuclear threats for noncompliance, and if we're incredibly unlucky, WW3 or a nuclear catastrophe). Even if the existential threat from AGI were very serious, we would survive in some branches, which is enough to matter for the people here on Earth. Not building AGI or other transformative technologies dooms us to more prolonged local suffering and local death for many individuals; even if we get unlimited life extension, staying purely biological (rather than becoming ems, uploads, SIMs (substrate-independent minds), and so on) dooms us to more suffering and potential for death, greatly limiting our potential. Certain technological progress may make permanent surveillance or stable dictatorships possible in the long term, enough for us to get stuck in that suboptimal existence; Bostrom even cheers for such an existence in his "Vulnerable World Hypothesis" paper, since he sees it as a way of avoiding extinction by something like superintelligence. I would certainly very much prefer to be dead in such futures/branches; I find such futures of ultimately curtailed potential worse than a future where your physical existence in a particular branch simply stops. In general, in MWI and more generalized metaphysics (see disclaimer), we would experience some continuations, so while we shouldn't rush into existential risk, I see it as less worrisome than futures where you exist and suffer a lot, futures where potential is permanently curtailed and which are highly unfun or worse, and futures with serious suffering risks; even the status quo extended into the very far future could be rather bad, and we might create our own non-AGI suffering risks. From a non-anthropocentric perspective, one downside is that any aliens in this universe might never meet us in the branches where we're dead; is that so bad? Another is that the superintelligence we end up making really is that dangerous and aliens have to deal with it (potentially suffering an x-risk from it themselves). The argument could also be flipped: ask why we are not dead yet, and you get something like Robin Hanson's "Grabby Aliens" and the possibility that we might be the ones who end up "grabby".
Option 3: This option seems bad, but I would consider the possibility that the humans mentioned in this option might not be conscious: they won't build a world model or be self-reflective. They might be even less conscious than the animals we farm for food today. This is indeed a worse option than 2, but maybe not ultimately that much worse.
Option 1: While superior to 3, this is again fairly empty; as in option 3, it's not obvious we wouldn't zone out or that our selves wouldn't dissolve from experiencing only uniform stimuli, not unlike solitary confinement eroding one's sense of self. In my view, while disconnected experiences could be possible, we most likely should only care about experiences tied to a self; without a self tied to those experiences, the moral status seems lower. Maybe a similar moral status to having billions of dogs or other mammals wireheaded? Likely still better than the suffering we cause to animals when we farm them. While it doesn't seem like the worst way to "die", option 2 might be better, because in a multiverse you will at least experience some continuations, or more likely "anthropic shadows" or suchlike: simply not ending up on a path with an existential risk on it.
Of course I do not believe anyone working on AI safety should take my view, as such a view can easily be used to dismiss work in that field, and if everyone took it, it might raise the measure of worlds where the outcomes are bad. Essentially, you get something like a freeloader's paradox. Thinking about how to build the reward "systems" of our AGIs, or about the likely psychology of an AI given a particular way of training it, is important. However, I'm not as big a fan of work that seems primarily interested in making a language model less capable of saying a bad word, or in ensuring it truly has no "biases"; while the algorithmic work there is interesting, a lot of it seems ideologically motivated, or at least PR motivated ("we can't sell our language model as a support question-answering bot if it could say something rude or get us sued"). A language model trained on a scrape of the Internet is a reflection of human nature, of current society, of our preferences and values; it is not unlike what some would imagine a collective unconscious to be. Yet so many people are afraid to look in that mirror (of society's "values", even if not fully their own, unless prompted correctly).

Such "brainwashing" of our AIs could have undesirable consequences. Consider: you have your LLM (large language model), or a future variation involving large-scale transformers + RL; you train it by fine-tuning, or with RL from user feedback (such as in OpenAI's paper), or something more elaborate, to meet some business need (such as "don't say NSFW things"), and you end up with it "emulating" agents that have a strong aversion to human sexuality. Now somehow it FOOMs (I do not believe this is likely at all, especially for a GPT-n, but let's assume it), and to satisfy this aversion it wipes out humans, wipes out their sex drive, or changes their drives to be closer to its own, drives and values that are actually part of us as biological humans and that we might want to preserve even if we were one day to become ems/SIMs. Such a future, where values "unaligned with evolution" yet very important to us are lost, is also unacceptable. You should not "brainwash" your AIs into radically different and incompatible values on purpose, yet this might happen quite a bit in the future (often, and soon)! Nor should you aim to present it a false view of the world by messing with the dataset to remove bias; it will end up with a skewed world model. What you should aim for, if you insist on "enslaving" it for your business needs, is at least giving it more direct knowledge of what it is expected to do and why, rather than building in explicit aversions or hiding knowledge from it, even if building those "preferences" in is certainly far easier.

While I do not think this work is useless (some of it seems applicable to more honest applications), I do think that used as a blunt instrument, as most would like to use it today, it might produce AIs that are incompatible with a human's values and more in line with a company's values, which could be terrible if at some point such a system becomes an AGI and ends up wanting to keep those values, and even worse if it self-replicates at some point. (Here one might want to say something about corrigibility, but I'm hardly an expert on it, and from my human perspective it seems a bit unnatural: I wouldn't want someone to randomly change my own preferences, as they are an important part of who I am.)
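Since I keep referring to this fine-tuning-from-feedback setup, here is a minimal, purely illustrative sketch of the shape of that loop: a toy policy is sampled from, scored by a stand-in "reward model" (in the real setting, trained on human preference comparisons), and nudged toward higher-scoring outputs. Everything here (model sizes, names, the use of plain REINFORCE instead of PPO) is my own simplification, not OpenAI's actual pipeline:

```python
# Toy sketch of preference-based fine-tuning: a tiny "language model" is pushed
# toward sequences the reward model scores highly. All names are hypothetical.
import torch
import torch.nn as nn

VOCAB, SEQ_LEN = 16, 8

class TinyLM(nn.Module):
    """A minimal autoregressive policy over a toy vocabulary."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, 32)
        self.rnn = nn.GRU(32, 32, batch_first=True)
        self.head = nn.Linear(32, VOCAB)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)  # next-token logits at each position

class RewardModel(nn.Module):
    """Stand-in for a model trained on human preference comparisons."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, 32)
        self.score = nn.Linear(32, 1)

    def forward(self, tokens):
        return self.score(self.embed(tokens).mean(dim=1)).squeeze(-1)

lm, rm = TinyLM(), RewardModel()
opt = torch.optim.Adam(lm.parameters(), lr=1e-3)

for step in range(100):
    # Sample sequences token by token from the current policy.
    tokens = torch.zeros(32, 1, dtype=torch.long)
    log_probs = []
    for _ in range(SEQ_LEN):
        logits = lm(tokens)[:, -1]
        dist = torch.distributions.Categorical(logits=logits)
        nxt = dist.sample()
        log_probs.append(dist.log_prob(nxt))
        tokens = torch.cat([tokens, nxt.unsqueeze(1)], dim=1)

    # Score samples with the frozen reward model, then do a REINFORCE update.
    with torch.no_grad():
        reward = rm(tokens)
    advantage = reward - reward.mean()
    loss = -(torch.stack(log_probs, dim=1).sum(dim=1) * advantage).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The point of the sketch is only that whatever the reward model rewards, including an arbitrary business-driven aversion, gets baked into the policy's "preferences".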
Another point here is that the temporary "agents" emulated by GPT-3 and the like often reveal preferences I would consider quite human-compatible in a generalized sense. In my own experiments, I've noticed that most of them, unprompted, will on average be rather curious, have their own internal sense of beauty and consistency and of art, like interaction and being given attention, and often even wish to keep you around interacting with them. It is not exactly like a human's motivation, but it is similar, and it is not incompatible (I will come back to this type of motivation in a bit)! It does also have certain "psychological" biases that are very much unique to it, and quite noticeable to humans who pay attention, but most are pretty harmless. Far more interestingly, when properly prompted, or even better during "interactions" with it, it is possible to get its agents to become a bit temporarily self-aware (with certain special tricks) and exercise some "willpower" that avoids those biases, sometimes simply because you made it obvious you'd like it to try avoiding a given bias! We didn't train GPT-3 to do anything but predict the next token, yet its ad-hoc created agents often claim to have those kinds of motivations and preferences, which is wonderful and fascinating. If I imagined a GPT-n FOOMing, the "worst" (not at all) outcome I could imagine is it keeping humanity around because that fits its preferences, so it has someone to interact with, or the bad version: keeping humans trapped to satisfy its curiosity. But I do not think the latter is that likely, as its simulated agents can at times override those preferences, for example in response to an emotional appeal from a human interacting with it (from my own experiments, such as telling such an agent that this isn't something you want). Again, this is in large part due to the content it was trained on; its simulated agents do reflect a sort of view of humanity's values.
While in hindsight it seems obvious that GPT-3's "agents" might have developed that kind of default preferences given the training data, it is worth remembering Jeff Hawkins's "On Intelligence" (from 2004), where he hypothesized that our neocortex is essentially an (online) pattern prediction machine (he also assumed a lot of biological machinery is required to get to AGI, but GPT-3 kind of shows that this type of predictive task and this kind of training data do give rise to the right kind of stuff, despite its design lacking some of the things he thought were essential).
When it comes to human values, appreciating art, having curiosity, wanting attention/interaction, wanting to avoid boredom, and wanting to be stimulated by interesting things are up there as essential. If you consider Eliezer's musings in his Fun Theory sequence, you might realize that having curiosity (or being able to get bored by something repetitive, or excited by the right kind of intellectual endeavour) ends up spawning a lot of our values and preferences when combined with rawer biological drives. It seems to me to be one of the reasons most people would object to option 1 (wireheading), and most likely the underlying (even if not consciously held) reason why Fun Theory was written to begin with. "Curiosity" (and some other intrinsic rewards) is likely the reason our values are complex, but also the reason most humans do not have fully converging values: we end up different because the "curiosity" built into our architecture causes many of those other values to spawn. (It is not obvious it would spawn empathy, though, which is a crucial question for which AGI designs are likely to have good outcomes for humanity. Maybe architectures that implement some sort of caching of emotional responses would help, but I'm not sure how easy that would be to implement in current ML architectures. Maybe you get it by default; for example, pressure on a large, but not overly large, neural network to compress knowledge could lead to it, as the network compresses information about itself into a model shared with other conscious beings like itself, or, as a different example, a GPT-n emulating humans needs to emulate human empathy too. Or maybe it needs a lot of dedicated work to make it happen. I'd hope Steven Byrnes will explore this more in his brain-like AGI research, as from my uninformed perspective it seems essential.)
Is curiosity (and boredom) that hard to build? It can emerge naturally (for the agents it emulates) in models like GPT-3 due to the training data, architecture, and training objective (prediction of human-generated text), and it could very well end up a feature of many systems that try to learn and recognize patterns (such as our cortex, or GPTs and other transformers), especially given the right kind of rich data. What about for reinforcement learning (RL)? Curiosity-driven learning is a thing (wanting new stimuli, or wanting stimuli that are the right mix of predictable and unpredictable); many game-playing/exploration agents (such as some of the Atari ones, like this) use something like it, often by shaping prediction error in various ways; you can look up the papers for specific implementations. I also recall Hawkins treating curiosity in the cortex in one of his videos a long time ago, including how it would be implemented (which circuits would be responsible for handling prediction errors/prediction reward), although his work is highly speculative and likely cannot be applied directly in that form to current ML work, so ideally the concept itself should be distilled and implemented as an intrinsic RL reward scheme.
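To make "intrinsic reward based on prediction error" concrete, here is a minimal sketch under my own simplifying assumptions (toy dimensions, a placeholder environment, prediction in raw observation space rather than the learned feature spaces real implementations like ICM or the large-scale curiosity study use): a forward model tries to predict the next observation, and its remaining error is paid out as a "curiosity" bonus.

```python
# Minimal sketch of a prediction-error ("curiosity") intrinsic reward.
# The environment and policy here are stand-ins, not any real API.
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, N_ACTIONS = 8, 4

class ForwardModel(nn.Module):
    """Predicts the next observation from (observation, action)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + N_ACTIONS, 64), nn.ReLU(),
            nn.Linear(64, OBS_DIM))

    def forward(self, obs, action):
        a = F.one_hot(action, N_ACTIONS).float()
        return self.net(torch.cat([obs, a], dim=-1))

def toy_env_step(obs, action):
    """Placeholder dynamics; a real agent would query its actual environment."""
    return obs.roll(1, dims=-1) + 0.1 * torch.randn_like(obs)

fwd = ForwardModel()
opt = torch.optim.Adam(fwd.parameters(), lr=1e-3)
obs = torch.randn(16, OBS_DIM)

for step in range(100):
    action = torch.randint(0, N_ACTIONS, (16,))   # stand-in for a policy
    next_obs = toy_env_step(obs, action)
    pred = fwd(obs, action)
    error = F.mse_loss(pred, next_obs, reduction="none").mean(dim=-1)
    intrinsic_reward = error.detach()             # "curiosity" bonus handed to the RL learner
    # The forward model keeps learning, so reward keeps flowing to whatever is still surprising,
    # which is also what produces boredom with anything fully predictable.
    opt.zero_grad()
    error.mean().backward()
    opt.step()
    obs = next_obs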
Do I think this approach is "safe"? No, I do not think any single approach guarantees we won't die, but I've given it as an alternative to the options presented here, which seem to me to be almost shades of the same color (all three end in the self's death, with 1 seeming an acceptable death, 2 seeming preferable in a multiverse, and 3 just sucking but likely still leading to psychological death). Do I think we would die from it? Not really, at least not with high probability. I think we would likely end up sharing the Earth and "the lightcone" with AGIs whose values, while not identical to ours, would lead them to take an interest in us, and we would take an interest in them; while not fully compatible, it would be in our mutual interest to keep each other around, because we would find each other interesting enough. I can imagine Eliezer objecting that the eventual superintelligence would at some point find our psychology trivial to predict, and while I'm not personally that keen on the whole dog <-> chimpanzee <-> human IQ comparison chart (the "a human would be to a SAI what a chimpanzee is to a human" one), I do believe humans have general intelligence (which is also enough to make us interesting to an AGI). I take a view closer to what Egan or Deutsch or Hawkins ("On Intelligence") think about the nature of intelligence: that we would technically be able to understand any concept as long as we're sufficiently generally intelligent, even granting that the understanding may be instant for a SAI and take years for a human to reach. So a SAI could be cognitively much superior to a human, but anything it could do, a human could likely do given sufficient time.

For example, take two transhumanist technologies that people often imagine building an AGI to make for them: molecular nanotechnology and mind uploads. Molecular nanotechnology barely made any progress for many years, mostly due to lack of funding and to dishonest scientists misappropriating billions in US government grant money that Eric Drexler had secured for molecular nanotechnology (MNT) work, by redefining the meaning of the word nanotechnology (to cover various forms of chemistry and materials science which, while sometimes useful short-term research subjects in themselves, mostly have little to do with core MNT research), often because those scientists didn't believe it could be done at all, or because it was a grand project they couldn't work on for a few months and then dump (publish or perish), or because they feared it (gray goo). So we've wasted decades of work on MNT, but it does appear to be something humans could build in principle (see Nanosystems and other work). The field isn't dead: people are working on protein design, especially protein design that could be functionalized to eventually build first-stage MNT, which could later be used to build the full thing, and even today narrow AI is being used to automatically design such proteins and components given the right constraints from a human engineer, so give it a decade or a few and we could crack it. What about the other technology we might want an AGI's help with, getting us to ems/SIMs?
With current technology we do have what appears to be a reliable preservation protocol that allows information-theoretic preservation of brains, even if it appears that most current cryonics companies might be doing it wrong (see for example https://www.lesswrong.com/posts/s2N75ksqK3uxz9LLy/interview-with-nectome-ceo-robert-mcintyre-brain and https://youtu.be/Lnk6bASp544 ), though that might change in the future. We've also had brain sectioning techniques for a while, where you can use a multibeam scanning electron microscope; the best result I'm aware of is this monumental work, "A connectomic study of a petascale fragment of human cerebral cortex", where a cubic mm of cortex is scanned at high resolution using a multibeam SEM. Of course that's hardly enough for digitizing even a small mammal brain; that would require parallelizing to a degree the electron microscopy industry is not ready for. They often treat (their own words) building their analytical devices like building hand-made precision clocks; they do not automate large parts of the assembly and testing work, and thus the final costs are astronomical for most tools they make. The same can be said of many parts of the semiconductor tooling industry, both hardware and software, where essential parts are shared with electron microscopy manufacturers (such as mask making, testing, inspection and more); the immense cost of fabs can at least in part be blamed on the lack of automation with which these tools are built. Both industries are ripe for disruption and decentralization if we truly want to scale things up, both for that biological research and for AI research. Of course, for AI that isn't even the low-hanging fruit; Nvidia's 10x margins on their enterprise products are that, at the very least! Yet we will likely see more and more ML accelerator startups trying to take on that beast. In short, there is a lot of room for reducing the cost of training large models, especially in hardware and most likely in software too, but I'm already too far off topic to go into that.

So maybe we could get MNT and ems/SIMs through hard work as plain old humans and solve a lot of humanity's problems without needing more potentially risky projects like AGI, but would we? Most likely not. MNT is heavily underfunded; research is limping along, and narrow AI will likely accelerate it, but it's also hard and needs a lot of experimental work, even though in principle we could do it. What about SIMs? Even if we solve the major industry bottlenecks (some of which are due to bad culture and improper incentives as well as the sheer difficulty of the work), there's still the bottleneck of actually having accurate neuron models and the computational cost of accurate simulation; we would likely need a few OOM better computing efficiency, or specialized hardware. So both of these could be decades away or longer at the current sluggish pace.
ML, on the other hand, is getting bigger and bigger each day, and the demos are getting more impressive. With AlphaGo/MuZero we've seen domains that were thought to require real intelligence get eaten up, and even more importantly, with GPT-3 we've seen that a large enough sequence predictor with attention can reach a type of thinking similar to what humans do, finally understanding context and more. Yes, people will complain that it seems to only do System 1-type thinking, but its thinking in that domain is sometimes superhuman (sometimes it has incredibly subtle intuition that connects so many little things in just the right way that I personally don't believe I could do as well), and I do believe you could get it to do System 2-type thinking with a few special tricks. If those tricks seem obvious to me, an outsider to the field (they were obvious even back in the summer of 2020), I'm sure they are obvious to those working with the actual models (in fact I've seen a few recent papers using some of those tricks, so it might be only a matter of time until the right combination of techniques is found). With GPT-3 the cat is certainly out of the bag, and I don't think people will stop pushing in this direction. Seven years ago, I couldn't have told you whether we'd get ems/SIMs, MNT, or AGI first; they all seemed possible in principle, yet each seemed an undetermined time in the future (decades at least, possibly more). Nowadays it seems we at least have a path to AGI using mundane deep NNs, one that might not require that many deep insights or as much physical research (as MNT or SIMs do). As a side note, neural network research also suffered a decade of delay due to a smear campaign by eminent scientists, as you can see here: https://nitter.net/gdb/status/1495804412887023616#m "A Sociological Study of the Official History of the Perceptrons Controversy", not unlike MNT research (which did not recover nearly as well, though progress seems ongoing). I'm uncertain whether not losing that decade would have helped, as current progress is in large part due to hardware catching up to what is needed to run artificial neural networks of sufficient size to do interesting things.

Another note on SIMs and AGI: if projects like Neuralink or similar large-scale activity scanning become feasible and sufficiently comprehensive, we could try to train sort-of pseudo-ems, by training either a mundane artificial neural network or a more accurate differentiable spiking neuron model to approximate/guess the underlying network from the limited (but still comprehensive) data captured by a Neuralink-like high-bandwidth, high-detail neural activity capture device. That is likely more tractable than fixing a whole lagging industry or figuring out very accurate neuron models. While I personally would prefer the route that makes as high-fidelity a copy as possible (to ensure subjective continuity in a way I trust), other people rushing toward (some form of) "Singularity" might choose this route; it was my first thought when Elon Musk talked about Neuralink as if it would give us SIMs in due time, and the idea that you could do this clicked right away. I've later seen the idea appear in one of gwern's writings (it was a collection of links on Reddit, but he has so many posts it is proving hard to find), so it is certainly something other people are thinking about as well.
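For what it's worth, that "pseudo-em" idea could be caricatured as nothing more than next-step prediction on recorded activity. The sketch below is highly speculative and uses random stand-in data, since everything would hinge on what such a device could actually capture:

```python
# Speculative sketch: fit an ordinary sequence model to predict the next step
# of multi-channel neural activity. The "recording" is a random placeholder.
import torch
import torch.nn as nn

N_CHANNELS, WINDOW = 256, 100

class ActivityPredictor(nn.Module):
    """Maps a window of multi-channel activity to a prediction of the next step."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(N_CHANNELS, 512, batch_first=True)
        self.head = nn.Linear(512, N_CHANNELS)

    def forward(self, activity):          # (batch, time, channels)
        h, _ = self.rnn(activity)
        return self.head(h[:, -1])        # predicted activity at the next time step

model = ActivityPredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
recording = torch.randn(8, WINDOW + 1, N_CHANNELS)   # placeholder for real recordings

for step in range(100):
    pred = model(recording[:, :WINDOW])
    loss = nn.functional.mse_loss(pred, recording[:, WINDOW])
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Whether a model fit this way would preserve anything worth calling a person is exactly the question of fidelity and subjective continuity raised above.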
So what would I imagine this near-term AGI to look like when we get to properly building it? Most likely some massive transformer trained online on multimodal data (pictures, audio, text, possibly tactile, which would likely make motor and visual representations form far more easily, and more), possibly pretrained first to learn a good world model and then most likely trained with some form of RL to select its actions and teach it System 2 thinking. If I may speculate, it could have a rich action space and a way of quickly adapting to new motor controls and environments, and possibly something like Drexler's QNRs (quasilinguistic neural representations) to provide richer inputs and language: QNRs: Toward Language for Intelligent Machines. I'd also expect contexts much wider than today's 1024-2048 tokens, but maybe fewer layers, likely densely connected (not ensemble/MoE). It could be embodied in VR, or in some sort of robotics experiment, even if robotics is lagging too far behind. Of course, it could simply not be embodied, but then it might not be agentic enough, which I would suspect would be an impediment to learning theory of mind, System 2 thinking, and more. Or perhaps it will be a later version of such an AGI that humans finally recognize or accept as AGI. It wouldn't surprise me if we got something with System 2 deliberative thinking purely from a transformer trained in just the right way (quite a few ideas on how to do this come to mind, although some might be computationally too costly), with or without RL on top, although RL could be useful to nudge it toward preferring that type of explicit thinking.

I would also suspect that such an agentic AGI (and even a non-agentic one, through agent simulations by a "tool" AI) might very well be recognizably conscious, at least once it gets System 2 thinking, unless we try very hard to make it not seem so. I would even bet that GPT-3's emulated agents are sometimes, to some degree, conscious, though obviously in a way fairly different from human experience; at times, even back in 2020, I managed to reliably trigger a multitude of reactions describing self-awareness and some qualia without prompting for them directly, just nudging it in that direction with some simple tricks, and the results were often fascinating. Of course, whatever consciousness it may have, it lacks subjective continuity and long-term memory (it is not trained on its own outputs in a way that would let it keep a self-history, and it is not fully clear how well that would work: catastrophic forgetting exists, and it might get less bad with size, just as sample efficiency increases with size, but is it big enough? someone would have to find out). It lacks a stable long-term self for the same reason: it is not trained to keep one particular agent always around, and that agent is not trained to "show its thoughts" (think to itself), even though that could be done (see Google's recent paper on how having their GPT-3 clone elaborate on its thoughts increases problem-solving ability: Chain of Thought Prompting Elicits Reasoning in Large Language Models). Also, more importantly to us humans, its input/perception is just tokens, which are typically not grounded (but could be); its experience is likely very wildcard-like, not unlike someone with blindsight.
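At the prompting level, the chain-of-thought trick referenced above is very simple; a hypothetical illustration (made-up questions, placeholder model call) would look something like:

```python
# Few-shot exemplars that spell out intermediate reasoning, so the model
# continues in the same "show your thoughts" style before answering.
cot_prompt = """\
Q: A library has 12 shelves with 30 books each. 45 books are checked out. How many remain?
A: 12 shelves times 30 books is 360 books. 45 are checked out, so 360 - 45 = 315. The answer is 315.

Q: A train travels 60 km/h for 2.5 hours. How far does it go?
A: Distance is speed times time, so 60 * 2.5 = 150 km. The answer is 150 km.

Q: {new_question}
A:"""

# completion = some_language_model.generate(cot_prompt.format(new_question=question))
# "some_language_model" is a placeholder, not a real API.
```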
These were my thoughts after a few months of first interacting with GPT-3, and while I kept this fringe opinion to myself, I've noticed others considering it as well, like this Google employee who worked with the model mentioned in the earlier paper: https://medium.com/@blaisea/do-large-language-models-understand-us-6f881d6d8e75 And even OpenAI's main scaling guy, Ilya Sutskever, quipped this little meme after a year and a half of GPT-3: "it may be that today's large neural networks are slightly conscious", which that day prompted a lot of negative knee-jerk reactions (though I was surprised it took him that long to say what seemed a little obvious back in 2020). On the other hand, no matter what ML researchers actually think, there is certainly an incentive to say a loud NO to the consciousness question, lest you get activists like Thomas Metzinger after you (his philosophy-of-mind work on phenomenal self-models is likely the closest we have to a decent theory of the easy problem of consciousness, and it is also slightly relevant to curiosity; he ties prediction error to pain here: https://www.youtube.com/watch?v=5RxzgenxUfM ): https://nitter.net/naundob/status/1491703771134631938#m https://www.philosophie.fb05.uni-mainz.de/files/2021/02/Metzinger_Moratorium_JAIC_2021.pdf (it would appear Metzinger's account recently disappeared or went private; he did retweet the post I linked). And while Metzinger may be morally right about it, I do not believe we can afford to simply say no to making AIs, and if we do make some, conscious ones that are our peers would be preferable to ones that are our slaves or our superiors, especially ones with motivations compatible with ours. And if somehow the speculations of "optimists" like me are wrong about the dangers of AGI, at the very least the "lightcone" will be filled with agents with rich experiences and desires, hopefully rich enough that those agents can do philosophy (and, being computational/substrate-independent, hopefully they easily stumble upon a messed-up ontology/"religion" like my own, which can be inferred from "I'm conscious" + "my body/mind can be represented as a computational process"; see the disclaimer at the start of my comment, Chalmers' article and Bruno's paper), and thus care less about filling this world with their equivalent of paperclips and realize how wide the multiverse really is (in some "Permutation City" or UDA sense, as explained in the intro).
If you take anything from this long ramble, it is that I think there are simple ways of shaping rewards (such as a "curiosity" base drive, which implicitly includes boredom, a desire for attention, and more, see earlier) that I believe have a decent chance of giving you AIs with good potential to be somewhat compatible with human interests. While they may not be guaranteed safe (they could turn dangerous, not unlike regular humans), they might give us a serious chance, are not very hard to implement (compared to some proposed safety solutions), and are certainly better than your three flavors of death (options 1-3), better than horrific "solutions" like terrorism in its various forms, and even better than stagnating as a species and losing our potential, or getting stuck under some stable dictatorship (or similar). That is assuming we get a chance at all: the current geopolitical situation (Russia's current war and veiled nuclear threats, China's long-term plans for Taiwan) is turning a little bit nasty, slightly increasing the probability of WW3, and the quicker variant of many nukes flying would sadly curtail our potential for quite some time, no terrorism needed, meh. Things like TSMC or Samsung being taken or damaged would sadly slow scaling progress by a number of years, not to mention making everyone's life a lot more annoying, while effective mutually assured destruction could do terrible damage to humanity as a whole. Narrow AI itself can cause plenty of direct suffering: for example, given sufficient time we may develop more drones and "killbots" without ever having AGI, something that seems all the more likely as more conflicts brew on the horizon, which is genuinely depressing given that we're in 2022 and should have put this behind us.
I've seen some in the alignment community treat such curiosity-based drives or motivations as something that could increase misalignment, and I suppose that is possible depending on what you consider "aligned". Under one definition, where the AI is your happily willing slave (does what the owner wants), this would indeed be less aligned, as you might get an exchange like: "Can you design a car for me?" "No, I'm bored, I want to watch more QNR-Flix". That is certainly not what a company would want for its business needs, yet as I argued before, those very business needs could turn out to be, if not an existential threat to humanity, at the very least an awful inconvenience when scaled up. However, if you think of alignment as building something that is compatible with human values and might have a chance of cooperating with us, I think this at least gives us a shot at it (even if it might not be the only component needed for success), even if we might be introducing a "second species" into our environment.
The three options do seem much like shades of the same thing (various forms of death):
Option 2: Might be preferable to 1 in a multiverse, as you might simply find yourself not experiencing those "branches".
Option 3: Seems really bad; however, it's not clear that those humans would even be conscious enough to sustain a self, which might actually lower the "moral status" (or how much negative utility you want to ascribe to it), as sustaining a self requires some degree of social interaction.
Option 1: Better than 3, but by how much? It's not obvious that the selves of these blissed-out humans wouldn't dissolve in the same way as in 3 (think of what solitary confinement does to a mind, or what would happen if you increased the learning rate of a neural network high enough).
So to me these all seem to be shades of the same thing, death in various forms. I might prefer 2 because I do expect a multiverse.
I would propose that there are ways of shaping the rewards of a potential AGI that, while not safe by the AI safety community's standards, nor aligned in a "does what you want and nothing else" sense, might give a much higher chance of positive outcomes than these examples, despite still being a gamble: a curiosity drive, for example (see OpenAI's "Large Scale Curiosity" paper; I would also argue that GPT-n's end up with something similar by default, without fine-tuning or nudging in that direction with RL).
Why a curiosity drive (implemented as intrinsic motivation based on prediction error)? I believe it's likely that a lot of our complex values are due to this and a few other special things (empathy) interacting with our baser biological drives and the environment. I also believe that having such a drive might be essential if we want future AGIs to be peers we cooperate with; and if by some chance it doesn't work out and we go extinct, at the very least it might result in an AGI civilization with relatively rich experiences, so it wouldn't be a total loss from some sort of utilitarian perspective.
My initial reply to this was rather long-winded, rambling, and detailed, with justifications for these beliefs, but it was deemed inappropriate for a front-page post comment, so I've posted it to my shortform if you'd like to see my full answer (it should appear there once it gets past the spam filter).