We have to Upgrade
I want to bring up a point that I almost never hear talked about in AGI discussions, but that to me feels like the only route for humans to have a good future. I’m putting this out for people who already largely share my view on the trajectory of AGI. If you don’t agree with the main premises but are interested, there are lots of other posts that go into why these might be true.
A) AGI seems inevitable.
B) It seems impossible that humans (as they are now) won’t lose control soon after AGI. None of the arguments for us retaining control seem to understand that AI isn’t just another tool. I haven’t seen any that grapple with what it really means for a machine to be intelligent.
C) It seems very unlikely that AGI will stay aligned with what humans care about. These systems are just so alien. Maybe we can align it for a little while, but the alignment will be unstable. It is very hard to see how alignment is maintained with a thing that is way smarter than us and is evolving on its own.
D) Even if I’m wrong about B or C, humans are not intelligent/wise enough to deal with our current technology level, much less super powerful AI.
Let’s say we manage this incredibly difficult task of aligning or controlling AI to humans’ will. There are many amazing humans but also many many awful ones. The awful ones will continue to do awful things with way more leverage. This scenario seems pretty disastrous to me. We don’t want super powerful humans without an increase in wisdom.
To me the conclusion from A+B+C+D is: There is no good outcome (for us) without humans themselves also becoming super intelligent.
So I believe our goal should be to ensure humans are in control long enough to augment our minds with extra capability (or upload, but that seems further off). I’m not sure how this will work, but I feel like the things that Neuralink or science.xyz are doing, developing brain-computer interfaces, are steps in that direction. We also need to figure out scalable technological ways to work on trauma/psychology/fulfilling needs/reducing fears. Humans will somehow have to connect with machines to become much wiser, much more intelligent, and much more enlightened. Maybe we can become something like the amygdala of the neo-neo-cortex.
There are two important timelines in competition here: the length of time until we can upgrade, and the length of time we can maintain control. We need to upgrade before we lose control. Unfortunately, in my view, on the current trajectory we will lose control before we are able to upgrade. I think we must work to make sure this isn’t the case.
Time Till Upgrade:
My current estimate is ~15 years. (very big error bars here)
Ways to shorten
AI that helps people do this science
AGI that is good at science and is aligned long enough to help us on this
More people doing this kind of research
More funding
More status to this kind of research
Maybe better interfaces to the current models will help in the short run and make people more productive thus speeding this development
Time Left With Control:
My current estimate is ~6 years
AGI ~3-4 years (smaller error bars)
Loss of control 2-3 years after AGI (pretty big error bars)
Ways it could be longer
AI research slows down
Hope for safety
Hope we aren’t as close as it seems
Hope for a slowness to implement agentic behavior
Competing Agents
Alignment is pretty good and defense is easier than offense
?
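To make the two-timeline comparison concrete, here is a minimal sketch of the arithmetic using only the point estimates given above. The names (`control_window`, `upgrade_shortfall`, and the constants) are hypothetical and purely illustrative, and the ranges are treated as simple (low, high) pairs rather than real probability distributions, which is a strong simplification given how wide the stated error bars are.

```python
# Point estimates copied from the post; every one carries wide error bars.
YEARS_UNTIL_UPGRADE = 15          # "Time Till Upgrade" estimate
YEARS_UNTIL_AGI = (3, 4)          # (low, high) years until AGI
YEARS_AGI_TO_LOSS = (2, 3)        # (low, high) years from AGI to loss of control


def control_window(agi, lag):
    """Add two (low, high) ranges endpoint by endpoint."""
    return (agi[0] + lag[0], agi[1] + lag[1])


def upgrade_shortfall(upgrade_years, window):
    """Years by which the upgrade arrives after control is lost, as (low, high)."""
    low, high = window
    return (upgrade_years - high, upgrade_years - low)


window = control_window(YEARS_UNTIL_AGI, YEARS_AGI_TO_LOSS)
gap = upgrade_shortfall(YEARS_UNTIL_UPGRADE, window)

print(f"Time left with control: {window[0]}-{window[1]} years")
print(f"Upgrade arrives {gap[0]}-{gap[1]} years too late")
```

The midpoint of the 5-7 year window is the ~6 years quoted above, so on these numbers roughly 9 years have to be found somewhere, through some mix of shortening the upgrade timeline and lengthening the control timeline.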
In short, one of the most underrepresented ways to work on AI safety is to work on BCI.
The only way forward is through!
In general, human cognitive enhancement could help AGI alignment if it were at scale before AGI, but it seems like we probably won’t get very much out of the cognitive enhancements on offer before AGI, and they absolutely don’t suffice to ‘keep up’ with AGI for more than a few weeks or months (as AI R&D efforts rapidly improve AI while human brains remain similar, rendering human-AI cyborgs basically AI systems). So those channels, especially something like BCI, have to add value mainly by enabling better initial decisions, like successfully aligning early AGI, rather than by staying competitive. On the other hand, advanced AGI can quickly develop technologies like whole brain emulation (likely more potent than BCI by far).
I don’t think BCI as a direct tool for alignment makes much sense. Giving advanced AGI read-write access to human brains doesn’t seem like the thing to do with an AI that you don’t trust. On the other hand, an AGI that is trying to help you will have a great understanding of what you’re trying to communicate through speech. Bottlenecks look to me more like they lie in human thinking speeds, not communication bandwidth.
BCI might provide important mind-reading or motivational changes (e.g. US and PRC leaders being able to verify they were respectively telling the truth about an AGI treaty), but big cognitive enhancement through that route seems tricky in developed adult brains: much of the variation in human cognitive abilities goes through early brain development (e.g. genes expressed then).
Genetic engineering sorts of things would take decades to have an effect, so are only relevant for bets on long timelines for AI.
Human brain emulation is an alternative path to AGI, but suffers from the problem that understanding pieces of the brain (e.g. the algorithms of cortical columns) could enable neuroscience-inspired AGI before emulation of specific human minds. That one seems relatively promising as a thing to try to do with early AGI, and can go further than the others (as emulations could be gradually enhanced further into enormous superintelligent human-derived minds, and at least sped up and copied with more computing hardware).
Relevant:
Cyborgism (this is framed through the context of “alignment progress” but I think is generally relevant for humans staying in the loop / in-control)
Cyborg Periods: There will be multiple AI transitions (has an interesting frame wherein for each domain, there’s a period where humans are more powerful than AIs, a period where human + AI is more powerful than AI, and a period where pure AIs just dominate)
While it’s easy to agree with some abstract version of “upgrade” (as in try to channel AI capability gains into our ability to align them), the main bottleneck to physical upgrading is the speed difference between silicon and wet carbon: https://www.lesswrong.com/posts/Ccsx339LE9Jhoii9K/slow-motion-videos-as-ai-risk-intuition-pumps
Yeah to be clear I don’t think “upgrading” is easy. It might not even be possible in a way that makes it relevant. But I do think it offers some hope in an otherwise pretty bleak landscape.
I also think this approach deserves more consideration.
Also: since BCIs can generate easy-to-understand profits, and are legibly useful to many, we could harness market forces to shorten BCI timelines.
Ambitious BCI projects will likely be more shovel ready than many other alignment approaches—BCIs are plausibly amenable to Manhattan Project-level initiatives where we unleash significant human and financial capital. Maybe use Advanced Market Commitments to kickstart the innovators, etc.
For anybody interested, Tim Urban has a really well written post about Neuralink/BCIs: https://waitbutwhy.com/2017/04/neuralink.html
I am pleased to hear your viewpoint is so close to mine on this. Closer than most other people I’ve spoken with, although I have found a few others who are thinking along similar lines. There are a couple key differences in my estimates that change my priorities a bit.
Background: I started reading and thinking about AGI risk about 15 years ago. I then had much longer timelines for AGI (80-100 years) so I decided to study human intelligence enhancement. I studied neuroscience, with a focus on developing BCI and genetic intelligence enhancement, but a few years into my PhD work my AGI timelines got shorter and I also realized how many roadblocks the academic community was placing in the way of biomedical research focused on “enhancement-above-normal” rather than “curing a disease to return towards normal.” So I started thinking that substantial human intelligence enhancement was more like 50-75 years off, less like the 15-30 years off I had initially hoped.
So I left my neuroscience PhD and studied machine learning. Then my AGI timelines shortened from 20-30 years to more like 10-20 years. I left mainstream ML engineering to study AI safety full time. I spent several months investigating the remaining roadblocks to AGI and the progress that various labs were making on these roadblocks, and my timelines shortened from 10-20 years to 1-5 years.
So, while I agree that we need a delay in loss-of-control-over-AGI in order to have time for serial research years into AGI alignment (including building more brain-like and interpretable ML models, as Steven Byrnes, Conjecture, and Astera’s Obelisk have endorsed) and human intelligence enhancement, I think we need more of a delay than you postulate as your median estimate, more like 20 years of delay. I also think we may need this delay to come into force sooner than your median estimate of 6 years. Additionally, I currently believe that the delay period is easier to expand than the intelligence-enhancement research time is to shrink, per unit of additional effort.
So my focus is currently on what seems the most urgent critical task: buying humanity time. Can we find social and technological solutions to extending the period of time before we lose control?
Can we build specialized narrow AI systems to help detect and shut down nascent unauthorized AGIs? Can we develop methodology for safely studying AGI in confinement (sandboxing techniques like: improved network security and model training best practices, norms against openly sharing code or model weights, deliberate capability limitations of models in test environments, preventing self-modification or unauthorized message passing, more thorough safety evaluations, etc.)?
Though my strategic emphasis differs, my conclusion is the same: The only way forward is through!
I have been shocked by the lack of effort put into social technology to lengthen timelines. As I see it one of the only chances we have is increasing the number of people (specifically normies, as that is the group with true scale) who understand and care about the risk arguments, but almost nobody seems to be trying to achieve this. Am I missing something?
No, I think outreach is a neglected cause area because AGI seemed implausible 10 or even 5 years ago. I think it is now much easier for people to imagine us getting there, and thus the time is ripe.
For the people disagreeing, I’m curious what part you’re disagreeing with.
I think there’s a delay in outreach for three reasons.
1. There’s substantial conflict within the community about the effect of doing that outreach. Trying to sound the alarm might just convince the whole world that AGI is imminent, and the first one there controls the world. That would accelerate progress dramatically. For some reason, normies do not seem to understand this. But the compelling logic would convince many if there were efforts to get everyone to think about it. This is why I’ve kept my mouth largely closed, and probably why many others have as well.
2. We as a community strongly believe it won’t work. We assume that the coordination problems are too large. But we don’t think about it a ton, for multiple reasons including 1 and 3 here. There are strong arguments that we should at least think about it more.
3. The types of people who tend to take abstract arguments, like AGI risk, seriously are typically not the types of people who want to take on massive social projects. There are many exceptions, like Rob Miles, but I think the averages make a difference in our approach as a community.
I do think the community is moving toward focusing more on this angle. And that we probably should.
While it’s an interesting idea, my credence for “upgrade in ~15 years” is essentially zero.
We don’t currently know how to make any significant improvement in human average intelligence, let alone how to upgrade humans well beyond current peak intelligence. Even if we had actual prototypes right now that reliably worked for (say) significantly upgrading chimpanzees, it would likely take 15 years just to get through the large number of adaptations, ethical reviews, and studies needed to get them adequately tested on humans. But we don’t, and there aren’t even plausible conceptual models yet that will lead to prototypes.
We’re still working our way through setting up the foundations for gathering the data that we need to start developing such models. That said, it is possible that we may get some radical breakthroughs in both technology and medical knowledge very quickly, as well as radical restructuring of risk approaches in medical establishments.
In such a case I could imagine such upgrades in as short as 30 years. However, I expect almost all worlds in which we get such radical changes so quickly are ones in which we already developed artificial superintelligence and it didn’t kill us.
Specifically, I heard we can’t increase intelligence or engineer a desirable trait via somatic gene editing, which limits a lot of genetic engineering’s usefulness.
There is a way to do ultrasound-mediated delivery of genes across the blood-brain barrier. See https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9137703/ and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6546162/
Gene-editing short-sleeper genes into humans seems more tractable than intelligence/neuroplasticity/postsynaptic density genes (there is a very distinctly identifiable gene for this—https://lukepiette.posthaven.com/reducing-sleep-1 ). But given the stakes, it is totally worth trying gene-editing/gene therapy of intelligence genes too.
https://diyhpl.us/wiki/genetic-modifications/ and https://diyhpl.us/wiki/hplusroadmap/ have a lot. (Bryan Bishop has more of a certain smg [correlates with knowing what all the right pointers are] than anyone else in the area does.)
I have friends (Walter Patterson and Mac Davis) who run Minicircle, a novel way of doing gene delivery; see https://www.rapamycin.news/t/minicircle-this-biohacking-company-is-using-a-crypto-city-to-test-controversial-gene-therapies-mit-tech-rev/5647 . Walter also has experience with ultrasound-mediated delivery techniques! They’re among the most open-minded and approachable people I know. The pool of people willing to try Minicircle overlaps a lot with the pool of people (like Liz Parrish) willing to try radical interventions seen as “too risky” by others, but we need these people.
(https://rle4.life/longevitygenedeliverysystem may be more promising for gene therapy)
See more here:
As for intelligence-enhancing genes, you should ask people at the ISIR conference (Stephen Hsu, James J. Lee, etc.). Even Emil O. W. Kirkegaard has some pointers. See https://emilkirkegaard.dk/en/2019/02/a-partial-test-of-duf1220-for-population-differences-in-intelligence/
For developing new tools to interrogate biological systems (including brain-based diagnostics to get readouts of differences in the brain after gene therapy intervention [you can start in mice first]), Sam Rodriques and Adam Marblestone (and Ed Boyden lab members) should be broadly useful. Maybe brain organoids can move quickly enough to be worth the shot even if the translational relevance is far-from-guaranteed—Herophilus is broadly doing tech development for this (though idk if for gene therapy of intelligence/short-sleeper genes).
Also related—https://forum.effectivealtruism.org/posts/hGY3eErGzEef7Ck64/mind-enhancement-cause-exploration
rewind.ai is a way to bring in cyborgism. There are many in the MIT Media Lab (Social Physics, Affective Computing, Pattie Maes) who have many of the right parts (along with Neurable/Neurosity/etc), but it is unknown if they are nimble enough to make the necessary thing happen
Possibly important/relevant names: Mina Fahmi, https://www.linkedin.com/in/shagun-maheshwari-75b8b7150/, Stephen Frey
NEAR-TERM, AI will produce superabundance and give us the chance to “find more unique ways to increase intelligence” without increasing cognitive overload (increasing the space of “microimprovements” that are Pareto-efficient). This includes reducing microplastic load, reducing pollution load, better optimizing sleep, better optimizing the nutrition of AI/alignment researchers (88% of Americans are metabolically unhealthy and there are many Pareto-efficient improvements like rapamycin, acarbose, canagliflozin, and plasmalogens that may not incur any tradeoffs), or incorporating more support structures for the hundreds of students who now want to drop out of school b/c school is not “modern” enough to help them adapt to the age of AI (GPT4 was the wakeup call to many GenZ’ers that they don’t want to be taking APs anymore or that “all of HS was useless”)… People now complain of “sucking at programming because they didn’t learn it at age 11/12”; we can train young people to be BCI programmers at younger ages so that they won’t have the same complaint when they’re older. Eliezer Yudkowsky constantly wishing that he had the energy levels of a 25-year-old is proof enough that many brain longevity improvements are Pareto-efficient (he is also proof that more unschooling is pro-“trustworthy AI”) [as are professors in their 30s saying “don’t count on your memory being as sharp as it was 10 years ago”].
Leopold Aschenbrenner says that we need WAY more AI alignment researchers, but the percent of people smart enough for AI alignment research at any level [*] is not high (pretty much EVERYONE I know doing alignment research has to be extremely smart—at minimum within the top few percentiles of human intelligence if not the top 0.5%). This leaves out many unless we pursue human enhancement.
[*] I strongly say at any level b/c it drastically goes down at the highest levels (eg at levels required to understand Vanessa Kosoy or MIRI-level work, then it’s probably top 0.1% → and even these levels may not be high enough to make a meaningful enough dent in AI risk).
Reducing any further global fluid intelligence decline with age (eg by reducing pollution/microplastic levels—we already see that Starcraft ability declines after age 24) is also necessary, esp b/c there is wide variation in the rate at which human brains decline, and the net effect of reducing aging rate on total integrated human compute may be larger now than ever before (b/c of human population size). Reducing intelligence decline w/age is also more tractable than “increasing intelligence”, especially b/c American brains shrink way faster than brains of an indigenous tribe. The strength of brain waves recordable by EEG decreases with aging (making it way harder for BCIs to discriminate intent) - further proof that reducing brain aging rate is the most important/tractable thing for “upgrading”.
More frictionless nootropics pipelines (that, due to their low cognitive overheads, integrate well with better BCIs). The book “How to Change Your Mind” was written for psychedelics (which have strangely become more popular than nootropics), but it could have been used for nootropics instead. I’m friends with a nootropics startup founder who is trying decentralized ways of testing his nootropic combinations (the combos may have more potential than individual nootropics) and making it frictionless for people to integrate nootropics into their workflow. In an age of near-term AGI where old habits may guarantee extinction, we must change our openness/neuroplasticity to trying new things, and nootropics/injected peptides (like possibly p21 or cerebrolysin)/psychoplastogens could do a better job than psychedelics do at making people sustainably adopt new habits into their daily pipelines (psychedelics massively disrupt one’s day and you cannot take them too often; you can, however, take psychoplastogens or nootropics daily). Cf. David Olson’s lab.
There is a way of making ALL of this better frictionlessly integrate into people’s pipelines (and see how they retroactively modify rewind.ai data ⇒ presumably one could even calculate differences in processing speed/WM just from rewind.ai’ish data). I don’t know what the scaling curves for drug synthesis are, but there are paths through which they become cheaper much more rapidly (even if done in Roatan or Zuzalu), making mass A/B testing of psychoplastogens much easier than before.
[with all the data we collect from twitch streams and rewind.ai (on top of IRL neurosity/neurable data on brainwaves), it may already be possible to measure and sum up the tiny improvements that small brain-health practices produce]
[brainwave data is often used by cognitive control people, e.g. Randall O’Reilly or David Badre, and more enhanced cognitive control can do a lot even in the absence of intelligence markers. It’s too bad the labs only collect proprietary data that is never integrated into a global database, but it could be done if we coordinate with the psychologists who study it]. Almost no one has even done a proper study on the effects of nootropics/stimulants on cognitive control or brainwaves, and this should at minimum be done by any institution aiming to enhance human cognition, which could presumably attract loads of funding. Even Eliezer Yudkowsky has now suggested intelligence enhancement in humans as a strategy, especially if paired with an actual slowdown.
(I know biohacker circles who have experience with injected peptides—that’s how I injected SS-31 into myself for the first time. I don’t know the effect size of this on neuroplasticity, but if it can be done with minimal overhead [esp as AI drives down the cost of labor], it’s worth trying)
Comprehensive metabolomic/proteomic profiling is also becoming way cheaper and can be done with minimal cognitive overhead. See SomaLogic and Snyder lab for more (there are labs that find results of sleep deprivation on SomaLogic proteomics—one can and should extend this to general patterns of “enhanced/deprived cognition”—and use this to predict which people lose out “less” from fewer hours of sleep) ⇒ this could be paired with brain-wave data. Some quantified-self’ers have many ideas in the right direction, but tbh they still aren’t the most curious people, and I’d probably be the best one amongst them if not for my various hangups (oh wait, this is how I could apply for funding, obvs I also need to start taking focalin after a long hiatus). Even Mike Lustgarten has hangups over things I don’t have hangups over.
[also biometrics X video games [or tutoring] ⇒ may even enable a “freemium” model for games]
Jhourney is a brain-inspired way to “shortcut” “revelations” or “Romeo-Stevens-like” states in people and is developed by very legit neuroscientists (it’s possible that nootropics could be integrated into this already extensive brainwave data)
I know one person applying BCI technology to study gamers: his name is Alex Milenkovic and he is SUPER approachable (see my YouTube channel). Nootropics can easily be integrated into this pipeline to see how they affect EEG brainwaves.
Alignment means minimal loss in capturing the intent of human preferences (including memory and context loss, and loss in translation if someone mentors/tutors a single person but not other people who could benefit from the same training), AND minimal loss in taste (taste is allocating attention/transformer layers better)
https://foresight.org/whole-brain-emulation-workshop-2023/
[FYI there is nothing to prevent us from cutting open the skull and enlarging the size of the brain (there are neural replacement/repair startups though it is unknown if the technology is mature yet)]
Milan Cvitkovic has also just written another article on the same lines: https://milan.cvitkovic.net/writing/neurotechnology_is_critical_for_ai_alignment/
https://cell.substack.com/p/darpa-neurotech
[perhaps some solutions to the biohackers/neurotech/law coordination problem will be discussed at https://zuzalu.super.site/about !]
The goal of transhumanism is to transcend our genetic limitations, to enable a greater pool of people to contribute to science/innovation than the genetically privileged. Maybe only 1-4% of the population is capable of doing cutting-edge scientific (or alignment) work, but we can massively increase this number via brain enhancement (finding ways to enhance the brains of the 50th percentile to be at the 99th percentile [though better AI-driven tutoring may also help]; this may be easier than enhancing the 99th percentile brains, though the latter may be more important for the most global kind of risk). The pool of innovations adjacent to GPT4 will cause major disruptions to how we learn/prove ourselves within 1-2 years. Originality is the only thing that matters, so we should break free from our old patterns and move towards what we know the high-agency “ideal protagonist” (w/zero scarcity mindset) would value.
[neurofeedback is expensive, but I think there is a viable case study where I can ask for funding related to this and where I stream enough of myself to make others want to adopt it at an accelerated timetable]. I think some people roughly have intuition about this, but I think this is where much of my unique value lies.
[maybe no one here will appreciate me yet, but I hope GPT5 will. There are many mixed order interactions (depending on 3 or more variables) with extremely large 3rd-or-higher-order coefficients that have not been realized yet simply b/c software/AI has not been powerful enough to implement higher-order interactions that depend on 3 or more variables or certain time-lagged regressions/dependencies that would previously have been forgotten… Gamma becomes more important in densely connected systems]
A workshop was held on organoid intelligence just a few weeks ago—https://www.frontiersin.org/journals/science/articles/10.3389/fsci.2023.1017235
https://hub.jhu.edu/2023/02/28/organoid-intelligence-biocomputers/
(from https://www.nature.com/articles/s41467-021-22741-9 ) / https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6363383/
https://research.vu.nl/en/persons/natalia-goriounova/publications/ - her research has the most neurobiology X intelligence X dendritic complexity in it (way more biological than ISIR research)
https://alleninstitute.org/news/living-brain-donors-are-helping-us-better-understand-our-own-neurons-including-those-potentially-linked-to-alzheimers-disease/
https://research.vu.nl/en/persons/djai-heyer
For the genetic modifications like short sleeper or increasing intelligence, how many upgrades are targeting the somatic cells, and how many upgrades target germline cells?
If any genetic modification or upgrade applies to the somatic cells, how fast does it take effect, or when should you start expecting the genetic modification or upgrade to work?
How strong are the genetic modifications or upgrades that people can get for various traits?
The thing way smarter than us maintains it. If it manages to notkilleveryone initially, it’s motivated to preserve that state of affairs, as a matter of revealed preference. Alignment is not control.
@Jed McCaleb agree. I’m hosting this workshop with Anders Sandberg (co-author of the original 2008 Whole Brain Emulation Roadmap) in Oxford in May on this topic: https://foresight.org/whole-brain-emulation-workshop-2023. Would appreciate people’s ideas for people/opportunities/risks to consider.
We might be able to use BCIs to enhance our intelligence, but it’s not entirely clear to me how that would work. What parts of the brain does it connect to?
What’s easier for me to imagine is how BCIs would allow an AGI to take control of human bodies (and bodies of other animals). Robotics isn’t nearly as close to outperforming human bodies as AI is to outperforming human minds, so controlling human bodies by replacing the brain with a BCI that connects to all the incoming and outgoing nerves might be a great way for an AGI to navigate the physical world.
It’s easiest to challenge your assumptions that:
1. Loss of human control is inevitable.
2. Upon control loss, humans will placidly wait to die instead of immediately resorting to unlimited violence, defeating the AI unless it has managed to acquire enough hard power to not be destroyed.
On 1: see open agency, or just look at the stateless myopia in use right now. A superintelligent, stateless, and myopic ASI is likely completely controllable, as it lacks the information needed to break free.
On 2: see nuclear weapons.
I don’t really know, but my guess is that all these schemes would have to involve high bandwidth (whether reading or writing or both), and bandwidth is very hard to achieve. The electrodes are unwieldy (IIUC it’s a notable accomplishment of Neuralink to get to ~1000), and we’d want, I don’t know, at least 3 more orders of magnitude to see really interesting uses?
https://tsvibt.blogspot.com/2022/11/prosthetic-connectivity.html
I don’t see why communicating with an AI through a BCI is necessarily better than through a keyboard+screen. Just because a BCI is more ergonomic and the AI might feel more like “a part of you”, it won’t magically be better aligned.
In fact the BCI option seems way scarier to me. An AI that can read my thoughts at any time and stimulate random neurons in my brain at will? No, thanks. This scenario just feels like you are handing it the “breaking out of the box” option on a silver platter.
The idea is that the BCI is added slowly and you integrate the new neurons into you in a continuous, identity-preserving way, so that the AI thinks your thoughts.
I strongly agree that we should upgrade in this sense.
I also think that a lot of this work might be initially doable with high-end non-invasive BCIs (which is also somewhat less risky, but can also be done much faster). High-end EEG seems to already be used successfully to decode the images a person is looking at: https://www.biorxiv.org/content/10.1101/787101v3 And the computer can adjust its audio-visual output to aim for particular EEG changes in real time (so fairly tight coupling is possible, which carries with it both opportunities and risks).
I have a possible post sitting in the Drafts, and it says the following among other things:
Speaking from the experimental viewpoint, we should ponder feasible experiments in creating hybrid consciousness between tightly coupled biological entities and electronic circuits. Such experiments might start shedding some empirical light on the capacity of electronic circuits to support subjective experience and might constitute initial steps towards acquiring the ability to eventually “look inside the other entity’s subjective realm”.
[ ]
Having Neuralink-like BCIs is not a hard requirement in this sense. A sufficiently tight coupling can probably be achieved by taking EEG and polygraph-like signals from the biological entity and giving appropriately sculpted audio-visual signals from the electronic entity. I think it’s highly likely that such non-invasive coupling will be sufficient for initial experiments. Tight closed loops of this kind represent formidable safety issues even with non-invasive connectivity, and since this line of research assumes that human volunteers will try this at some point, while observing the resulting subjective experiences and reporting on them, ethical and safety considerations will have to be dealt with.
Nevertheless, assuming that one finds a way for such experiments to go ahead, one can try various things. E.g. one can train a variety of differently architected electronic circuits to approximate the same input-output function, and see if the observed subjective experiences differ substantially depending on the architecture of the electronic circuit in question. A positive answer would be the first step to figuring out how activity of an electronic circuit can be directly associated with subjective experiences.
If people start organizing for this kind of work, I’d love to collaborate.
You still run into the alignment problem of ensuring that the upgraded version of you aligns with your values, or some extension of them. If my uploaded transhuman self decides to turn the world into paperclips that’s just as bad as if a non-human AGI does.
I believe something similar to this, though it may be different enough to say in my own words.
I think that the most likely way to get the big, powerful, steering-the-future-lightcone AI system to steer according to our values is for it to directly access our values, as encoded in our brains.
This has to involve “figuring out” neuroscience one way or the other, whether that’s the AI learning to read off neurons, or from brain emulation/uploading.
I agree with others here that this brain tech has a far longer timeline than AGI.
So my hope is that we figure out how to create controlled AGIs and/or slow down capabilities advances, and then use these tools (plus the normal advances of science) to figure out uploading.
Related: https://milan.cvitkovic.net/writing/neurotechnology_is_critical_for_ai_alignment/
While I’ve upvoted this because I think upgrading humans is a genuinely good cause, I disagree with this premise, for a lot of reasons.
I also think this is the reason why I disagree with so much of LW on this topic.
The avant-garde way to get things done is to address requests and suggestions not to humanity, but to the AIs themselves, i.e. write “An Appeal to AI Superintelligence: Reasons to Upgrade Humanity”.
This creative solution around the alignment problem occurred to me too a long while ago, and probably a lot of other people as well. I can’t say I put any stock in it.
The human brain is even more complicated than neural networks, and if AIs have invented a way to add even just 10 IQ points to the brains of the alignment researchers, then we’re already dead.
I suspect that humans will turn out to be relatively simple to encode—quite small amounts of low-resolution memory that we draw on, with detailed understanding maps—smaller than LLMs that we’re creating. Added to which there is an array of motivation factors that will be quite universal but of varying levels of intensity in different dimensions for each individual.
If that take on things is correct, then it may be that emulating a human by training a skeleton AI on constant video streaming etc. over a 10-20 year period (about how long neurons last before replacement), so that it predicts the behaviour of the human being modelled ever more accurately, will eventually arrive at an AI with almost exactly the same beliefs and behaviours as the human being emulated.
Without physically carving up brains and attempting to transcribe synaptic weightings etc., that might prove the most viable means of effective uploading and of creating highly aligned AI with human-like values. And perhaps it would create something closer to being our true children-of-the-mind.
For AGI alignment, it seems there will at minimum need to be one, or perhaps multiple, blind and independent hierarchies of increasingly smart AIs continually checking and assuring that the next-level-up AIs are maintaining alignment, with active monitoring of their activities. This is needed because as AIs get smarter, their ability to fool monitoring systems will likely grow as the relative gulf between monitored and monitoring intelligence grows.
I think a wide array of AIs is a bad idea. If there is a non-zero chance that an AI goes ‘murder clippy’ and ends humans, then that probability is additive—more independent AIs = higher chance of doom.
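To put rough numbers on the "additive" claim, here is a minimal sketch. It assumes each AI fails independently with the same probability p, which is a simplification and not something anyone in this thread has established; the function name and the value of p are made up for illustration. Under that assumption the chance that at least one of n AIs fails is 1 - (1 - p)^n, which is close to n * p while n * p is small and never exceeds it.

```python
def p_any_failure(p: float, n: int) -> float:
    """Chance that at least one of n AIs fails.

    Assumes (hypothetically) that failures are independent and that
    every AI has the same failure probability p.
    """
    return 1.0 - (1.0 - p) ** n


# Hypothetical per-AI failure probability, chosen only for illustration.
p = 0.01

for n in (1, 10, 100):
    exact = p_any_failure(p, n)
    additive = min(1.0, n * p)  # the simple "probabilities add" estimate
    print(f"n={n:3d}  exact={exact:.4f}  additive estimate={additive:.4f}")
```

With these made-up numbers the additive estimate is nearly exact for 10 AIs and overshoots for 100 (about 0.63 versus 1.0), but the direction of the argument holds: more independent AIs means a higher chance that at least one goes wrong.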
That’s the premise of Greg Egan’s “Jewel” stories. I think it’s wrong. A person who never saw a spider will still get scared when seeing one for the first time, because humans are hardwired for that. A person who has a specific memory and doesn’t mention it to anyone for many years, probably doesn’t give enough information through their behavior to infer the memory in detail. And the extreme example of why input/output is not enough to infer everything about inner life: imagine a human in a box, which has no input/output at all, but plenty of inner life. I think we all have lots of inner degrees of freedom, that can’t be fully determined even from a full record of our behavior over a long time.
I agree with all of the premises. This timeline is short even for AGI safety people, but it also seems quite plausible.
I think there are people thinking about aligning true intelligence (that is, agentic, continually learning, and therefore self-teaching and probably self-improving in architecture). Unfortunately, that doesn’t change the logic, because those people tend to have very pessimistic views on our odds of aligning such a system. I put Nate Soares, Eliezer Yudkowsky, and others in that camp.
There is a possible solution: build AGI that is human-like. The better humans among us are safely and stably aligned. Many individuals would be safe stewards of humanity’s future, even if they changed and enhanced themselves along the road.
Creating a fully humanlike AGI is an unlikely solution, since the timeline for that would be even longer than the timelines for effective upgrades by AI enhancement through BCI.
But there is already work on roughly human-like AGI. I put DeepMind’s focus on deep RL agents in this category. And there are proposed solutions that would produce at least short-term, if not long-term, alignment of that type of system. Steve Byrnes has proposed one such solution, and I’ve proposed a similar one.
Even partial success at this type of solution might keep loosely brainlike AGI aligned long enough for other solutions to be brought into play.
One of the arguments/intuition pumps in the Chinese room thought experiment is to have the human inside the room memorize the room's contents instead.
Excellent example that makes me notice my bias: I feel that only one person can occupy my head. The same bias makes people believe augmentation is a safe solution.
Yes, fully agree. I don’t see how things can work long term otherwise. One way this could happen is if the BCI is thought of as some kind of Pivotal Act, perhaps a weak one. There’s also the (counterfactual) contract element to it: as soon as an AGI is self-aware, it agrees to a contract as we upgrade it. That is, while we are smarter than it, we upgrade it; when it becomes smarter than us, it agrees to upgrade us.
Given the unpredictable emergent behavior in researchers’ AI models, we will likely see emergent AI behavior with real-world consequences. We can limit these consequences by limiting the potential vectors of malignant behavior, the primary one being autonomous lethal weapons. See my post and the comments under it for further details:
https://www.lesswrong.com/posts/b2d3yBzzik4hajGni/limit-intelligent-weapons