Matthew Barnett

Karma: 9,258

Someone who is interested in learning and doing good.

My Twitter: https://twitter.com/MatthewJBar

My Substack: https://matthewbarnett.substack.com/

Matthew Barnett 18 May 2024 0:41 UTC
1 point
−1
in reply to: Seth Herd’s comment on: Instruction-following AGI is easier and more likely than value aligned AGI
No arbitrarily powerful AI could succeed at taking over the world
This is closest to what I am saying. The current world appears to be in a state of inter-agent competition. Even as technology has gotten more advanced, and as agents have grown more powerful over time, no single unified agent has been able to obtain control over everything and win the entire pie, defeating all the other agents. I think we should expect this state of affairs to continue even as AGI gets invented and technology continues to get more powerful.
(One plausible exception to the idea that “no single agent has ever won the competition over the world” is the human species itself, which dominates over other animal species. But I don’t think the human species is well-described as a unified agent, and I think our power comes mostly from accumulated technological abilities, rather than raw intelligence by itself. This distinction is important because the effects of technological innovation generally diffuse across society rather than giving highly concentrated powers to the people who invent stuff. This generally makes the situation with humans vs. animals disanalogous to a hypothetical AGI foom in several important ways.)
Separately, I also think that even if an AGI agent could violently take over the world, it would likely not be rational for it to try, because compromising with the rest of the world would be a less risky and more efficient way of achieving its goals. I’ve written about these ideas in a shortform thread here.

Matthew Barnett 17 May 2024 21:34 UTC
6 points
2
in reply to: Seth Herd’s comment on: Instruction-following AGI is easier and more likely than value aligned AGI
It sounds like you’re thinking mostly of AI and not AGI that can self-improve at some point
I think you can simply have an economy of arbitrarily powerful AGI services, some of which contribute to R&D in a way that feeds into the entire development process recursively. Nothing in my picture rejects general intelligence or R&D feedback loops.
My guess is that the actual disagreement here is that you think that at some point a unified AGI will foom and take over the world, becoming a centralized authority that is able to exert its will on everything else without constraint. I don’t think that’s likely to happen. Instead, I think we’ll see inter-agent competition and decentralization indefinitely (albeit with increasing economies of scale, prompting larger bureaucratic organizations, in the age of AGI).
Here’s something I wrote that seems vaguely relevant, and might give you a sense of what I’m imagining:
Given that we are already seeing market forces shaping the values of existing commercialized AIs, it is confusing to me why an EA would assume this fact will at some point no longer be true. To explain this, my best guess is that many EAs have roughly the following model of AI development:
1. There is “narrow AI”, which will be commercialized, and its values will be determined by market forces, regulation, and to a limited degree, the values of AI developers. In this category we find GPT-4 from OpenAI, Gemini from Google, and presumably at least a few future iterations of these products.
2. Then there is “general AI”, which will at some point arrive, and is qualitatively different from narrow AI. Its values will be determined almost solely by the intentions of the first team to develop AGI, assuming they solve the technical problems of value alignment.
My advice is that we should probably just drop the second step, and think of future AI as simply continuing from the first step indefinitely, albeit with AIs becoming incrementally more general and more capable over time.

Matthew Barnett 17 May 2024 19:40 UTC
LW: 8 AF: 4
2
in reply to: Charlie Steiner’s comment on: Instruction-following AGI is easier and more likely than value aligned AGI
Yes, but I don’t consider this outcome very pessimistic because this is already what the current world looks like. How commonly do businesses work for the common good of all humanity, rather than for the sake of their shareholders? The world is not a utopia, but I guess that’s something I’ve already gotten used to.

Matthew Barnett 16 May 2024 5:05 UTC
2 points
0
in reply to: ryan_greenblatt’s comment on: “Humanity vs. AGI” Will Never Look Like “Humanity vs. AGI” to Humanity
I think we probably disagree substantially on the difficulty of alignment and the relationship between “resources invested in alignment technology” and “what fraction aligned those AIs are” (by fraction aligned, I mean what fraction of resources they take as a cut).
That’s plausible. If you think that we can likely solve the problem of ensuring that our AIs stay perfectly obedient and aligned to our wishes perpetually, then you are indeed more optimistic than I am. Ironically, by virtue of my pessimism, I’m more happy to roll the dice and hasten the arrival of imperfect AI, because I don’t think it’s worth trying very hard and waiting a long time to try to come up with a perfect solution that likely doesn’t exist.
I also think that something like a basin of corrigibility is plausible and maybe important: if you have mostly aligned AIs, you can use such AIs to further improve alignment, potentially rapidly.
I mostly see corrigible AI as a short-term solution (although a lot depends on how you define this term). I thought the idea of a corrigible AI is that you’re trying to build something that isn’t itself independent and agentic, but will help you in your goals regardless. In this sense, GPT-4 is corrigible, because it’s not an independent entity that tries to pursue long-term goals, but it will try to help you.
But purely corrigible AIs seem pretty obviously uncompetitive with more agentic AIs in the long run, for almost any large-scale goal that you have in mind. Ideally, you eventually want to hire something that doesn’t require much oversight and operates relatively independently from you. It’s a bit like how, when hiring an employee, at first you want to teach them everything you can and monitor their work, but eventually, you want them to take charge and run things themselves as best they can, without much oversight.
And I’m not convinced you could use corrigible AIs to help you come up with the perfect solution to AI alignment, as I’m not convinced that something like that exists. So, ultimately I think we’re probably just going to deploy autonomous slightly misaligned AI agents (and again, I’m pretty happy to do that, because I don’t think it would be catastrophic except maybe over the very long run).
I think various governments will find it unacceptable to construct massively powerful agents extremely quickly which aren’t under the control of their citizens or leaders.
I think people will justifiably freak out if AIs clearly have long run preferences and are powerful and this isn’t currently how people are thinking about the situation.
For what it’s worth, I’m not sure which part of my scenario you are referring to here, because these are both statements I agree with.
In fact, this consideration is a major part of my general aversion to pushing for an AI pause, because, as you say, governments will already be quite skeptical of quickly deploying massively powerful agents that we can’t fully control. By default, I think people will probably freak out and try to slow down advanced AI, even without any intervention from current effective altruists and rationalists. By contrast, I’m a lot more ready than the median person to roll out autonomous AI agents that we can’t fully control, simply because I see a lot of value in hastening the arrival of such agents (i.e., I don’t find that outcome as scary as most other people seem to imagine).
At the same time, I don’t think people will pause forever. I expect people to go more slowly than what I’d prefer, but I don’t expect people to pause AI for centuries either. And in due course, so long as at least some non-negligible misalignment “slips through the cracks”, then AIs will become more and more independent (both behaviorally and legally), their values will slowly drift, and humans will gradually lose control—not overnight, or all at once, but eventually.

Matthew Barnett 16 May 2024 4:41 UTC
2 points
0
in reply to: ryan_greenblatt’s comment on: “Humanity vs. AGI” Will Never Look Like “Humanity vs. AGI” to Humanity
Naively, it seems like it should undercut their wages to subsistence levels (just paying for the compute they run on). Even putting aside the potential for alignment, it seems like there will general be a strong pressure toward AIs operating at subsistence given low costs of copying.
I largely agree. However, I’m having trouble seeing how this idea challenges what I am trying to say. I agree that people will try to undercut unaligned AIs by making new AIs that do more of what they want instead. However, unless all the new AIs perfectly share the humans’ values, you just get the same issue as before, but perhaps slightly less severe (i.e., the new AIs will gradually drift away from humans too).
I think what’s crucial here is that I think perfect alignment is very likely unattainable. If that’s true, then we’ll get some form of “value drift” in almost any realistic scenario. Over long periods, the world will start to look alien and inhuman. Here, the difficulty of alignment mostly sets how quickly this drift will occur, rather than determining whether the drift occurs at all.

Matthew Barnett 16 May 2024 3:40 UTC
4 points
0
in reply to: ryan_greenblatt’s comment on: “Humanity vs. AGI” Will Never Look Like “Humanity vs. AGI” to Humanity
A thing I always feel like I’m missing in your stories of how the future goes is “if it is obvious that the AIs are exerting substantial influence and acquiring money/power, why don’t people train competitor AIs which don’t take a cut?”
People could try to do that. In fact, I expect them to do that, at first. However, people generally don’t have unlimited patience, and they aren’t perfectionists. If people don’t think that a perfectly robustly aligned AI is attainable (and I strongly doubt this type of entity is attainable), then they may be happy to compromise by adopting imperfect (and slightly power-seeking) AI as an alternative. Eventually people will think we’ve done “enough” alignment work, even if it doesn’t guarantee full control over everything the AIs ever do, and simply deploy the AIs that we can actually build.
This story makes sense to me because I think even imperfect AIs will be a great deal for humanity. In my story, the loss of control will be gradual enough that probably most people will tolerate it, given the massive near-term benefits of quick AI adoption. To the extent people don’t want things to change quickly, they can (and probably will) pass regulations to slow things down. But I don’t expect people to support total stasis. It’s more likely that people will permit some continuous loss of control, implicitly, in exchange for hastening the upside benefits of adopting AI.
Even a very gradual loss of control, continuously compounded, eventually means that humans won’t fully be in charge anymore.
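To make the compounding point concrete, here is a minimal Python sketch. The 1%-per-year rate of relative loss and the function name `remaining_share` are hypothetical, chosen only for illustration; the comment itself gives no rate. Under this toy model, any fixed positive rate drives the human share toward zero, and the rate only sets how long that takes.

```python
def remaining_share(annual_loss: float, years: float, initial_share: float = 1.0) -> float:
    """Share of control left after `years` of compounding relative loss.

    Toy model (an assumption, not a forecast): each year humans keep a
    fraction (1 - annual_loss) of whatever share they held the year before.
    """
    if not 0.0 <= annual_loss < 1.0:
        raise ValueError("annual_loss must be in [0, 1)")
    return initial_share * (1.0 - annual_loss) ** years


if __name__ == "__main__":
    # Hypothetical 1%-per-year loss; the rate is illustrative, not an estimate.
    for horizon in (10, 50, 100, 300):
        print(f"{horizon:>3} years: {remaining_share(0.01, horizon):.3f}")
```

At 1% per year, roughly a third of the original share remains after a century; a slower hypothetical rate stretches the timeline but does not change where it ends up.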
In the medium to long term, when AIs become legal persons, “replacing them” won’t be an option, as that would violate their rights. And creating a new AI to compete with them wouldn’t eliminate them entirely. It would just reduce their power somewhat by undercutting their wages or bargaining power.
Most of my “doom” scenarios are largely about what happens long after AIs have established a footing in the legal and social sphere, rather than the initial transition period when we’re first starting to automate labor. When AIs have established themselves as autonomous entities in their own right, they can push the world in directions that biological humans don’t like, for much the same reasons that young people can currently push the world in directions that old people don’t like.

Matthew Barnett 16 May 2024 2:48 UTC
4 points
0
in reply to: jacob_cannell’s comment on: “Humanity vs. AGI” Will Never Look Like “Humanity vs. AGI” to Humanity
Everything seems to be going great, the AI systems vasten, growth accelerates, etc, but there is mysteriously little progress in uploading or life extension, the decline in fertility accelerates, and in a few decades most of the economy and wealth is controlled entirely by de novo AI; bio humans are left behind and marginalized.
I agree with the first part of your AI doom scenario (the part about us adopting AI technologies broadly and incrementally), but this part of the picture seems unrealistic to me. When AIs start to influence culture, it probably won’t be a big conspiracy. It won’t really be “mysterious” if things start trending away from what most humans want. It will likely just look like how cultural drift generally always looks: scary because it’s out of your individual control, but nonetheless largely decentralized, transparent, and driven by pretty banal motives.
AIs probably won’t be “out to get us”, even if they’re unaligned. For example, I don’t anticipate them blocking funding for uploading and life extension, although maybe that could happen. I think human influence could simply decline in relative terms even without these dramatic components to the story. We’ll simply become “old” and obsolete, and our power will wane as AIs become increasingly autonomous, legally independent, and more adapted to the modern environment than we are.
Staying in permanent control of the future seems like a long, hard battle. And it’s not clear to me that this is a battle we should even try to fight in the long run. Humans may gradually lose control, not because of a sudden coup or coordinated scheming against the human species, but simply because humans won’t be the only relevant minds in the world anymore.

Matthew Barnett 16 May 2024 0:58 UTC
LW: 48 AF: 18
33
on: Instruction-following AGI is easier and more likely than value aligned AGI
I think the main reason why we won’t align AGIs to some abstract conception of “human values” is because users won’t want to rent or purchase AI services that are aligned to such a broad, altruistic target. Imagine a version of GPT-4 that, instead of helping you, used its time and compute resources to do whatever was optimal for humanity as a whole. Even if that were a great thing for GPT-4 to do from a moral perspective, most users aren’t looking for charity when they sign up for ChatGPT, and they wouldn’t be interested in signing up for such a service. They’re just looking for an AI that helps them do whatever they personally want.
In the future I expect this fact will remain true. Broadly speaking, people will spend their resources on AI services to achieve their own goals, not the goals of humanity-as-a-whole. This will likely look a lot more like “an economy of AIs who (primarily) serve humans” rather than “a monolithic AGI that does stuff for the world (for good or ill)”. The first picture just seems like a default extrapolation of current trends. The second picture, by contrast, seems like a naive conception of the future that, to put it perhaps uncharitably, the LessWrong community seems way too anchored on, for historical reasons.

Matthew Barnett 14 May 2024 1:59 UTC
4 points
2
in reply to: RobertM’s comment on: RobertM’s Shortform
I’m not sure if you’d categorize this under “scaling actually hitting a wall”, but the main possibility that feels relevant in my mind is that progress simply is incremental in this case, as a fact about the world, rather than being a strategic choice on behalf of OpenAI. When underlying progress is itself incremental, it makes sense to release frequent small updates. This is common in the software industry, and it would not be surprising if what holds for most software development also holds for OpenAI.
(Though I also expect GPT-5 to be a medium-sized jump, once it comes out.)

Matthew Barnett 10 May 2024 2:39 UTC
4 points
2
in reply to: Adam Scholl’s comment on: We might be missing some key feature of AI takeoff; it’ll probably seem like “we could’ve seen this coming”
Yes, I expect AI labs will run extensive safety tests in the future on their systems before deployment. Mostly this is because I think people will care a lot more about safety as the systems get more powerful, especially as they become more economically significant and the government starts regulating the technology. I think regulatory forces will likely be quite strong at the moment AIs are becoming slightly smarter than humans. Intuitively, I expect the 5 FTE-year threshold to be well exceeded before such a model release.

Matthew Barnett 10 May 2024 2:25 UTC
2 points
−2
in reply to: habryka’s comment on: We might be missing some key feature of AI takeoff; it’ll probably seem like “we could’ve seen this coming”
Putting aside the question of whether AIs would depend on humans for physical support for now, I also doubt that these initial slightly-smarter-than-human AIs could actually pull off an attack that kills >90% of humans. Can you sketch a plausible story here for how that could happen, under the assumption that we don’t have general-purpose robots at the same time?

Matthew Barnett 10 May 2024 2:11 UTC
4 points
3
in reply to: habryka’s comment on: We might be missing some key feature of AI takeoff; it’ll probably seem like “we could’ve seen this coming”
I’m not saying AIs won’t have a large impact on the world when they first start to slightly exceed human intelligence (indeed, I expect AIs-in-general will be automating lots of labor at this point in time). I’m just saying these first slightly-smarter-than-human AIs won’t pose a catastrophic risk to humanity in a serious sense (at least in an x-risk sense, if not a more ordinary catastrophic sense too, including for reasons of rational self-restraint).
Maybe some future slightly-smarter-than-human AIs can convince a human to create a virus, or something, but even if that’s the case, I don’t think it would make a lot of sense for a rational AI to do that given that (1) the virus likely won’t kill 100% of humans, (2) the AIs will depend on humans to maintain the physical infrastructure supporting the AIs, and (3) if they’re caught, they’re vulnerable to shutdown since they would lose in any physical competition.
My sense is that people who are skeptical of my claim here will generally point to a few theses that I think are quite weak, such as:
1. Maybe humans can be easily manipulated on a large scale by slightly-smarter-than-human AIs
2. Maybe it’ll be mere weeks or months between the first slightly-smarter-than-human AI and a radically superintelligent AI, making this whole discussion moot
3. Maybe slightly smarter-than-human AIs will be able to quickly invent destructive nanotech despite not being radically superintelligent
That said, I agree there could be some bugs in the future that cause localized disasters if these AIs are tasked with automating large-scale projects, and they end up going off the rails for some reason. I was imagining a lower bar for “safe” than “can’t do any large-scale damage at all to human well-being”.

Matthew Barnett 10 May 2024 1:26 UTC
17 points
5
on: We might be missing some key feature of AI takeoff; it’ll probably seem like “we could’ve seen this coming”
Here’s something that I suspect a lot of people are skeptical of right now but that I expect will become increasingly apparent over time (with >50% credence): slightly smarter-than-human software AIs will initially be relatively safe and highly controllable by virtue of not having a physical body and not having any legal rights.

In other words, “we will be able to unplug the first slightly smarter-than-human AIs if they go rogue”, and this will actually be a strategically relevant fact, because it implies that we’ll be able to run extensive experimental tests on highly smart AIs without worrying too much about whether they’ll strike back in some catastrophic way.

Of course, we’ll eventually make sufficient progress in robotics that we can’t rely on this safety guarantee, but I currently imagine at least a few years will pass between the first slightly-smarter-than-human software AIs and mass-manufactured, highly dexterous and competent robots.

(Although I also think there won’t be a clear moment in which the first slightly-smarter-than-human AIs will be developed, as AIs will be imbalanced in their capabilities compared to humans.)

Matthew Barnett 3 May 2024 22:10 UTC
LW: 6 AF: 3
0
in reply to: Buck’s comment on: Buck’s Shortform
Early: That comes from AIs that are just powerful enough to be extremely useful and dangerous-by-default (i.e. these AIs aren’t wildly superhuman).
Can you be clearer on this point? To operationalize this, I propose the following question: what fraction of world GDP do you expect will be attributable to AI at the time we have these risky AIs that you are interested in?
For example, are you worried about AIs that will arise when AI is 1-10% of the economy, or more like 50%? 90%?

Matthew Barnett 30 Apr 2024 19:07 UTC
2 points
0
in reply to: trust’s comment on: My guide to lifelogging
I’m happy to know that more people are trying out lifelogging.
Should I show him that other people do this and try to convince him that I’m not mentally ill?
While you could try showing him that others engage in this hobby, I’m not sure it would be effective in changing his perspective. I think a stronger argument is that lifelogging is harmless, as long as you’re not recording people without their consent. The only real considerations are convenience and storage costs, which you can manage on your own without anyone else’s support. Data storage is cheap these days, and easily affordable for someone with a part-time job.

Matthew Barnett 27 Apr 2024 21:53 UTC
2 points
0
in reply to: Linch’s comment on: Losing Faith In Contrarianism

But if the message that people received was “medicine doesn’t work” (and it appears that many people did), then Scott’s writings should be an useful update, independent of whether Hanson’s-writings-as-intended was actually trying to deliver that message.

The statement I was replying to was: “I’d bet at upwards of 9 to 1 odds that Hanson is wrong about it.”

If one is incorrect about what Hanson believes about medicine, then that fact is relevant to whether you should make such a bet (or more generally whether you should have such a strong belief about him being “wrong”). This is independent of whatever message people received from reading Hanson.

Matthew Barnett 27 Apr 2024 3:02 UTC
4 points
2
in reply to: Amalthea’s comment on: AI Regulation is Unsafe
non-consensually killing vast amounts of people and their children for some chance of improving one’s own longevity.
I think this misrepresents the scenario since AGI presumably won’t just improve my own longevity: it will presumably improve most people’s longevity (assuming it does that at all), in addition to all the other benefits that AGI would provide the world. Also, both potential decisions are “unilateral”: if some group forcibly stops AGI development, they’re causing everyone else to non-consensually die from old age, by assumption.
I understand you have the intuition that there’s an important asymmetry here. However, even if that’s true, I think it’s important to strive to be accurate when describing the moral choice here.

Matthew Barnett 26 Apr 2024 23:34 UTC
4 points
0
in reply to: Daniel Kokotajlo’s comment on: AI Regulation is Unsafe
And quantitatively I think it would improve overall chances of AGI going well by double-digit percentage points at least.
Makes sense. By comparison, my own unconditional estimate of p(doom) is not much higher than 10%, and so it’s hard on my view for any intervention to have a double-digit percentage point effect.
The crude mortality rate before the pandemic was about 0.7%. If we use that number to estimate the direct cost of a 1-year pause, then this is the bar that we’d need to clear for a pause to be justified. I find it plausible that this bar could be met, but at the same time, I am also pretty skeptical of the mechanisms various people have given for how a pause will help with AI safety.
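A minimal Python sketch of the break-even arithmetic, under simplifications that are assumptions rather than claims from the comment: AGI would prevent the deaths counted by the crude mortality rate, a pause delays this by exactly its length, doom counts as everyone dying, and every death is weighted equally. The function names and the roughly 8 billion population figure are illustrative. The pause's cost is the fraction of people who die during it, and under these assumptions the pause is justified only if it cuts p(doom) by more than that fraction.

```python
def pause_mortality_cost(crude_mortality_rate: float, pause_years: float) -> float:
    """Fraction of the population that dies during the pause (annual rate compounded)."""
    return 1.0 - (1.0 - crude_mortality_rate) ** pause_years


def pause_is_justified(p_doom_reduction: float, crude_mortality_rate: float, pause_years: float) -> bool:
    """Crude break-even test: the pause must lower p(doom) by more than its mortality cost."""
    return p_doom_reduction > pause_mortality_cost(crude_mortality_rate, pause_years)


if __name__ == "__main__":
    rate = 0.007      # pre-pandemic crude mortality rate cited in the comment
    population = 8e9  # rough world population; an assumption for illustration
    for years in (1, 10):
        cost = pause_mortality_cost(rate, years)
        print(f"{years}-year pause: {cost:.2%} of people, ~{cost * population / 1e6:.0f} million deaths")
    # A hypothetical 1-year pause that cuts p(doom) by 2 percentage points.
    print(pause_is_justified(0.02, rate, 1))
```

With the 0.7% rate, a 1-year pause costs about 0.7% of the population (on the order of 56 million people at the assumed population), so it would need to lower p(doom) by more than about 0.7 percentage points; a 10-year pause raises the bar to roughly 6.8 points.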

Matthew Barnett 26 Apr 2024 20:34 UTC
5 points
3
in reply to: quetzal_rainbow’s comment on: AI Regulation is Unsafe
I don’t think staging a civil war is generally a good way of saving lives. Moreover, ordinary aging has about a 100% chance of “killing literally everyone” prematurely, so it’s unclear to me what moral distinction you’re trying to make in your comment. It’s possible you think that:
1. Death from aging is not as bad as death from AI because aging is natural whereas AI is artificial
2. Death from aging is not as bad as death from AI because human civilization would continue if everyone dies from aging, whereas it would not continue if AI kills everyone
In the case of (1) I’m not sure I share the intuition. Being forced to die from old age seems, if anything, worse than being forced to die from AI, since it is long and drawn-out, and presumably more painful than death from AI. You might also think about this dilemma in terms of act vs. omission, but I am not convinced there’s a clear asymmetry here.
In the case of (2), whether AI takeover is worse depends on how bad you think an “AI civilization” would be in the absence of humans. I recently wrote a post about some reasons to think that it wouldn’t be much worse than a human civilization.
In any case, I think this is simply a comparison between “everyone literally dies” vs. “everyone might literally die but in a different way”. So I don’t think it’s clear that pushing for one over the other makes someone a “Dark Lord”, in the morally relevant sense, compared to the alternative.

Matthew Barnett 26 Apr 2024 17:53 UTC
2 points
0
in reply to: Daniel Kokotajlo’s comment on: AI Regulation is Unsafe

So, it sounds like you’d be in favor of a 1-year pause or slowdown then, but not a 10-year?

That depends on the benefits that we get from a 1-year pause. I’d be open to the policy, but I’m not currently convinced that the benefits would be large enough to justify the costs.

Also, I object to your side-swipe at longtermism

I didn’t side-swipe at longtermism, or try to dunk on it. I think longtermism is a decent philosophy, and I consider myself a longtermist in the dictionary sense as you quoted. I was simply talking about people who aren’t “fully committed” to the (strong) version of the philosophy.