Thanks for writing! I agree that the factors this post describes make some types of gradient hacking extremely difficult, but I don’t see how they make the following approach to gradient hacking extremely difficult.
> Suppose that an agent has some trait which gradient descent is trying to push in direction x because the x-ness of that trait contributes to the agent’s high score; and that the agent wants to use gradient hacking to prevent this. Consider three possible strategies that the agent might try to implement, upon noticing that the x-component of the trait has increased [...] [One potential strategy is] Deterministically increasing the extent to which it fails as the x-component increases.
(from here)
This approach to gradient hacking seems plausibly resistant to the factors this post describes, for the following reason. With the above approach, the gradient hacker only worsens performance by a small amount. At the same time, it plausibly improves performance in other ways, since the planning abilities that enable gradient hacking may also produce good performance on tasks that demand planning. So, overall, modifying or reducing the influence of the gradient hacker plausibly worsens performance. In other words, gradient descent might not modify away a gradient hacker, because gradient hacking is convergently incentivized behavior that worsens performance only by a small amount (and plausibly not at all on net).
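To make that more concrete, here is a minimal toy sketch of the gradient structure I have in mind. (Everything in it is invented for illustration: real networks wouldn’t have clean scalar knobs like `x` and `c`, and the functional forms and constants are arbitrary.)

```python
# Toy scalars (all made up for illustration):
#   x = the x-component of the trait, which the task loss rewards
#   c = "planning capacity" that (i) helps on planning-heavy tasks and
#       (ii) implements the hacker's deterministic extra failure as x grows

A = 1.0   # marginal task benefit of increasing x
B = 5.0   # marginal benefit of planning capacity (broadly useful)
K = 1.0   # strength of the hacker's "fail more as x increases" penalty

def loss(x, c):
    # Task term rewards x, planning term rewards c, and the hacker term adds
    # a small deterministic penalty that grows with x while c is present.
    return -A * x - B * c + K * c * x

def grad(x, c):
    dL_dx = -A + K * c   # the hacker's penalty can cancel the pressure on x
    dL_dc = -B + K * x   # planning benefit vs. the small cost of hacking
    return dL_dx, dL_dc

print(grad(x=0.5, c=1.0))   # -> (0.0, -4.5)
# dL/dx ~ 0: the deterministic extra failure roughly cancels gradient descent's
# push on x. dL/dc < 0: shrinking the circuitry that implements the hacker would
# worsen the loss on net, because the same circuitry buys the planning benefit
# (B), which outweighs the small cost of hacking (K * x).
```

The point of the sketch is just that the net gradient on whatever circuitry implements the hacker can point toward keeping it, even while the hacker’s penalty term is actively worsening the loss by a small amount.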
(Maybe gradient descent would then train the model to have a heuristic of not doing gradient hacking, while keeping the other benefits of improved planning abilities? But I feel pretty clueless about whether gradient hacking would be encoded in a way that allows such a heuristic to be inserted.)
(I read kind of quickly so may have missed something.)
I agree with parts of that. I’d also add the following (or I’d be curious why they’re not important effects):
- Slower takeoff → warning shots → improved governance (e.g. through most/all major actors getting clear[er] evidence of risks) → less pressure to rush
- (As OP argued) Shorter timelines → China has less of a chance to have leading AI companies → less pressure to rush
More broadly though, maybe we should be using more fine-grained concepts than “shorter timelines” and “slower takeoffs”:
- The salient effects of “shorter timelines” seem pretty dependent on what the baseline is.
    - The point about China seems important if the baseline is 30 years, and not so much if the baseline is 10 years.
- The salient effects of “slowing takeoff” seem pretty dependent on what part of the curve is being slowed. Slowing it down right before there’s large risk seems much more valuable than (just) slowing it down earlier in the curve, as the last few years’ investments in LLMs did.