Suppose the US government pursued a “Manhattan Project for AGI”. At its onset, it’s primarily fuelled by a desire to beat China to AGI. However, there’s some chance that its motivation shifts over time (e.g., if the government ends up thinking that misalignment risks are a big deal, its approach to AGI might change.)
Do you think this would be (a) better than the current situation, (b) worse than the current situation, or (c) it depends on XYZ factors?
My own impression is that this would be an improvement over the status quo. Main reasons:
A lot of my P(doom) comes from race dynamics.
Right now, if a leading lab ends up realizing that misalignment risks are super concerning, they can’t do much to end the race. Their main strategy would be to go to the USG.
If the USG runs the Manhattan Project (or there’s some sort of soft nationalization in which the government ends up having a much stronger role), it’s much easier for the USG to see that misalignment risks are concerning & to do something about it.
A national project would be more able to slow down and pursue various kinds of international agreements (the national project has more access to POTUS, DoD, NSC, Congress, etc.)
I expect the USG to be stricter on various security standards. It seems more likely to me that the USG would, e.g., demand a lot of security requirements to prevent model weights or algorithmic insights from leaking to China. One of my major concerns is that people will want to pause at GPT-X but they won’t feel able to because China stole access to GPT-(X-1) (or maybe even a slightly weaker version of GPT-X).
In general, I feel like USG natsec folks are less “move fast and break things” than folks in SF. While I do think some of the AGI companies have tried to be less “move fast and break things” than the average company, I think corporate race dynamics & the general cultural forces have been the dominant factors and undermined a lot of attempts at meaningful corporate governance.
(Caveat that even though I see this as a likely improvement over status quo, this doesn’t mean I think this is the best thing to be advocating for.)
(Second caveat that I haven’t thought about this particular question very much and I could definitely be wrong & see a lot of reasonable counterarguments.)
As you know, I have huge respect for USG natsec folks. But there are (at least!) two flavors of them: 1) the cautious, measure-twice-cut-once sort that have carefully managed deterrence for decades, and 2) the “fuck you, I’m doing Iran-Contra” folks. Which do you expect will get in control of such a program? It’s not immediately clear to me which ones would.
@davekasten I know you posed this question to us, but I’ll throw it back on you :) what’s your best-guess answer?
Or perhaps put differently: What do you think are the factors that typically influence whether the cautious folks or the non-cautious folks end up in charge? Are there any historical or recent examples of these camps fighting for power over an important operation?
Why is the built-in assumption for almost every single post on this site that alignment is impossible and we need a 100 year international ban to survive? This does not seem particularly intellectually honest to me. It is very possible no international agreement is needed. Alignment may turn out to be quite tractable.
A mere 5% chance that the plane will crash during your flight is consistent with considering this extremely concerning and doing anything in your power to avoid getting on it. “Alignment is impossible” is not necessary for great concern, and isn’t implied by it.
I don’t think this line of argument is a good one. If there’s a 5% chance of x-risk and, say, a 50% chance that AGI makes the world just generally be very chaotic and high-stakes over the next few decades, then it seems very plausible that you should mostly be optimizing for making the 50% go well rather than the 5%.
Still consistent with great concern. I’m pointing out that O O’s point isn’t locally valid: observing concern shouldn’t translate into observing a belief that alignment is impossible.
Yudkowsky has a pinned tweet that states the problem quite well: it’s not so much that alignment is necessarily infinitely difficult, but that it certainly doesn’t seem anywhere near as easy as advancing capabilities, and that’s a problem when what matters is whether the first powerful AI is aligned:
“Safely aligning a powerful AI will be said to be ‘difficult’ if that work takes two years longer or 50% more serial time, whichever is less, compared to the work of building a powerful AI without trying to safely align it.”
Another frame: If alignment turns out to be easy, then the default trajectory seems fine (at least from an alignment POV; you might still be worried about, e.g., concentration of power).
If alignment turns out to be hard, then the policy decisions we make to affect the default trajectory matter a lot more.
This means that even if misalignment risks are relatively low, a lot of value still comes from thinking about worlds where misalignment is hard (or perhaps “somewhat hard but not intractably hard”).
It’s not every post, but there are still a lot of people who think that alignment is very hard.
The more common assumption is that we should assume that alignment isn’t trivial, because an intellectually honest assessment of the range of opinions suggests that we collectively do not yet know how hard alignment will be.
If the project were fueled by a desire to beat China, its structure seems unlikely to resemble the parts of the original Manhattan Project’s structure that seemed maybe advantageous here, like having a single government-controlled, centralized R&D effort.
My guess is that if something like this actually happens, it would involve a large number of industry subsidies, and would create strong institutional momentum to push the state of the art forward even when things got dangerous, and, insofar as there is pushback, to continue dangerous development in secret.
In the case of nuclear weapons, the U.S. really went very far on the advice of Edward Teller, so I think the outside view here really doesn’t look good.
Good points. Suppose you were on a USG taskforce that had concluded they wanted to go with the “subsidy model”, but they were willing to ask for certain concessions from industry.
Are there any concessions/arrangements that you would advocate for? Are there any ways to do the “subsidy model” well, or do you think the model is destined to fail even if there were a lot of flexibility RE how to implement it?
I think “full visibility” seems like the obvious thing to ask for, and something that could maybe improve things. Also, preventing you from selling your products to the public, and basically forcing you to sell your most powerful models only to the government, gives the government more ability to stop things if it comes to that.
I will think more about this, I don’t have any immediate great ideas.
If you could only have “partial visibility”, what are some of the things you would most want the government to be able to know?
I have an answer to that: making sure that NIST’s AISI had, at a minimum, scores from automated evals on checkpoints of any new large training runs, as well as pre-deployment eval access.
Seems like a pretty low-cost, high-value ask to me. Even if that info leaked from AISI, it wouldn’t give away corporate algorithmic secrets.
A higher-cost ask, but still fairly reasonable, is pre-deployment evals which require fine-tuning. You can’t have a good sense of what the model would be capable of in the hands of bad actors if you don’t test fine-tuning it on hazardous info.
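To make the lower-cost ask concrete, here is a minimal, purely illustrative sketch of what sharing per-checkpoint automated eval scores could look like mechanically. The eval names, thresholds, and report format are hypothetical assumptions for illustration, not any lab’s or AISI’s actual pipeline.

```python
# Hypothetical sketch only: eval names, thresholds, and the report format are
# invented for illustration and do not reflect any real lab or AISI pipeline.
from dataclasses import dataclass, asdict
import json


@dataclass
class EvalResult:
    eval_name: str      # automated benchmark run on the checkpoint (hypothetical name)
    checkpoint_id: str  # which training checkpoint was evaluated
    score: float        # aggregate score on that benchmark
    threshold: float    # pre-agreed alert threshold for that benchmark


def build_checkpoint_report(results: list) -> str:
    """Bundle per-checkpoint eval scores into a JSON report that could be shared
    with an external body without exposing training code, data, or other
    algorithmic details."""
    report = {
        "checkpoints": sorted({r.checkpoint_id for r in results}),
        "results": [asdict(r) for r in results],
        "alerts": [asdict(r) for r in results if r.score >= r.threshold],
    }
    return json.dumps(report, indent=2)


if __name__ == "__main__":
    demo = [
        EvalResult("bio_uplift_proxy", "ckpt-001200", score=0.42, threshold=0.60),
        EvalResult("cyber_offense_proxy", "ckpt-001200", score=0.71, threshold=0.60),
    ]
    print(build_checkpoint_report(demo))
```

The point of the sketch is just that the shared artifact can be limited to scores and alert flags, which is why the leak risk to algorithmic secrets seems low.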
Worse than the current situation, because the counterfactual is that some later project happens which kicks off in a less race-y manner.
In other words, whatever the chance of its motivation shifting over time, it seems dominated by the chance that starting the equivalent project later would just have better motivations from the outset.
Can you say more about scenarios where you envision a later project happening that has different motivations?
I think in the current zeitgeist, such a project would almost definitely be primarily motivated by beating China. It doesn’t seem clear to me that it’s good to wait for a new zeitgeist. Reasons:
A company might develop AGI (or an AI system that is very good at AI R&D that can get to AGI) before a major zeitgeist change.
The longer we wait, the more capable the “most capable model that wasn’t secured” is. So we could risk getting into a scenario where people want to pause, but since China and the US both have GPT-(N-1), both sides feel compelled to race forward (whereas this wouldn’t have happened if security had kicked off sooner).
One factor is different incentives for decision-makers. The incentives (and the mindset) for tech companies are to move fast and break things. The incentives (and mindset) for government workers are usually vastly more conservative.
So if it is the government making decisions about when to test and deploy new systems, I think we’re probably far better off WRT caution.
That must be weighed against the government typically being very bad at technical matters. So even an attempt to be cautious could be thwarted by lack of technical understanding of risks.
Of course, the Trump administration is attempting to instill a vastly different mindset, more like that of tech companies. So if it’s that administration we’re talking about, we’re probably worse off on net, with a combination of lack of knowledge and YOLO attitudes. Which is unfortunate, because this is likely to happen anyway.
As Habryka and others have noted, it also depends on whether it reduces race dynamics by aggregating efforts across companies, or mostly just throws funding fuel on the race fire.
I think this is a (c) leaning (b), especially given that we’re doing it in public. Remember, the Manhattan Project was a highly classified effort, and we know it by an innocuous name given to it to avoid attention.
Saying publicly, “yo, China, we view this as an all-costs priority, hbu” is a great way to trigger a race with China...
But if it turned out that we knew from ironclad intel with perfect sourcing that China was already racing (I don’t expect this to be the case), then I would lean back more towards (c).
@davekasten @Zvi @habryka @Rob Bensinger @ryan_greenblatt @Buck @tlevin @Richard_Ngo @Daniel Kokotajlo I suspect you might have interesting thoughts on this. (Feel free to ignore though.)
(c). Like if this actually results in them behaving responsibly later, then it was all worth it.
What do you think are the most important factors for determining if it results in them behaving responsibly later?
For instance, if you were in charge of designing the AI Manhattan Project, are there certain things you would do to try to increase the probability that it leads to the USG “behaving more responsibly later?”
Some thoughts:
The correct answer is clearly (c) - it depends on a bunch of factors.
My current guess is that it would make things worse (given likely values for the bunch of other factors) - basically for Richard’s reasons.
Given [new potential-to-shift-motivation information/understanding], I expect there’s a much higher chance that this substantially changes the direction of a not-yet-formed project than of a project already in motion.
Specifically:
Who gets picked to run such a project? If it’s primarily a [let’s beat China!] project, are the key people cautious and highly adaptable when it comes to top-level goals? Do they appoint deputies who’re cautious and highly adaptable?
Here I note that the kind of ‘caution’ we’d need is [people who push effectively for the system to operate with caution]. Most people who want caution are more cautious.
How is the project structured? Will the structure be optimized for adaptability? For red-teaming of top-level goals?
Suppose that a mid-to-high-level participant receives information making the current top-level goals questionable—is the setup likely to reward them for pushing for changes? (noting that these are the kind of changes that were not expected to be needed when the project launched)
Which external advisors do leaders of the project develop relationships with? What would trigger these to change?
...
I do think that it makes sense to aim for some centralized project—but only if it’s the right kind.
I expect that almost all the directional influence is in [influence the initial conditions].
For this reason, I expect [push for some kind of centralized project, and hope it changes later] is a bad idea.
I think [devote great effort to influencing the likely initial direction of any such future project] seems a great idea (so long as you’re sufficiently enlightened about desirable initial directions, of course :))
I’d note that [initial conditions] needn’t only be internal to the project—in principle we could have reason to believe that various external mechanisms would be likely to shift the project’s motivation sufficiently over time. (I don’t know of any such reasons)
I think the question becomes significantly harder once the primary motivation behind a project isn’t [let’s beat China!], but also isn’t [your ideal project motivation (with your ideal initial conditions)].
I note that my p(doom) doesn’t change much if we eliminate racing but don’t slow down until it’s clear to most decision makers that it’s necessary.
Likewise, I don’t expect that [focus on avoiding the earliest disasters] is likely to be the best strategy. So e.g. getting into a good position on security seems great, all else equal—but I wouldn’t sacrifice much in terms of [odds of getting to a sufficiently cautious overall strategy] to achieve better short-term security outcomes.
One thing I’d be bearish on is visibility into the latest methods being used for frontier AI systems, which would in turn reduce the relevance of alignment research outside the Manhattan-like project itself. This is already somewhat true of the big labs, e.g., the methods used for o1-like models. However, there is still some visibility in the form of system cards and reports which hint at the methods. When the primary intention is racing ahead of China, I doubt there will be reports discussing the methods used for frontier systems.
Depends on the direction/magnitude of the shift!
I’m currently feeling very uncertain about the relative costs and benefits of centralization in general. I used to be more into the idea of a national project that centralized domestic projects and thus reduced domestic racing dynamics (and arguably better aligned incentives), but now I’m nervous about the secrecy it would likely entail, and think it’s less clear that a non-centralized situation inevitably leads to a decisive strategic advantage for the leading project. Which is to say, even under pretty optimistic assumptions about how much such a project invests in alignment, security, and benefit-sharing, I’m pretty uncertain that this would be good, and with more realistic assumptions I probably lean towards it being bad. But it super depends on the governance, the wider context, how a “Manhattan Project” would affect domestic companies and China’s policymaking, etc.
(I think a great start would be not naming it after the Manhattan Project, though. It seems path dependent, and that’s not a great first step.)
Can you say more about what has contributed to this update?
Something I’m worried about now is some RFK Jr/Dr. Oz equivalent being picked to lead on AI...