Awesome, thanks! And welcome to LW! I found this very helpful and now have some follow-up questions if you don’t mind. :)
1. How does this square with Zac’s answer below? On the surface it seems to contradict what you say; after all, it proposes 10x-1000x improvements to AI stuff whereas you say it won’t even be 1%! I think I can see a way that your two answers can be interpreted as consistent, however: You identify the main benefit of this tech as reducing the clock time it takes for engineers to come up with a new good chip design. So even if the new design is only 1% better than the design the engineers would have come up with, if it happens a lot faster, that’s a big deal. Why is it a big deal? Well, as Zac said, it means the latest AI architectures can be quickly supplemented by custom chips, and in general custom chips provide 10x-1000x speedups. Would you agree with this synthesis?
2. I’d be interested in your best guess for what the median X’s and Y’s in this sentence are: “In about X years, we’ll be in a regime where the latest AI models are run on specialized hardware that provides a factor-of-Y speedup over today’s hardware.”
3. ETA: Maybe another big implication of this technology is that it’ll lower the barrier to entry for new chipmakers? Like, maybe 5 years from now there’ll be off-the-shelf AI techniques that let people design cutting-edge new chips, and so China and Russia and India and everyone will have their own budding chip industry supported by generous government subsidies. Or maybe not—maybe most of the barriers to entry have to do with manufacturing talent rather than design talent?
I thought I wrote an answer to this. Turns out I didn’t. Also, I am a horrific procrastinator.
In some sense, I’d agree with this synthesis.
I say “in some sense” because the other bottleneck that lots of chip designs have is verification. Somebody has to test the new crazy shit a designer might create, right? To go back to our city planner analogy: sure, perhaps you create the most optimal connections between buildings. But what if the designer put the doors on the roof, because it’s the fastest way down?
Yes, designs can be produced faster, and can theoretically be fabbed out faster. But, as with anything that depends on humans, the design itself 1) has a certain amount of complexity that builds technical debt and 2) requires inspection.
To me, this is like how software engineering has A) the actual development and B) the deployment to production. No matter how fast B) is, which may certainly aid in iteration, A) is still heavily gated by humans.
It’s hard to give a concrete answer for that, since there are A) so many different AI models and B) so many different hardware architectures to run those AI models on. AI is a full-stack problem that honestly still has lots of room to grow, so any advance in any component of the stack will produce growth.
Put a gun to my head, though: X = 3, Y = 2.
Though not in this specific paper/iteration, this technology definitely has potential to lower time-to-fab—more specifically, post-silicon fabrication.
But, you see, I don’t think the barrier to entry is post-silicon fabrication. It is creating the design in the first place, and verifying it. This is what ARM does: they already provide pre-verified designs (reference implementations) for you to take and ship out as-is. Just give them licensing fees!
Furthermore, in many ways, a 1-2 year lead time is kinda built in already in our society (think of it—you usually buy new hardware every couple years, right?). Thus, suppose you completely eliminate post-silicon fabrication times. Where would this extra time go? I highly doubt we would change our society-accepted cadence of hardware rotations. Most definitely, it would go right back into creating new designs—human brains. Thus, I think the biggest barrier to entry is knowledge and engineering talent.
Manufacturing talent is, frankly, thanks to TSMC’s dominance of the foundry market, not much of a barrier. Sure, it’s a barrier that China is tackling (see the whole SMIC fiasco), but not one much of the Western world is willing to tackle.
So, again, that just circles back to design talent.
All in all, I rebuff my original point that this isn’t that big of a deal, but is still insanely cool. I’d love to heavily advance this technology, because it’s pretty god damn annoying, but it just means I’d have more time to sit on my hands, and that’s no guarantee I’d do anything good with that time!
Thanks! As before, this was helpful & I have some follow-up questions. :) Feel free to not reply if you don’t want to.
1. Can verification be automated too, in the next 10 years?
2. Quantitatively, about how much time + money does a good version of this automated chip design save? E.g. “It normally takes 1 year to design a chip and 2 years to actually scale up production; this tech turns that 1 year into 1 month (when you include verification), for an overall time savings of 33%. As for cost, design is a small fraction of the cost (even a research team of hundreds for a year is nothing compared to the cost of a manufacturing line or whatever) so the effect is negligible.”
3. Y = 2? That’s a much lower Y than I expected, especially considering that you “rebuff my original point that this isn’t that big of a deal.” A 2x improvement in 3 years is NOT a big deal, right? Isn’t that slightly slower than the historical rate of progress from e.g. Moore’s law etc.? Or are you saying it’s going to be a 2x improvement on top of the regular progress from other sources? Oh… maybe you are specifically talking about speed improvements rather than the all-things-considered cost to train a model of a given size on a given dataset? It’s the latter that I’m interested in; I probably misspoke.
4. What is post-silicon fabrication? When I google it it redirects to “post-silicon validation.” If creating the design and verifying it is the barrier to entry, then won’t this AI tech help reduce the barrier to entry since it automates the design part? I guess I just don’t understand your point 3.
5. “Thus, suppose you completely eliminate post-silicon fabrication times. Where would this extra time go? I highly doubt we would change our society-accepted cadence of hardware rotations. Most definitely, it would go right back into creating new designs—human brains. ” I’m particularly keen to hear what you mean by this.
Definitely not in the next 10 years. In some sense, that’s what formal verification is all about. There’s progress, but from my perspective it’s very linear growth.
The tools that I have seen (e.g. out of the RISC-V Summit, or DVCon) are difficult to adopt, and there’s a lot of inertia to overcome, since many big semiconductor companies already have their own custom flows built up over decades.
I think it’ll take a young, plucky startup to adopt and push the use of these tools. But even then, you need the talent to learn these tools, and frankly hardware is filled with old people.
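To give a flavor of what these tools are trying to automate, here’s a tiny, purely illustrative sketch in Python using the z3 SMT solver (not any vendor’s actual flow, and not one of the specific tools above): it asks a solver to prove that two descriptions of an 8-bit adder can never disagree. Scaling this style of proof from a toy adder to a full chip is exactly the hard part.

```python
# Toy formal equivalence check with the z3 SMT solver (pip install z3-solver).
# Purely illustrative sketch; not a real verification flow.
from z3 import BitVec, Solver, sat

WIDTH = 8
a = BitVec("a", WIDTH)
b = BitVec("b", WIDTH)

spec = a + b                     # "golden" reference: plain addition
impl = (a ^ b) + ((a & b) << 1)  # "optimized" version: a + b rewritten as (a ^ b) + 2*(a & b)

s = Solver()
s.add(spec != impl)              # ask: is there ANY input where the two disagree?

if s.check() == sat:
    print("Bug: counterexample found:", s.model())
else:
    print("Proved equivalent for all 2^16 input pairs")
```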
I think we have different interpretations of “design”. You consider chip design in the aggregate, but I’m subdividing it into multiple areas. There are several aspects of chip design, some of which can be automated, but I’m claiming never to so extreme an extent as, e.g., 1 month. This technology in particular really only helps in determining where to place the “buildings”, not so much in actually building the “buildings” themselves. While valuable, there’s only so much “placing” can do.
My view is that the time and money spent won’t go down, just get reallocated, which may or may not increase quality.
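To make the “placing buildings” framing concrete, here’s a toy sketch of placement as an optimization problem: blocks get positions on a grid and the placer tries to shorten the wires between them, while what the blocks actually do (the RTL) is taken as given. This is my own illustration using simulated annealing rather than the paper’s learned approach, and the block and net names are made up.

```python
# Toy "placement" sketch: assign blocks to grid sites so the wires (nets)
# connecting them are short. Purely illustrative; real placers juggle
# congestion, timing, legality, and millions of cells. Names are made up.
import math
import random

blocks = ["cpu", "cache", "dma", "phy", "noc"]
nets = [("cpu", "cache"), ("cpu", "noc"), ("noc", "dma"), ("noc", "phy")]
sites = [(x, y) for x in range(3) for y in range(3)]  # 3x3 grid of sites

def wirelength(place):
    """Total Manhattan distance over all nets for a given placement."""
    return sum(abs(place[a][0] - place[b][0]) + abs(place[a][1] - place[b][1])
               for a, b in nets)

random.seed(0)
place = dict(zip(blocks, random.sample(sites, len(blocks))))
cost = wirelength(place)

temp = 2.0
for _ in range(20000):
    a, b = random.sample(blocks, 2)
    place[a], place[b] = place[b], place[a]        # propose swapping two blocks
    new_cost = wirelength(place)
    # Keep improvements always; keep worsenings with a temperature-dependent chance.
    if new_cost <= cost or random.random() < math.exp((cost - new_cost) / temp):
        cost = new_cost
    else:
        place[a], place[b] = place[b], place[a]    # undo the swap
    temp = max(0.01, temp * 0.999)

print("final wirelength:", cost)
print("placement:", place)
```

Even a perfect answer to this sub-problem still leaves deciding what the blocks should do, and verifying that they do it, entirely to humans.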
Sorry, I guess I meant the former, where I incorporate every source, at least on the hardware side. Were you to isolate just the ML chip-placement gain… again, hard to say. It’s just indicative of a release of resources, but who knows if those extra resources can/will be properly directed to something better?
4 + 5: Sorry! I guess I meant post-design fabrication, which is really just a term I came up with to mean “shipping it to TSMC once you’re done designing”. A better term, in hindsight, is just “tapeout”, but I was hesitant to use the term time-to-tapeout since that feels cumulative rather than isolating the one period of time I mean.
See: https://anysilicon.com/verification-validation-testing-asic-soc-designs-differences/
What I mean is that this technology addresses the “Physical Design” blob of time in the link above. Notice that the whole critical path to “Shipping”/getting the chips out there goes “Verification” --> “Tapeout” --> “Validation”/Testing.
Suppose the “Physical Design” time gets eliminated. These freed resources will most definitely go into “RTL Design” and not “Verification”. That’s what I mean by “creating new designs”: it gives us more time to think of cool stuff, but again, it depends on whether that stuff is good or not.
Why will extra resources not be devoted to verification? That’s a whole can of worms. Industry inertia, overlapping talent skillset, business models, design complexity—but I guess most of all I’d say inertia.
On inertia—as I said, this cadence takes about 1-2 years. We are so so so very accustomed to this cadence, I can’t see it changing barring massive changes in our needs. If you told me you could reduce our verification time from 1 year to 11 months, I’d just spend that extra month iterating on my RTL design instead, or use that extra time to run more simulations, because 11 vs. 12 months doesn’t mean much.
If you told me I could reduce it from 1 year --> 6 months? I’d maaaaybe throw money at you. It has potential to double my income, but that depends.
Imagine new iPhones came out every 6 months instead of yearly. Isn’t that super weird? Well… That depends on how well Apple can market to me that I absolutely need it.
Perhaps that differs for AI use cases… but even there, I’d argue this yearly cadence is already ingrained.