Great post. I think this type of strategic thinking is too rare in alignment and other academic disciplines.
I’d just modify your bottom line a little bit. The goal isn’t quite to “solve alignment as quickly as possible”. The goal is to maximize the odds that we’ve solved alignment in time to prevent human disempowerment. That’s importantly different.
It means solving alignment for the first type of AGI we build, before it’s deployed.
Having an alignment solution that applies to some type of AGI nobody is building doesn’t help. Yet a lot of otherwise brilliant alignment work goes in that direction, and IMO gets that big zero impact multiplier.
> I think if you could demonstrably “solve alignment” for any architecture, you’d have a decent chance of convincing people to build it as fast as possible, in lieu of other avenues they had been pursuing.
Some people. But it would depend on what the prospects were for that type of AGI, because I don’t think you could convince everyone else to stop working on other types. So it would be a race between the new “more alignable” type and the currently-leading types. If the “more alignable” type seemed guaranteed to lose that race, I’m not sure many people would even try building it.