AGI version N: produces m utility in the real world when faced with all the real-world noise and obstacles.
Weak ASI version N+1: produces f(s)*m utility in the real world, where s is a term representing scale times algorithmic gain.
Maximum-runtime ASI version N+1: produces f(s)*m utility as well, just with a much larger s (all the compute humans can practically supply at runtime).
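To make that concrete, here is a minimal Python sketch of the utility model. The numbers and the two choices of f are made-up placeholders for illustration, not measurements of anything real:

```python
import math

# Toy version of the utility model above. All numbers are invented placeholders.
def utility(m: float, s: float, f) -> float:
    """Utility of version N+1, given version N's utility m, scale term s, and scaling function f."""
    return f(s) * m

m = 1.0            # normalize version N's real-world utility to 1
s_weak = 10.0      # hypothetical weak ASI: 10x scale * algorithmic gain
s_max = 1000.0     # hypothetical maximum-runtime ASI: 100x more than that

# The whole argument is about the shape of f:
print(utility(m, s_weak, math.log), utility(m, s_max, math.log))        # ~2.3 vs ~6.9
print(utility(m, s_weak, lambda s: s), utility(m, s_max, lambda s: s))  # 10 vs 1000
```

Under a log-shaped f, 100x more compute only buys about 3x more utility; under a linear f it buys 100x. Everything below hinges on which shape is closer to reality.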
The doom concern is the thought that giving a machine the maximum amount of compute humans are able to practically supply (note that any given architecture saturates on interconnect bandwidth; you cannot simply rack current-gen GPUs without bound) will result in an ASI with so much real-world utility that it is unstoppable.
And that the ASI can optimize itself and fit on more computers than the multi-billion-dollar cluster it was developed on.
If scaling is logarithmic, then f(s) = log(s). This would mean that other human actors, with their weaker but stable “tool” AGI, will be able to fight back effectively in a world with some amount of escaped or hostile superintelligence. Assuming the human actors (these are mostly militaries) have a large resource advantage, they would win the campaign.
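A hedged back-of-the-envelope version of that claim (the scale figures and the 10x resource advantage are invented purely for illustration):

```python
import math

m = 1.0
s_escaped = 1000.0         # escaped/hostile ASI's scale * algorithmic gain (made up)
s_tool = 100.0             # the stable "tool" AGI the defenders are willing to run (made up)
resource_advantage = 10.0  # defenders control ~10x the compute, weapons, etc.

per_instance_gap = math.log(s_escaped) / math.log(s_tool)    # ~1.5x
defenders_total = resource_advantage * math.log(s_tool) * m  # ~46
escaped_total = math.log(s_escaped) * m                      # ~6.9
print(per_instance_gap, defenders_total, escaped_total)
```

Under log scaling the escaped ASI is only ~1.5x better per instance, so a 10x resource advantage swamps it; under linear scaling the same numbers would give it a 10x per-instance edge that cancels the entire resource advantage.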
I think doomers like Yudkowsky assume it’s not logarithmic, while Geohot, Vitalik, and others assume some kind of sharply diminishing returns.
Diminishing returns means you just revert to the last stable version and use that, or patch your ASI’s container and use it to fight the one that just escaped. Your “last stable version” or your “containerized” ASI is weaker in utility than the one that escaped, but assuming you control most of the compute and most of the weapons, you can compensate for the utility gap. This would be an example of an adjacent version of a technology (N, or a patched N+1) saving you from the bad one.
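One way to phrase the “compensate for the utility gap” condition, under the same toy model as above (this is my framing, not a formula from the argument itself):

```python
import math

# Condition for "revert and compensate" to work: the defenders' resource
# multiplier R has to cover the per-instance utility gap, i.e.
#     R * f(s_stable) >= f(s_escaped)
def required_resource_advantage(s_stable: float, s_escaped: float, f=math.log) -> float:
    return f(s_escaped) / f(s_stable)

# Under log scaling, even a 100x scale gap only demands a ~2x resource advantage:
print(required_resource_advantage(s_stable=100.0, s_escaped=10_000.0))  # ~2.0
# Under linear scaling the same gap demands a full 100x advantage:
print(required_resource_advantage(100.0, 10_000.0, f=lambda s: s))      # 100.0
```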
As far as I know, the current empirical data shows diminishing returns for current algorithms. That doesn’t prove another algorithm isn’t possible, and for specific sub-problems like context length there are a dozen papers offering methods that scale better than quadratically.
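For the context-length example specifically, the issue is that standard attention cost grows quadratically in sequence length; here is a rough sketch of why the “better than quadratic” papers matter. The cost formulas are just the usual big-O shapes with made-up constants, not measurements of any particular model:

```python
# Standard attention scales ~O(n^2 * d) in sequence length n; many "efficient
# attention" variants target roughly O(n * d^2) instead. d is the hidden size.
def full_attention_cost(n: int, d: int = 128) -> int:
    return n * n * d

def linear_attention_cost(n: int, d: int = 128) -> int:
    return n * d * d

for n in (1_000, 10_000, 100_000):
    print(n, full_attention_cost(n) / linear_attention_cost(n))
# ratio grows ~linearly with n: ~8x, ~78x, ~781x
```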
So you have made an assumption here.