O O comments on A Friendly Face (Another Failure Story)

O O 22 Jun 2023 16:48 UTC
2 points
1

How do the hard limits of intelligence help? My current understanding is that the hard limits are likely to be something like Jupiter brains, rather than mentats. If each step is only slightly better, won’t that result in a massive amount of tiny steps (even taking into account the nonlinearlity of it)?

I think hard limits are a lot lower than most people think. The speed of light takes 1/8th of a second to go across earth so it doesn’t sound too useful to have planet sized module if information transfer is so slow that individual parts will always be out of sync. I haven’t looked at this issue extensively but afaik no one else has either. I think at some point AI will just become a compute cluster. A legion of minds in a hierarchy of some sorts.

(GPT-4 is already said to be 8 subcomponents masquerading as 1)

Obviously this is easier to align as we only have to align individual modules.

Furthermore it matters how much the increment is. As a hypothetical if each step is a 50% jump with a .1% value drift and we hit the limit of a module in 32 jumps, then we will have a 430,000x capabilities increase with a 4% overall drift. I mean obviously we can fiddle these numbers but my point is it’s plausible. The fact that verification is usually far easier than coming up with a solution can help us maintain these low drift rates throughout. It’s not entirely clear how high value should be, just that it’s a potential issue.

Small value drifts are a large problem, if compounded. That’s sort of the premise of a whole load of fiction, where characters change their value systems after sequences of small updates. And that’s just in humans—adding in alien (as in different) minds could complicate this further (or not—that’s the thing about alien minds).

For value drift to be interesting in fiction it has to go catastrophically wrong. I just don’t see why it’s guaranteed to fail and I think the most realistic alignment plans are some version of this.