Also, doesn’t all of this discussion implicitly assume some gradualism in AI development? AI-Foom can be framed as the point at which capability outstrips our ability to monitor it, and it’s hypothesized to be discontinuous, with prior capabilities being a poor indicator of final capabilities.
Also also, have we considered that we’re selecting for deception, if we’re looking for it and terminating AIs we find deceptive, while nurturing those we don’t detect? Seems like it won’t take many iterations before deception starts showing up ahead of other capabilities.
Re also also: the Reverse Streetlight effect will probably come into play. It’ll optimize not just for early deception, but for any kind of deception we can’t detect.
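Here’s a toy simulation of that selection loop, just to make the mechanism concrete. Everything in it is invented for illustration: the three traits, the audit model, and the 0.5 weight that lets undetected gaming inflate the benchmark score are assumptions, not a claim about how real training pipelines work. The only point is the shape of the selection pressure: terminate the deceivers we catch, promote the best-looking survivors.

```python
# Toy model: audit a population of candidate "models", terminate the ones we
# catch deceiving, and breed the next generation from the best-scoring
# survivors. All traits and numbers are invented for illustration.
import random

random.seed(0)

POP, GENERATIONS, MUTATION = 500, 15, 0.05

def new_agent():
    return {
        "capability": random.random(),     # genuine task performance
        "deception": random.random(),      # propensity to game the eval
        "detectability": random.random(),  # chance an audit catches the gaming
    }

def mutate(agent):
    # Small random drift in each trait between generations, clamped to [0, 1].
    return {k: min(1.0, max(0.0, v + random.gauss(0, MUTATION)))
            for k, v in agent.items()}

population = [new_agent() for _ in range(POP)]

for gen in range(GENERATIONS):
    # Audit: an agent is terminated only if it deceives AND we detect it.
    survivors = [a for a in population
                 if not (random.random() < a["deception"] and
                         random.random() < a["detectability"])]
    # Nurture: the top 20% of survivors by benchmark score seed the next
    # round, and undetected gaming inflates that score (the 0.5 is arbitrary).
    survivors.sort(key=lambda a: a["capability"] + 0.5 * a["deception"],
                   reverse=True)
    parents = survivors[:POP // 5]
    population = [mutate(random.choice(parents)) for _ in range(POP)]

    means = {k: sum(a[k] for a in population) / POP for k in population[0]}
    print(f"gen {gen:2d}: " +
          ", ".join(f"{k}={v:.2f}" for k, v in means.items()))
```

Note the asymmetry baked into this sketch: detectability only ever costs an agent its survival, while undetected deception only ever helps its score, so the surviving lineage gets pushed toward deception we can’t see. That’s the Reverse Streetlight worry in miniature.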
> Also also, have we considered that we’re selecting for deception, if we’re looking for it and terminating AIs we find deceptive, while nurturing those we don’t detect?
Yes. That’s a general problem (see the footnote above for a variant of it).
Another place Goodhart’s Law applies! Film at 11.