Eliezer thinks that if you have any optimization powerful enough to reproduce humanlike cognition inside a detailed boundary by looking at a human-labeled dataset trying to outline the boundary, the thing doing the optimization is powerful enough that we cannot assume its neutrality the way we can assume the neutrality of gradient descent.
To clarify: it’s not that you think that gradient descent can’t in fact find human-level cognition by trial and error; it’s that you think “the neutrality of gradient descent” is an artifact of its weakness? Or maybe that gradient descent is neutral, but that if it finds a sophisticated policy, that policy isn’t neutral?
I don’t really know what “outline the boundary” means here. We specify a performance criterion, then we do a search for a model that scores well according to that criterion. It’s not like we are trying to find some illustrative examples that point out the concept we want to learn; we are just implementing a test for the behavior we are interested in.
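(The process being described here is just standard loss-driven search: write down a scoring function, then let gradient descent find parameters that score well on it. A minimal sketch of that framing, assuming a toy logistic-regression “model” and a hypothetical cross-entropy `criterion`; none of these names or the toy data come from the dialogue:)

```python
# Minimal, illustrative sketch of "specify a performance criterion, then
# search for a model that scores well on it."  Everything here is a toy
# assumption, not part of the original discussion.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))                     # toy inputs
y = (X @ rng.normal(size=8) > 0).astype(float)    # toy labels: the behavior we test for

def model(w, X):
    """A simple parameterized policy: logistic regression."""
    return 1.0 / (1.0 + np.exp(-X @ w))

def criterion(w):
    """The performance test: cross-entropy against the labels.
    The optimizer only ever sees this number; it is 'neutral' about
    whatever concept the criterion was meant to capture."""
    p = np.clip(model(w, X), 1e-6, 1 - 1e-6)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad(w, eps=1e-5):
    """Finite-difference gradient of the criterion (keeps the sketch dependency-free)."""
    g = np.zeros_like(w)
    for i in range(len(w)):
        d = np.zeros_like(w)
        d[i] = eps
        g[i] = (criterion(w + d) - criterion(w - d)) / (2 * eps)
    return g

w = np.zeros(8)
for step in range(500):          # the "search" is plain gradient descent on the criterion
    w -= 0.5 * grad(w)

print(f"final criterion value: {criterion(w):.4f}")
```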
The imaginary Paul in my head replies that we are actually using an AGI to train on X and get X.
In the very long run I expect AGI to supply the optimization power rather than trial and error, and the continued alignment comes from some combination of “our training process works as long as the optimization is benign” + “our AGI is benign.” But I totally agree that you need the AI trained by gradient descent to work; I’m definitely not imagining that everything will be OK because the optimization is done by AGI instead of by gradient descent. In practice I’m basically always talking about the case where gradient descent is doing the optimization.