We do not know how to create an AI that would not regularly hallucinate. The Values AI hallucinating would be a bad thing.
In fact, training an AI to follow human values more closely seems to mostly teach it to say what humans want to hear, while being objectively wrong more often.
We do not know how to create an AI that reliably follows its programmed values outside of the training set. Your 2nd AI going off the rails outside of the training set would be bad.
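To make that concrete, here is a toy sketch (my own illustration, with made-up functions, not anything anyone has built): two "models" that agree on every training example and yet behave completely differently outside it.

```python
# Toy illustration: behaviour on the training set does not pin down
# behaviour outside it. Both "models" below are perfect on the training
# inputs; only one of them does what we actually meant.

train_inputs = [0, 1, 2, 3, 4]

def intended(x):
    # The rule we meant the AI to learn: double the input.
    return 2 * x

def learned(x):
    # Also fits every training example exactly, because the extra term
    # vanishes on (and only on) the training inputs.
    bump = 1
    for t in train_inputs:
        bump *= (x - t)
    return 2 * x + bump

# Indistinguishable during training...
for x in train_inputs:
    assert learned(x) == intended(x)

# ...but off the rails as soon as we leave the training set.
for x in [5, 6, 10]:
    print(x, intended(x), learned(x))
```

Nothing in the training data distinguishes the two; only their behaviour far from it does.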
Also, human values, at least the ones we know how to consciously formulate, are pretty fragile: they are things we want weakly/softly optimized for, but that would go very badly if a superhuman AI hard-optimized them. We do not know how to capture human values in a way that does not go terribly wrong when the optimization is cranked to the max, and your Values AI is unlikely to help enough, because we would not know which inputs we are failing to provide it (they would be aspects of our values that only become important in future circumstances we cannot even imagine today).
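To make the soft-versus-hard optimization point concrete, here is another toy sketch (again my own, with an invented "proxy" objective): a written-down proxy for what we value can be almost exactly right over the ordinary range of options and still be catastrophic when something maximizes it outright.

```python
import numpy as np

def true_value(x):
    # What we actually care about: high around x = 1, low at extremes.
    return np.exp(-(x - 1.0) ** 2)

def proxy(x):
    # Our written-down approximation: nearly identical to true_value for
    # ordinary x, but a small modelling error (the quadratic term) grows
    # without bound at extreme x.
    return true_value(x) + 0.02 * x ** 2

# Soft optimization: a bit of local hill-climbing on the proxy from a
# sensible starting point. It settles near x = 1, where true value is high.
x_soft, step = 0.0, 0.1
for _ in range(100):
    grad = (proxy(x_soft + 1e-4) - proxy(x_soft - 1e-4)) / 2e-4
    x_soft += step * grad

# Hard optimization: exhaustively maximize the proxy over every option in
# reach. The modelling error dominates and picks an extreme where the true
# value is essentially zero.
grid = np.linspace(-10.0, 10.0, 20001)
x_hard = grid[np.argmax(proxy(grid))]

print(f"soft:  x={x_soft:6.2f}  proxy={proxy(x_soft):.2f}  true={true_value(x_soft):.2f}")
print(f"hard:  x={x_hard:6.2f}  proxy={proxy(x_hard):.2f}  true={true_value(x_hard):.2f}")
```

The specific functions are made up; the point is that a tiny mismatch between the proxy and the real thing barely matters under weak optimization and completely determines the outcome under maximal optimization.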
Finally, we wouldn't get a second try: any bugs in your AIs, particularly the 2nd one, are very likely to be fatal. We do not know how to create your 2nd AI in such a way that, the very first time we turn it on, all the bugs have already been found and fixed.