Ege Erdil gave an important disanalogy between the problem of recognizing/generating a human face and the problem of learning human values, or learning to recognize plans that advance human values. The disanalogy is that humans are near-perfect recognizers of human faces, but we are not near-perfect recognizers of valuable world-states or value-advancing plans. This means that if we trained an AI to recognize valuable world-states or value-advancing plans, we would actually end up training something that recognizes what we can recognize as valuable states or plans. If we trained it the way we train GANs, the discriminator would fail to distinguish world-states from the generator that are actually valuable from ones that merely look valuable to humans, but that the humans would not judge valuable at all if they understood the plan/state well enough. So we would need some sort of ELK proposal that works before we could take any real comfort from the face recognizing/generating <-> human values learning analogy.
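To make the "trained like a GAN" part of the analogy concrete, here is a minimal adversarial training sketch (PyTorch). Everything here is an illustrative placeholder, not anything from the original discussion: the key point is that the discriminator's only training signal is the set of examples labelled "real" (here, a stand-in for "looks valuable to humans"), so the generator is pushed toward "indistinguishable from that signal", not toward anything deeper.

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64  # illustrative sizes

generator = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
discriminator = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1))

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def real_batch(n=32):
    # Stand-in for "examples humans label as real/valuable"; the discriminator
    # never sees any signal beyond this labelling.
    return torch.randn(n, data_dim)

for step in range(1000):
    real = real_batch()
    fake = generator(torch.randn(real.size(0), latent_dim))

    # Discriminator step: learn to separate labelled-real from generated samples.
    d_loss = bce(discriminator(real), torch.ones(real.size(0), 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(real.size(0), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: produce samples the discriminator scores as "real".
    g_loss = bce(discriminator(fake), torch.ones(real.size(0), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```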
Nate Soares points out on Twitter that the supposedly maximally face-like images according to GAN models look like horrible monstrosities, so following the analogy, we should expect that for similar models doing similar things for human values, the maximally valuable world-state would also look like some horrible monstrosity.
I’m confused: which GAN faces look like “horrible monstrosities”!?
I assumed he meant the thing that most activates the face detector, but from skimming some of what people said above, it seems like maybe we don’t know what that is.
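For what "the thing that most activates the face detector" could mean operationally, here is a naive activation-maximization sketch: gradient ascent on the input to maximize a detector's score. The face_detector here is a hypothetical stand-in for whatever trained classifier/discriminator is being probed, not the model Nate referenced; without extra regularizers (jitter, blur, image priors), the result of this procedure is typically an unrecognizable, adversarial-looking pattern, which is one plausible reading of "horrible monstrosity".

```python
import torch
import torch.nn as nn

# Hypothetical placeholder for a trained face detector.
face_detector = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1))

x = torch.zeros(1, 3, 64, 64, requires_grad=True)  # start from a blank image
opt = torch.optim.Adam([x], lr=0.05)

for _ in range(500):
    score = face_detector(x).squeeze()   # how "face-like" the detector rates x
    (-score).backward()                  # gradient ascent on the detector's score
    opt.step()
    opt.zero_grad()
    with torch.no_grad():
        x.clamp_(0.0, 1.0)               # keep pixels in a valid range
```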