Basic question: why would the AI system optimize for X-ness?
I thought Katja’s argument was something like:
Suppose we train a system to generate (say) plans for increasing the profits of your paperclip factory, much as we train GANs to generate faces
Then we would expect those paperclip-factory planners to make errors analogous to face-generator errors
I.e., their failures will not be "eldritch"
The fact that you could repurpose the GAN discriminator in this terrifying way doesn't really seem relevant if no one is actually doing that in practice? (See the sketch below for what "repurposing" means here.)
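To make the distinction concrete, here's a minimal sketch (PyTorch, with toy stand-in networks; all sizes and module definitions are illustrative assumptions, not from any real GAN) of the two uses being contrasted: sampling from a trained generator versus gradient-ascending an input directly against the discriminator's score.

```python
# Illustrative sketch only: toy stand-in networks, not a real trained GAN.
import torch
import torch.nn as nn

Z_DIM, IMG_DIM = 16, 64  # hypothetical toy sizes

generator = nn.Sequential(nn.Linear(Z_DIM, IMG_DIM), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(IMG_DIM, 1), nn.Sigmoid())

# (a) The benign use: sample from the generator. Failures here look like
# ordinary GAN artifacts -- the "analogous errors" in the argument above.
z = torch.randn(1, Z_DIM)
sample = generator(z)

# (b) The repurposed use: gradient-ascend an input directly against the
# discriminator. This is an adversarial search for whatever the
# discriminator happens to score highly, which is where "eldritch"
# outputs would come from.
x = torch.randn(1, IMG_DIM, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.1)
for _ in range(100):
    opt.zero_grad()
    loss = -discriminator(x).mean()  # maximize the discriminator's score
    loss.backward()
    opt.step()
```

The point of the argument, as I read it, is that (a) is what people actually deploy, while the worry applies to (b).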