Because we don’t know when a neural network runs into the mesa-optimization problem, are we prone to adversarial attacks?
The example with the red doors is a neat one: there, as the human programmers we thought the algorithm was learning to reach red doors, but maybe all it was doing was learning to distinguish red from everything else.
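To make that concrete, here is a tiny toy sketch of the kind of thing I mean (my own example, not from the post; the noise level and the logistic-regression setup are just assumptions to make the point visible): if redness and door-ness are perfectly correlated during training but the colour signal is cleaner than the door signal, the learned model can end up keying almost entirely on colour, and then go for a red wall instead of a blue door once deployment breaks the correlation.

```python
# Toy sketch (my own, not from the post): "red" and "door" coincide in
# training, but the colour feature is cleaner than the noisy door feature,
# so the learner ends up keying mostly on colour, i.e. a proxy objective.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500

is_door = rng.integers(0, 2, n)      # ground truth: is this the target door?
is_red = is_door.copy()              # in training, every door is red and only doors are red
door_obs = np.where(rng.random(n) < 0.8, is_door, 1 - is_door)  # noisy "door detector"

X_train = np.column_stack([is_red, door_obs])
clf = LogisticRegression().fit(X_train, is_door)

# Deployment breaks the correlation: a red wall and a blue door.
X_test = np.array([[1, 0],   # red, but not a door
                   [0, 1]])  # a door, but not red
print(clf.predict(X_test))   # typically [1 0]: it chases red things, not doors
```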
Also, isn’t every neural network we train today performing some sort of mesa-optimization?