By adding random noise, I meant adding wiggles to the edge of the set in thingspace. For example, adding noise to “bird” might exclude “ostrich” and include “duck-billed platypus”.
I agree that the high-level ImageNet concepts are bad in this sense, but are they just bad? If they were just bad, and the limit to finding good concepts was data or some other resource, then we should expect small children and mentally impaired people to have similarly bad concepts. This would suggest a single gradient from better to worse. If, however, current neural networks used concepts substantially different from those of small children, and not just uniformly worse or uniformly better, that would show different sets of concepts at the same low level. This would be fairly strong evidence of multiple possible sets of concepts at the smart-human level.
I would also point out that even a small fraction of the concepts being different would be enough to make alignment much harder. Even if there were a perfect scale, if 1⁄3 of the concepts are subhuman, 1⁄3 human-level, and 1⁄3 superhuman, it would be hard to understand the system. To get any safety, you need to get your system very close to human concepts. And you need to be confident that you have hit this target.