Taboo natural representations?

Without defining a natural representation (since I don’t know how to), here are four properties that I think a representation should satisfy before it’s called natural (I also give these in my response to Vika):
(1) Good performance on different data sets in the same domain.
(2) Good transference to novel domains.
(3) Robustness to visually imperceptible perturbations to the input image.
(4) “Canonicality”: replacing the learned features with a random invertible linear transformation of the learned features should degrade performance.
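(To make (4) concrete: one illustrative way to check it might look like the sketch below. The choice of a 1-nearest-neighbour classifier as the downstream task is an assumption of the sketch (a purely linear downstream model would be unaffected by an invertible change of basis), and all names here are placeholders.)

```python
import numpy as np

def knn_accuracy(train_X, train_y, test_X, test_y):
    # 1-nearest-neighbour accuracy under Euclidean distance.
    # All arguments are numpy arrays; *_X are (n, d) feature matrices.
    d2 = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=-1)
    pred = train_y[d2.argmin(axis=1)]
    return float((pred == test_y).mean())

def canonicality_gap(train_X, train_y, test_X, test_y, seed=0):
    # Replace the learned features with a random invertible linear
    # transformation of them and see how much performance drops.
    rng = np.random.default_rng(seed)
    d = train_X.shape[1]
    A = rng.normal(size=(d, d))  # a random square matrix is invertible almost surely
    acc_raw = knn_accuracy(train_X, train_y, test_X, test_y)
    acc_mixed = knn_accuracy(train_X @ A, train_y, test_X @ A, test_y)
    return acc_raw - acc_mixed   # a large positive gap is evidence of "canonicality"
```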
Thanks. So to clarify, my claim was not that we already have algorithms producing representations that fulfill all of these criteria. But it does seem to me that something like word embeddings is moving in the direction of fulfilling them. E.g. this bit from the linked post:
Recently, deep learning has begun exploring models that embed images and words in a single representation.
The basic idea is that one classifies images by outputting a vector in a word embedding. Images of dogs are mapped near the “dog” word vector. Images of horses are mapped near the “horse” vector. Images of automobiles near the “automobile” vector. And so on.
The interesting part is what happens when you test the model on new classes of images. For example, if the model wasn’t trained to classify cats – that is, to map them near the “cat” vector – what happens when we try to classify images of cats?
It turns out that the network is able to handle these new classes of images quite reasonably. Images of cats aren’t mapped to random points in the word embedding space. Instead, they tend to be mapped to the general vicinity of the “dog” vector, and, in fact, close to the “cat” vector. Similarly, the truck images end up relatively close to the “truck” vector, which is near the related “automobile” vector.
This was done by members of the Stanford group with only 8 known classes (and 2 unknown classes). The results are already quite impressive. But with so few known classes, there are very few points to interpolate the relationship between images and semantic space off of.
The Google group did a much larger version – instead of 8 categories, they used 1,000 – around the same time (Frome et al. (2013)) and has followed up with a new variation (Norouzi et al. (2014)). Both are based on a very powerful image classification model (from Krizhevsky et al. (2012)), but embed images into the word embedding space in different ways.
The results are impressive. While they may not get images of unknown classes to the precise vector representing that class, they are able to get to the right neighborhood. So, if you ask it to classify images of unknown classes and the classes are fairly different, it can distinguish between the different classes.
Even though I’ve never seen an Aesculapian snake or an Armadillo before, if you show me a picture of one and a picture of the other, I can tell you which is which because I have a general idea of what sort of animal is associated with each word. These networks can accomplish the same thing.
sounds to me like it would represent clear progress towards at least #1 and #2 of your criteria; the basic setup, as I understand it, is sketched below.
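(Here is a very rough sketch in code of the setup that post describes: images are mapped into the word-embedding space and then labelled by their nearest word vector, which is what lets the model make sensible guesses about classes it was never trained on. This is not the Frome et al. or Norouzi et al. code; the names are placeholders of mine, and I am assuming the image-to-embedding model has already been trained elsewhere.)

```python
import numpy as np

def classify_in_embedding_space(image_embeddings, label_vectors, label_names):
    # image_embeddings: (n_images, d) array of images already mapped into the
    #   word-embedding space by some trained model (not shown here).
    # label_vectors:    (n_labels, d) word vectors for *all* candidate labels,
    #   including classes the image model never saw during training.
    # label_names:      list of n_labels strings.
    def unit_rows(M):
        return M / np.linalg.norm(M, axis=1, keepdims=True)

    # Cosine similarity between every image and every label's word vector.
    sims = unit_rows(image_embeddings) @ unit_rows(label_vectors).T
    # Each image gets the label whose word vector it lies closest to, so an
    # image of a cat can be labelled "cat" even if no cat images were used
    # to train the image-to-embedding map.
    return [label_names[i] for i in sims.argmax(axis=1)]
```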
I agree that the papers on adversarial examples that you cited earlier are evidence that many current models are still not capable of meeting criterion #3. On the other hand, the second paper does seem to present clear signs that the reasons for these pathologies are being uncovered and addressed, and that future algorithms will be able to avoid this class of pathology. (Caveat: I do not yet fully understand those papers, so I may be interpreting them incorrectly.)
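(To make criterion #3 concrete for myself, here is a toy illustration of the kind of perturbation involved, using the fast-gradient-sign idea on a plain logistic-regression classifier. This is my own simplification rather than the setup of the cited papers, and the function names are made up.)

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def small_adversarial_perturbation(x, y, w, b, eps=0.01):
    # x: a flattened input image, y: its true label in {0, 1},
    # (w, b): parameters of an already-trained logistic-regression classifier.
    p = sigmoid(x @ w + b)
    # Gradient of the cross-entropy loss with respect to the *input* x.
    grad_x = (p - y) * w
    # Step each pixel slightly in whichever direction increases the loss;
    # for small eps the change is visually imperceptible, yet it can be
    # enough to flip the classifier's prediction.
    return x + eps * np.sign(grad_x)
```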