You might be interested in Transformer Networks, which use a learned pattern of attention to route data between layers. They’re pretty popular and have been used in some impressive applications like this very convincing image-synthesis GAN.
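To make the routing idea concrete, here is a minimal sketch of the scaled dot-product attention at the core of the Transformer, in NumPy (the function names and toy shapes are my own, not from the paper):

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the row max for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # scores: similarity of each query to each key, scaled by sqrt(d_k)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # weights: an input-dependent routing pattern (each row sums to 1)
    weights = softmax(scores, axis=-1)
    # output: each query receives a weighted mix of the values
    return weights @ V

# toy example: 3 queries attending over 4 key/value pairs of dimension 8
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=s) for s in [(3, 8), (4, 8), (4, 8)])
out = attention(Q, K, V)  # shape (3, 8)
```

The "learned pattern of attention" is just the softmax weights: where a query puts its weight determines which values flow forward to the next layer.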
re: whether this is a good research direction. The fact that neural networks are highly compressible is very interesting and I too suspect that exploiting this fact could lead to more powerful models. However, if your goal is to increase the chance that AI has a positive impact, then it seems like the relevant thing is how quickly our understanding of how to align AI systems progresses, relative to our understanding of how to build powerful AI systems. As described, this idea sounds like it would be more useful for the latter.
The image synthesis is impressive, and the Transformer paper looks intriguing. I will need to read it again much more slowly, rather than skimming, to understand it. Thanks for both the links and the feedback on aligning AI.
I agree these ideas are really about progressing AI in general, rather than progressing AI specifically in a positive way. As a post-hoc justification, though, exploring attention mechanisms in machine learning suggests that what an AI ‘cares about’ may be deeply embedded in its technology. Your comment, and my need to justify post hoc, set me the task of making that link more concrete, so let me expand on it.
I think many animals have near-hard-wired attention mechanisms that alert them to eyes. Things with eyes are animate, and may demand a faster reaction than rocks or trees do. Animals also have near-hard-wired attention mechanisms for sudden movement.
What alerting or attention-setting mechanisms will the AIs in self-driving cars have? They will probably prioritise detecting sudden movement. They probably won’t have any specific mechanism for alerting to eyes. Perhaps that’s a mistake.
I’ve noticed that in some videos of ‘what a car sees’, the bounding boxes track the vehicles ahead reliably, but flick on and off around people on the sidewalk. The stable bounding boxes are relatively square; the unstable ones are tall and thin.
Now, just maybe, we want to build a visual architecture that is very good at distinguishing tall thin objects that could be people from tall thin objects that could be lamp posts. That has implications all the way down the visual pipeline. The car is not going to be good at solving trolley problems if it can tell trucks from cars but can’t tell people from lamp posts.
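As a toy illustration of both observations (entirely my own sketch, not how any production stack works): tall thin boxes have a distinctive aspect ratio, and a raw per-frame confidence threshold is exactly what makes a marginal detection flick on and off, while even simple temporal smoothing stabilises it:

```python
def aspect_ratio(box):
    # box = (x1, y1, x2, y2); people and lamp posts are tall and thin (ratio >> 1)
    w, h = box[2] - box[0], box[3] - box[1]
    return h / max(w, 1e-6)

print(aspect_ratio((100, 50, 140, 170)))   # pedestrian-like box: 3.0
print(aspect_ratio((200, 100, 360, 180)))  # car-like box: 0.5

def stabilise(confidences, threshold=0.5, alpha=0.3):
    # Exponential smoothing of per-frame detector confidence.
    # A raw per-frame threshold flickers on a marginal object;
    # smoothing trades a little latency for a stable track.
    decisions, state = [], confidences[0]
    for c in confidences:
        state = alpha * c + (1 - alpha) * state
        decisions.append(state >= threshold)
    return decisions

# a pedestrian whose per-frame confidence hovers around the threshold
frames = [0.6, 0.4, 0.7, 0.3, 0.65, 0.45, 0.7]
print([c >= 0.5 for c in frames])  # raw: flickers on and off
print(stabilise(frames))           # smoothed: one stable 'person' decision
```

Smoothing only hides the symptom, of course; the deeper fix gestured at above is an architecture whose features actually separate people from lamp posts.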