Description:
Feedforward neural networks provide the dominant model of how the brain performs
visual object recognition. However, these networks lack the lateral and feedback
connections, and the resulting recurrent neuronal dynamics, of the ventral visual pathway
in the human and non-human primate brain. Here we investigate recurrent convolutional
neural networks with bottom-up (B), lateral (L), and top-down (T) connections. Combining
these types of connections yields four architectures (B, BL, BT, and BLT), which
we systematically test and compare. We hypothesized that recurrent dynamics might
improve recognition performance in the challenging scenario of partial occlusion. We
introduce two novel occluded object recognition tasks to test the efficacy of the
models, digit clutter (where multiple target digits occlude one another) and digit debris
(where target digits are occluded by digit fragments). We find that recurrent neural
networks outperform feedforward control models (approximately matched in parametric
complexity) at recognizing objects, both in the absence of occlusion and in all occlusion
conditions. Recurrent networks also prove more robust to additive Gaussian noise.
Recurrent neural networks are thus better in two respects: (1) they are more
neurobiologically realistic than their feedforward counterparts; (2) they recognize
objects more accurately, especially under challenging conditions.
This work shows that computer vision can benefit from using recurrent convolutional
architectures and suggests that the ubiquitous recurrent connections in biological brains
are essential for task performance.
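To illustrate the core idea behind the lateral (L) connections described above, the sketch below unrolls a single recurrent layer whose state at each time step combines a bottom-up drive from the input with a lateral drive from its own previous activity. This is a minimal, hypothetical NumPy sketch: it uses dense weights and made-up shapes for clarity, whereas the models in this work use convolutional weight sharing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not taken from the paper).
n_in, n_hidden, timesteps = 8, 4, 3
W_b = rng.standard_normal((n_hidden, n_in)) * 0.1      # bottom-up weights
W_l = rng.standard_normal((n_hidden, n_hidden)) * 0.1  # lateral (recurrent) weights

def relu(z):
    return np.maximum(z, 0.0)

def bl_layer(x, steps=timesteps):
    """Unroll the lateral recurrence for a fixed number of time steps."""
    h = np.zeros(n_hidden)
    trajectory = []
    for _ in range(steps):
        # The bottom-up input is re-presented at every step; the lateral
        # term feeds the layer's previous state back into itself.
        h = relu(W_b @ x + W_l @ h)
        trajectory.append(h.copy())
    return trajectory

x = rng.standard_normal(n_in)
states = bl_layer(x)
```

Because the input is held fixed while the state evolves, the layer can iteratively refine its representation over time steps; this is the kind of recurrent dynamics hypothesized to help when an occluder leaves only partial bottom-up evidence. A BT or BLT variant would additionally add a top-down term from a higher layer's previous state.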