For decades, scientists have struggled with the problem of object recognition. Human vision allows for the rapid recognition of objects even when they are presented from different viewpoints, requiring a representation that is both detailed and computationally efficient. The medial axis is one possibility. Recently, when subjects were tasked with spontaneously tapping once within a shape, the aggregated taps revealed evidence of a medial axis representation for various shapes (Firestone & Scholl, 2014). Nevertheless, it is unclear whether the medial axis representation is recruited for other processes such as object memory. Here we examined whether bias toward the medial axis is modulated by spatial memory demands. Borrowing from a classic paradigm (Huttenlocher, Hedges, & Duncan, 1991), we asked participants to memorize the locations of either 1 or 20 dots within a rectangle and, once the dots had disappeared, to "place" those dots in their original locations. When only one dot was present, participants showed no bias in relation to the medial axis (p > .30) but were, instead, biased towards the "prototypes" of the rectangle's quadrants (p < .001). In contrast, with 20 dots, participants were significantly biased towards the medial axis over and above any bias towards other models (e.g., the principal axis [p < .01] or the center of the rectangle [p < .001]). These findings suggest that humans have access to multiple shape representations that are task dependent. When the task requires recalling multiple locations, participants rely on a representation that captures the medial axis. However, when the task requires recalling a single location, subjects appear to maximize the precision of their estimates by segmenting space into smaller units (i.e., quadrants). We suggest that whereas a medial axis representation may be ideal for capturing a spatial "gist", other representations (i.e., segmentation with prototypes) are better suited to precise localization within a shape. Meeting abstract presented at VSS 2016.
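A minimal sketch of how such a bias comparison could be computed, assuming scikit-image's medial_axis and simulated placement coordinates in place of the study's data; treating the quadrant "prototypes" as the quadrant centers is an assumption for illustration.

```python
# Hypothetical sketch (not the study's code): quantify bias toward the medial
# axis vs. quadrant "prototypes" for a rectangle, using simulated placements.
import numpy as np
from scipy.spatial.distance import cdist
from skimage.morphology import medial_axis

H, W = 220, 420
mask = np.zeros((H, W), dtype=bool)
mask[10:-10, 10:-10] = True                 # filled rectangle on a background

# Medial-axis pixels of the rectangle, as (row, col) coordinates
skel_pts = np.argwhere(medial_axis(mask))

# Assumed quadrant "prototypes": the center of each of the four quadrants
r0, c0, r1, c1 = 10, 10, H - 10, W - 10
prototypes = np.array([[r0 + (r1 - r0) * a, c0 + (c1 - c0) * b]
                       for a in (0.25, 0.75) for b in (0.25, 0.75)])

# Simulated memory placements; real response data would go here
rng = np.random.default_rng(0)
responses = np.column_stack([rng.uniform(r0, r1, 100), rng.uniform(c0, c1, 100)])

# Each placement's distance to the nearest skeleton pixel / nearest prototype
d_skel = cdist(responses, skel_pts).min(axis=1)
d_proto = cdist(responses, prototypes).min(axis=1)
print(f"mean distance to medial axis:       {d_skel.mean():.1f} px")
print(f"mean distance to nearest prototype: {d_proto.mean():.1f} px")
```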
Decades of research in the cognitive and neural sciences have shown that shape perception is crucial for object recognition. However, it remains unknown how object shape is represented to accomplish recognition. Here we used behavioral and neural techniques to test whether human object representations are well described by a model of shape based on an object’s skeleton when compared with other computational descriptors of visual similarity. Skeletal representations may be an ideal model for object recognition because they (1) provide a compact description of a shape’s structure by describing the relations between contours and component parts, and (2) provide a metric by which to compare the visual similarity between shapes. In a first experiment, we tested whether a model of skeletal similarity was predictive of human behavioral similarity judgments for novel objects. We found that the skeletal model explained the greatest amount of unique variance in participants’ judgments (33.13%) when compared with other models of visual similarity (Gabor-jet, GIST, HMAX, AlexNet), suggesting that skeletal descriptions uniquely contribute to object recognition. In a second experiment, we used fMRI and representational similarity analyses to examine whether object-selective regions (LO, pFs), or even early-visual regions, code for an object’s skeleton. We found that skeletal similarity explained the greatest amount of unique variance in LO (19.32%) and V3 (18.74%) in the right hemisphere (rLO; rV3), but not in other regions. That a skeletal description was most predictive of rLO is consistent with its role in specifying object shape via the relations between component parts. Moreover, our findings may shed new light on the functional role of V3 in using skeletons to integrate contours into complete shapes. Together, our results highlight the importance of skeletal descriptors for human object recognition and the computation of shape in the visual system.
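The unique-variance logic can be illustrated with a nested-regression sketch: the skeletal model's unique contribution is the drop in R² when it is removed from the full predictor set. All values below are simulated placeholders, not the study's measurements.

```python
# Hypothetical sketch of unique variance via nested regression:
# unique R^2 of the skeletal model = R^2(all models) - R^2(all but skeleton).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_pairs = 300  # one row per object pair in the similarity judgments

# Columns: skeletal, Gabor-jet, GIST, HMAX, AlexNet similarity predictions
X = rng.normal(size=(n_pairs, 5))
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.5, size=n_pairs)

def r2(X, y):
    return LinearRegression().fit(X, y).score(X, y)

unique_skeletal = r2(X, y) - r2(X[:, 1:], y)
print(f"unique variance for the skeletal model: {unique_skeletal:.2%}")
```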
Although researchers agree that number and other magnitudes are represented in analog format, there is disagreement about whether these representations form part of an integrated system, the so-called 'general magnitude system' (Walsh, 2003). Here we used a subliminal priming paradigm to test for interactions between different magnitudes (number and area) when one magnitude (number) was not consciously detectable. On each trial, participants were presented with a pair of black and white Arabic digits as subliminal primes (e.g., a white 4 and a black 8). Each digit pair was presented for a short duration (43 ms) and sandwiched between two masks, preventing conscious detection. Participants were then presented with target displays of black and white two-dimensional shapes (lasting 200 ms) and tasked with judging which array was larger in cumulative surface area (Experiment 1). We found significant congruity effects. That is, participants were both more accurate and faster on trials in which the mapping between color and relative number for the Arabic digits matched the mapping between color and relative surface area in the non-symbolic arrays (e.g., a prime display with a white 4 and a black 8 followed by a target display with a smaller white surface area and a larger black surface area) than when there was a mismatch (a white 4 and a black 8 followed by a large white area and a small black area; ps < .01). These findings suggest direct connections, or overlap, between representations of number and area, and because the primes were subliminal, mediation by common verbal labels was not a viable alternative explanation. Moreover, in a subsequent experiment (Experiment 2), we ruled out an alternative account that would explain congruity in terms of post-representational (i.e., decision) effects. Taken together, these experiments provide unique support for a general magnitude system that integrates numerical and non-numerical magnitudes. Meeting abstract presented at VSS 2016.
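A minimal sketch of the congruity comparison on simulated trials; the real analysis would be within-subject, but the congruent-versus-incongruent contrast has the same shape.

```python
# Hypothetical sketch of a congruity analysis on simulated trial data.
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
n = 400
trials = pd.DataFrame({"congruent": rng.integers(0, 2, n).astype(bool)})

# Simulate a small congruity advantage in both accuracy and response time
trials["correct"] = rng.random(n) < np.where(trials["congruent"], 0.85, 0.78)
trials["rt"] = rng.normal(np.where(trials["congruent"], 620, 650), 60)

print(trials.groupby("congruent")[["correct", "rt"]].mean())
t, p = stats.ttest_ind(trials.loc[trials["congruent"], "rt"],
                       trials.loc[~trials["congruent"], "rt"])
print(f"RT congruity effect: t = {t:.2f}, p = {p:.3f}")
```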
Despite their anatomical and functional distinctions, there is growing evidence that the dorsal and ventral visual pathways interact to support object recognition. However, the exact nature of these interactions remains poorly understood. Is the presence of identity-relevant object information in the dorsal pathway simply a byproduct of ventral input? Or, might the dorsal pathway be a source of input to the ventral pathway for object recognition? In the current study, we used high-density EEG—a technique with high temporal precision and spatial resolution sufficient to distinguish parietal and temporal lobes—to characterise the dynamics of dorsal and ventral pathways during object viewing. Using multivariate analyses, we found that category decoding in the dorsal pathway preceded that in the ventral pathway. Importantly, the dorsal pathway predicted the multivariate responses of the ventral pathway in a time-dependent manner, rather than the other way around. Together, these findings suggest that the dorsal pathway is a critical source of input to the ventral pathway for object recognition.
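A hedged sketch of the time-resolved decoding step, assuming simulated EEG data and arbitrary channel groupings for the "dorsal" and "ventral" sets; the cross-pathway prediction analysis would build on per-time-point patterns like these.

```python
# Hypothetical sketch of time-resolved category decoding from EEG; data and
# channel groupings are simulated placeholders, not the recorded montage.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_channels, n_times = 200, 64, 60
eeg = rng.normal(size=(n_trials, n_channels, n_times))
labels = rng.integers(0, 2, n_trials)          # two object categories
dorsal, ventral = slice(0, 20), slice(20, 40)  # assumed channel groupings

def decode(channels):
    # Cross-validated decoding accuracy at every time point
    return [cross_val_score(LogisticRegression(max_iter=1000),
                            eeg[:, channels, t], labels, cv=5).mean()
            for t in range(n_times)]

acc_dorsal, acc_ventral = decode(dorsal), decode(ventral)
# Latency comparison: when does each pathway's accuracy first exceed chance?
print(np.argmax(np.array(acc_dorsal) > 0.5), np.argmax(np.array(acc_ventral) > 0.5))
```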
Shape perception is crucial for object recognition. However, it remains unknown exactly how shape information is represented, and, consequently, used by the visual system. Here, we hypothesized that the visual system represents “shape skeletons” to both (1) perceptually organize contours and component parts into a shape percept, and (2) compare shapes to recognize objects. Using functional magnetic resonance imaging (fMRI) and representational similarity analysis (RSA), we found that a model of skeletal similarity explained significant unique variance in the response profiles of V3 and LO, regions known to be involved in perceptual organization and object recognition, respectively. Moreover, the skeletal model remained predictive in these regions even when controlling for other models of visual similarity that approximate low- to high-level visual features (i.e., Gabor-jet, GIST, HMAX, and AlexNet), and across different surface forms, a manipulation that altered object contours while preserving the underlying skeleton. Together, these findings shed light on the functional roles of shape skeletons in human vision, as well as the computational properties of V3 and LO.
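The RSA comparison can be sketched as a rank correlation between a model RDM and a neural RDM; the feature and voxel matrices below are simulated placeholders, and the distance metrics are assumptions for illustration.

```python
# Hypothetical RSA sketch: correlate a skeletal-model RDM with a neural RDM
# from an ROI (e.g., V3 or LO). Feature and voxel matrices are simulated.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_objects = 30
skeletal_features = rng.normal(size=(n_objects, 50))  # model descriptors
roi_patterns = rng.normal(size=(n_objects, 200))      # voxel responses

# Representational dissimilarity matrices (condensed upper triangles)
model_rdm = pdist(skeletal_features, metric="euclidean")
neural_rdm = pdist(roi_patterns, metric="correlation")

rho, p = spearmanr(model_rdm, neural_rdm)
print(f"model-brain RSA: Spearman rho = {rho:.3f}, p = {p:.3f}")
```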
Unlike artificial neural networks (ANNs), human object recognition is robust in degraded conditions. Accumulating evidence suggests that recurrent connections within the ventral visual stream are necessary in such conditions. Indeed, incorporating recurrence within ANNs significantly improves their performance. Nevertheless, despite the success of recurrent ANNs over purely feedforward ones, the recognition abilities of these models on degraded objects lag far behind those of human adults. Why is the human visual system impervious to such conditions? In a novel approach to answering this question, we compared the recognition abilities of state-of-the-art ANNs to those of 4- and 5-year-old children. Although children show impressive object recognition abilities, it remains unknown how robust these abilities are when objects are degraded or presented under speeded conditions. Children (N = 84) were tested on a challenging object recognition task that required them to identify rapidly presented object outlines (100-300 ms; forward and backward masked) that had perturbed or illusory contours. We found that even the youngest children successfully identified both perturbed and illusory outlines at the fastest speeds, even though objects were both forward and backward masked. By contrast, neither a feedforward model (VGG19) nor a model that approximates recurrence (ResNet101) showed comparable performance to children. Thus, despite receiving exponentially more supervised object training than children (Zador, 2019), ANNs fall short of the recognition abilities of children. We suggest that, from early in development, robust object recognition in humans may be supported by parallel feedforward processes in the dorsal stream, in addition to recurrent processes in the ventral stream.
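A minimal sketch of the model benchmark, assuming torchvision's pretrained VGG19 and ResNet101 and a placeholder outline image; scoring top-5 labels over a full outline set would parallel the children's task.

```python
# Hypothetical sketch: run pretrained VGG19 (feedforward) and ResNet101
# (residual) on one outline image; the image path is a placeholder, and
# predicted labels come from the standard ImageNet classes.
import torch
from PIL import Image
from torchvision import models, transforms

preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
image = preprocess(Image.open("outline.png").convert("RGB")).unsqueeze(0)

for name, net in [("VGG19", models.vgg19(weights="IMAGENET1K_V1")),
                  ("ResNet101", models.resnet101(weights="IMAGENET1K_V1"))]:
    net.eval()
    with torch.no_grad():
        top5 = net(image).softmax(dim=1).topk(5)
    print(name, top5.indices.tolist(), top5.values.tolist())
```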
The algorithm with which humans represent objects must be durable enough to support object recognition across changes in orientation and partial occlusions. This algorithm must also be flexible enough to include both the internal components of an object and its global shape. The current study examined viable models of object representation. In a first experiment, we tested the medial axis model (i.e., shape skeleton; Blum, 1973) against a principal axis model (Marr & Nishihara, 1978) with three shapes (rectangle, square, and T) using the "tap" paradigm of Firestone and Scholl (2014), in which participants were instructed to tap once anywhere they chose within a shape. We collected 200 taps per shape and found that responses were significantly closer to the medial axis than either randomly determined points (best set of 50,000 simulations; ps < .001) or points corresponding to the major principal axis (ps < .001). Having found evidence for the medial axis model, in a second experiment we tested whether an internal protrusion of varying size affected participants' medial axis representation of a rectangle. Participants tapped within a rectangle that contained either a large or a small visible obstacle. We found that, in both cases, participants' taps conformed to the medial axis of the shape (p < .001); that is, taps accommodated to the obstacle within the rectangle. Taken together, these results provide evidence for a robust medial axis representation, one that is evident across different shapes and that flexibly accommodates to even slight protrusions within a shape. Meeting abstract presented at VSS 2016.
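A hedged sketch of the distance comparisons, assuming scikit-image's medial_axis, a PCA-based major principal axis, and simulated taps; the repeated random-point baseline is only a rough stand-in for the paper's 50,000-simulation procedure.

```python
# Hypothetical sketch: mean tap distance to the medial axis vs. the major
# principal axis vs. a repeated random-point baseline, on simulated taps.
import numpy as np
from scipy.spatial.distance import cdist
from skimage.morphology import medial_axis

H, W = 220, 420
mask = np.zeros((H, W), dtype=bool)
mask[10:-10, 10:-10] = True                   # filled rectangle on a background
shape_pts = np.argwhere(mask).astype(float)

# Medial-axis pixels and the major principal axis (first PC through centroid)
skel_pts = np.argwhere(medial_axis(mask))
centroid = shape_pts.mean(axis=0)
pc1 = np.linalg.svd(shape_pts - centroid, full_matrices=False)[2][0]
axis_pts = centroid + np.linspace(-(W - 20) / 2, (W - 20) / 2, 400)[:, None] * pc1

rng = np.random.default_rng(0)
taps = np.column_stack([rng.uniform(10, H - 10, 200), rng.uniform(10, W - 10, 200)])

d_medial = cdist(taps, skel_pts).min(axis=1).mean()
d_principal = cdist(taps, axis_pts).min(axis=1).mean()
d_random = np.mean([cdist(taps, shape_pts[rng.choice(len(shape_pts), 200)])
                    .min(axis=1).mean() for _ in range(1000)])
print(f"medial: {d_medial:.1f}  principal: {d_principal:.1f}  random: {d_random:.1f}")
```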