Face recognition benefits from associating conceptual-social information with faces during learning. For example, making trait inferences, relative to perceptual evaluations, during face learning improves face recognition. Two hypotheses have been proposed to account for this conceptual-social benefit in face recognition. According to the feature elaboration hypothesis, social evaluations encourage elaborated processing of the perceptual information in faces. According to the conceptual-social hypothesis, social evaluations convert faces from a perceptual, image-based representation to a socially meaningful representation of a person. To test these hypotheses, we ran a functional MRI study in which we functionally localized the occipital-temporal face areas (i.e., the perceptual face network) as well as the social brain network (e.g., dmPFC, vmPFC, PCC, TPJ). Prior to scanning, participants watched video clips depicting a social interaction between young adults and were asked to study the faces for a memory test while making either perceptual evaluations (e.g., how round/symmetric is the face?) or conceptual-social evaluations (e.g., how trustworthy/intelligent does the face look?). During the fMRI scan, participants performed an old/new recognition test on the faces that had been presented in the video clips during the learning phase and on novel faces. Behavioral findings replicated the conceptual-social benefit in face recognition. Functional MRI results showed a higher fMRI signal during recognition for faces that had been evaluated conceptually rather than perceptually during learning in the social network areas, but not in the ventral-occipital face areas. These results support the conceptual-social hypothesis, indicating that the conceptual benefit in face recognition is mediated by social rather than perceptual mechanisms.
A hallmark of high-level visual cortex is its functional organization into neighboring clusters of neurons that are selective for single categories, such as faces, bodies, and objects. However, visual scenes are typically composed of multiple categories. How does category-selective cortex represent such complex stimuli? According to a normalization mechanism, the response of a single neuron to multiple stimuli is normalized by the responses of its neighboring neurons (the normalization pool). Here we show that category selectivity, measured with fMRI, can provide an estimate of the heterogeneity of the normalization pool, which determines the response to multiple stimuli. These results provide a general framework for the varying representations of multiple stimuli that have been reported in different regions of category-selective cortex in neuroimaging and single-unit recording studies. This type of organization may enable a dynamic and flexible representation of complex visual scenes that can be modulated by higher-level cognitive systems according to task demands.
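The normalization account above can be made concrete with a small numerical sketch. The following Python example is illustrative only: the tuning values, the pool compositions, and the semi-saturation constant are assumptions chosen for clarity, not parameters taken from the study. It shows how a homogeneous normalization pool yields a response to a stimulus pair close to the response to the preferred stimulus alone, whereas a heterogeneous pool pulls the pair response toward the average of the two single-stimulus responses.

```python
import numpy as np

SIGMA = 1.0  # assumed semi-saturation constant of the divisive normalization model

def normalized_response(driven_input, pool_inputs, sigma=SIGMA):
    """Divisive normalization: a neuron's driven input divided by the
    mean driven input of its normalization pool (plus a constant)."""
    return driven_input / (sigma + np.mean(pool_inputs))

# Assumed tuning of a face-selective neuron: strong drive to a face, weak to an object.
face_drive, object_drive = 10.0, 1.0
pair_drive = face_drive + object_drive  # driven inputs sum before normalization

# Homogeneous pool: all neighbors share the same face-selective tuning.
homog_pool = {"face": [10.0] * 4, "object": [1.0] * 4, "pair": [11.0] * 4}

# Heterogeneous pool: half the neighbors prefer faces, half prefer objects.
heterog_pool = {"face": [10.0, 10.0, 1.0, 1.0],
                "object": [1.0, 1.0, 10.0, 10.0],
                "pair": [11.0] * 4}

for name, pool in [("homogeneous", homog_pool), ("heterogeneous", heterog_pool)]:
    r_face = normalized_response(face_drive, pool["face"])
    r_object = normalized_response(object_drive, pool["object"])
    r_pair = normalized_response(pair_drive, pool["pair"])
    print(f"{name} pool: face={r_face:.2f}, object={r_object:.2f}, "
          f"pair={r_pair:.2f}, single-stimulus mean={(r_face + r_object) / 2:.2f}")
```

Under these assumed parameters, the pair response with the homogeneous pool (about 0.92) nearly matches the face-alone response (about 0.91), whereas with the heterogeneous pool the pair response falls near the average of the two single-stimulus responses, which is the qualitative pattern the normalization framework is meant to capture.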
Faces convey rich information, including identity, gender, and expression. Current neural models of face processing suggest a dissociation between the processing of invariant facial aspects, such as identity and gender, which engage the fusiform face area (FFA), and the processing of changeable aspects, such as expression and eye gaze, which engage the posterior superior temporal sulcus face area (pSTS-FA). Recent studies report a second dissociation within this network, such that the pSTS-FA, but not the FFA, shows a much stronger response to dynamic than to static faces. The aim of the current study was to test a unified model that accounts for these two functional characteristics of the neural face network. In an fMRI experiment, we presented static and dynamic faces while subjects judged an invariant (gender) or a changeable facial aspect (expression). We found that the pSTS-FA was more engaged in processing dynamic than static faces and changeable than invariant aspects, whereas the occipital face area (OFA) and the FFA showed similar responses across all four conditions. These findings support an integrated neural model of face processing in which the ventral areas extract form information from both invariant and changeable facial aspects, whereas the dorsal face areas are sensitive to dynamic and changeable facial aspects.
The majority of studies on person recognition have examined the processing of static faces. The few studies that have examined person recognition from dynamic videos of the whole person (e.g., O'Toole et al., 2011) have done so within the same medium, examining person recognition from videos after exposure to dynamic displays vs. person recognition from still images after exposure to still images alone. In this study, we examined the contribution of previous exposure to motion information to whole-person recognition from still images, where no dynamic information is available. To this end, we used a matching task in which we presented either videos of a person walking or multiple still images taken from the same videos, and asked participants to recognize the identities shown from novel still images of either the full body, the face alone, or the body alone. We found that after exposure to videos, the body contributed to person recognition beyond the face; however, when no dynamic information was available, person recognition from the full body was no better than person recognition from the face alone. Furthermore, we found that when less facial information was available in the videos, the body contributed more to person recognition. Finally, since person recognition from images of the body alone proved to be at chance in these experiments, we demonstrated that the inclusion of a non-informative head context alongside body-only images improved person recognition, suggesting that full-body context is important for person recognition. Overall, these findings indicate that exposure to people in motion enhances person recognition from still images beyond person recognition based on the face alone. This suggests that body motion improves the representation of body form, thereby making the body more informative for person recognition even from still images. Meeting abstract presented at VSS 2015.
Face recognition is a challenging categorization task, as in many cases the variability between different images of the same identity may be larger than the variability between images of different identities. Nevertheless, humans excel at this task, in particular for faces they are familiar with. What type of learning, and what kind of representation of the learned identity, support such remarkable categorization ability? Here we propose that conceptual learning, and the generation of a conceptual representation of the learned identity in memory, enable this classification performance. First, we show that humans learn to link perceptually different faces to the same identity if the faces are learned with the same conceptual information. Next, we show that this conceptual learning does not generate a single perceptual representation of the different appearances of each identity. Instead, perceptually dissimilar images of the same identity remain separated in perceptual space and are linked conceptually rather than perceptually. This conceptual representation of face identity is advantageous, as it enables generalization across perceptually dissimilar images of the same identity/category without increasing false recognition of perceptually similar images of different identities. A similar conceptual mechanism may also apply to other familiar categories, such as familiar voices or objects of expertise, that involve fine discrimination of a homogeneous set of stimuli linked to unique conceptual information. Overall, these findings highlight the importance of studying the contribution of both cognition and perception to face recognition.
Person recognition has been studied primarily with static images of faces. However, in real life we typically see the whole person in motion. This dynamic exposure provides rich information about a person's face and body shape as well as their body motion. What is the relative contribution of the face, the body, and motion to person recognition? In a series of studies, we examined the conditions under which the body and motion contribute to person recognition beyond the face. In these studies, participants were presented with short videos of people walking towards the camera and were asked to recognize them from a still image or a video taken on a different day (so recognition was not based on clothing or external facial features). Our findings show that person recognition relies primarily on the face when facial information is clear and available. However, when facial information is unclear or viewed at a distance, the body contributes to person recognition beyond the face. Furthermore, although person recognition based on the body alone is very poor, the body can be used for person recognition when presented in whole-person context and in motion. In particular, person recognition from uninformative faceless heads attached to headless bodies was better than recognition from the body alone. Additionally, person recognition from dynamic headless bodies was better than recognition from multiple static images taken from the video. Overall, our results show that when facial information is clearly available, person recognition is primarily based on the face. When facial information is degraded, the body, motion, and the context of the whole person are used for person recognition. Thus, even though the face is the primary source of information for person identity, information from the body contributes to person recognition, in particular in the context of the whole person in motion. Meeting abstract presented at VSS 2017.
We report here an unexpectedly robust ability of healthy human individuals (n = 40) to recognize extremely distorted, needle-like facial images, challenging the well-entrenched notion that a veridical spatial configuration is necessary for extracting facial identity. In face identification tasks with parametrically compressed internal and external features, we found that the sum of performances on each cue falls significantly short of performance on full faces, despite the equal visual information available from both measures (with full faces essentially being a superposition of internal and external features). We hypothesize that this large deficit stems from the use of positional information about how the internal features are positioned relative to the external features. To test this, we systematically changed the relations between internal and external features and found preferential encoding of vertical, but not horizontal, spatial relationships in facial representations (n = 20). Finally, we employed magnetoencephalography (n = 20) to demonstrate a close mapping between the behavioral psychometric curve and the amplitude of the M250 face-familiarity evoked response field component, but not the M170 face-sensitive component, providing evidence that the M250 can be modulated by faces that are perceptually identifiable, irrespective of extreme distortions to the face's veridical configuration. We theorize that this tolerance to compressive distortions has evolved from the need to recognize faces across varying viewpoints. Our findings help clarify the important, but poorly defined, concept of facial configuration and also enable an association between behavioral performance and previously reported neural correlates of face perception.
Perceptual expertise is an acquired skill that enables fine discrimination of members of a homogeneous category. The question of whether perceptual expertise is mediated by domain-general or domain-specific processing mechanisms has been hotly debated for decades in human behavioral and neuroimaging studies. To decide between these two hypotheses, most studies have examined whether expertise for different domains is mediated by the same mechanisms used for faces, for which most humans are experts. Here we used deep convolutional neural networks (DCNNs) to test whether perceptual expertise is best achieved by computations that are optimized for face or object classification. We re-trained a face-trained and an object-trained DCNN to classify birds at the subordinate or individual level of categorization. The face-trained DCNN required deeper retraining to achieve the same level of performance for bird classification as the object-trained DCNN. These findings indicate that classification at the subordinate or individual level of categorization does not transfer well between domains. Thus, fine-grained classification is best achieved by using domain-specific rather than domain-general computations.
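To make the retraining procedure concrete, below is a minimal PyTorch sketch of the kind of fine-tuning pipeline the abstract describes. The abstract does not specify the architectures or training details, so the backbone (an ImageNet-pretrained ResNet-50 standing in for an object-trained DCNN), the number of bird classes, and the choice of which blocks to unfreeze are all assumptions for illustration; "deeper retraining" corresponds to unfreezing progressively earlier layers rather than only the classification head.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_BIRD_CLASSES = 200  # assumed number of bird categories in the fine-tuning set

def build_finetune_model(unfreeze_blocks=("layer4",)):
    """Load a pretrained backbone, replace the classifier head for bird
    classification, and unfreeze only the requested residual blocks
    ('deeper' retraining = more blocks unfrozen)."""
    # Stand-in for an object-trained DCNN; a face-trained backbone would be loaded the same way.
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

    # Freeze all pretrained weights, then selectively unfreeze the deepest blocks.
    for param in model.parameters():
        param.requires_grad = False
    for name in unfreeze_blocks:
        for param in getattr(model, name).parameters():
            param.requires_grad = True

    # New classification head for the bird categories (always trained).
    model.fc = nn.Linear(model.fc.in_features, NUM_BIRD_CLASSES)
    return model

# Shallow retraining: only the last residual block and the new head are updated.
model = build_finetune_model(unfreeze_blocks=("layer4",))
# Deeper retraining would pass e.g. ("layer3", "layer4") and compare the resulting accuracy.

optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3, momentum=0.9
)
criterion = nn.CrossEntropyLoss()
# A standard training loop over a bird-image DataLoader would follow here.
```

In this framing, "depth of retraining" can be operationalized as the number of blocks that must be unfrozen before the face-trained and object-trained backbones reach matched bird-classification accuracy.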
Current models of face recognition are primarily concerned with the role of perceptual experience and the nature of the perceptual representation that enables face identification. These models overlook the main goal of the face recognition system, which is to recognize socially relevant faces. We therefore propose a new account of face recognition according to which faces are learned from concepts to percepts. This account highlights the critical contribution of the conceptual and social information that is associated with faces to face recognition. Our recent studies show that conceptual/social information contributes to face recognition in two ways. First, faces that are learned in a social context are better recognized than faces that are learned based on their perceptual appearance. These findings indicate the importance of converting faces from a perceptual to a social representation for face recognition. Second, we found that conceptual information significantly accounts for the visual representation of faces in memory, but not in perception. This was the case both for human perceptual and conceptual similarity ratings and for the representations generated by unimodal deep neural networks, which represent faces based on visual information alone, and by multi-modal networks, which represent both visual and conceptual information about faces. Taken together, we propose that the representation that is generated for faces by the perceptual and memory systems is determined by social/conceptual factors, rather than by our passive perceptual experience with faces per se.
Five hypotheses have been proposed to account for the information-processing deficit that causes prosopagnosia. To support one of these explanations, it is necessary to rule out all of the alternative hypotheses. Last year, we presented the results of testing with Edward, a developmental prosopagnosic, who performed poorly on all face tasks but normally on five of six tests of non-face discrimination. Those results ruled out the within-class hypothesis. Since then, we have tested him with tasks addressing the configural processing hypothesis, the non-decomposable hypothesis, and the curvature hypothesis. Edward performed normally on all of these tests, so his results are inconsistent with these explanations. The only remaining hypothesis, the face-specific hypothesis, proposes that prosopagnosia results from an impairment to the holistic processes that operate on upright faces. To test this explanation, we tested Edward on a same-different discrimination test with faces differing either in the spacing of the features or in the features themselves. Consistent with the face-specific hypothesis, Edward was outside the normal range with the spacing items but normal with the feature items. Taken together, these results indicate that normal adults have mechanisms specialized for face-specific configural processing.