Self-supervised learning (SSL) enables learning useful inductive biases through pretext tasks that require no labels. The label-free nature of SSL makes it especially attractive for whole-slide histopathological images (WSIs), where patch-level human annotation is difficult. The masked autoencoder (MAE) is a recent SSL method well suited to digital pathology, as it requires neither negative sampling nor heavy data augmentation. However, the domain shift between natural images and digital pathology images calls for further research into designing MAE for patch-level WSIs. In this paper, we investigate several design choices for MAE in histopathology. Furthermore, we introduce a multi-modal MAE (MMAE) that leverages the specific compositionality of Hematoxylin & Eosin (H&E) stained WSIs. We performed our experiments on the public patch-level dataset NCT-CRC-HE-100K. The results show that the MMAE architecture outperforms supervised baselines and other state-of-the-art SSL techniques on an eight-class tissue phenotyping task, using only 100 labeled samples for fine-tuning. Our code is available at https://github.com/wisdomikezogwo/MMAE_Pathology
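The abstract does not spell out the masking scheme, but the generic MAE recipe it builds on masks a large fraction of image patches and reconstructs them from the visible remainder. The sketch below illustrates that random patch-masking step in NumPy; the 75% ratio, function names, and token shapes are illustrative assumptions, not taken from the paper or its repository.

```python
import numpy as np

def random_mask_patches(patch_tokens, mask_ratio=0.75, rng=None):
    """Illustrative MAE-style random masking.

    patch_tokens: (num_patches, embed_dim) array of patch embeddings.
    Returns the visible tokens, the indices kept, and a boolean mask
    marking which patches the decoder must reconstruct.
    """
    rng = rng or np.random.default_rng()
    num_patches = patch_tokens.shape[0]
    num_keep = int(num_patches * (1.0 - mask_ratio))

    # Shuffle patch indices and keep only a small visible subset.
    perm = rng.permutation(num_patches)
    keep_idx = np.sort(perm[:num_keep])

    mask = np.ones(num_patches, dtype=bool)   # True = masked / to reconstruct
    mask[keep_idx] = False

    return patch_tokens[keep_idx], keep_idx, mask

# Example: a 14x14 grid of 768-d patch embeddings, 75% masked.
tokens = np.random.randn(196, 768).astype(np.float32)
visible, keep_idx, mask = random_mask_patches(tokens, mask_ratio=0.75)
print(visible.shape, int(mask.sum()))  # (49, 768) visible tokens, 147 masked
```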
Edge operators based on gray-scale morphologic operations are introduced. These operators can be efficiently implemented in near-real-time machine vision systems that have special hardware support for gray-scale morphologic operations. The simplest morphologic edge detectors are the dilation residue and erosion residue operators. The underlying motivation for these operators and for some of their combinations is discussed and justified. Finally, the blur-minimum morphologic edge operator is defined; its inherent noise sensitivity is lower than that of the dilation or erosion residue operators. Experimental results are provided to show the validity of these morphologic operators. When compared with enhancement/thresholding edge detectors and the cubic facet second-derivative zero-crossing edge operator, all the edge operators perform similarly when the noise is small. However, as the noise increases, the second-derivative zero-crossing edge operator and the blur-minimum morphologic edge operator perform much better than the rest. The advantage of the blur-minimum edge operator is that it is computationally simpler than the facet edge operator.
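As an illustration of the operators named above, the following sketch implements the dilation residue, the erosion residue, and one common formulation of the blur-minimum operator (the pixelwise minimum of the two residues computed on a pre-blurred image) using SciPy's gray-scale morphology. The window sizes and the uniform blur are assumptions chosen for the example; the paper's exact operator definitions may differ.

```python
import numpy as np
from scipy.ndimage import grey_dilation, grey_erosion, uniform_filter

def dilation_residue(img, size=3):
    # Edge strength as the difference between the local maximum and the pixel.
    return grey_dilation(img, size=size) - img

def erosion_residue(img, size=3):
    # Edge strength as the difference between the pixel and the local minimum.
    return img - grey_erosion(img, size=size)

def blur_min_edge(img, size=3, blur=3):
    # Blur first to suppress noise, then take the pixelwise minimum of the
    # two residues, which attenuates responses caused by isolated noise spikes.
    smoothed = uniform_filter(img.astype(float), size=blur)
    return np.minimum(dilation_residue(smoothed, size),
                      erosion_residue(smoothed, size))

# Example on a synthetic step edge corrupted by Gaussian noise.
img = np.zeros((64, 64))
img[:, 32:] = 100.0
img += np.random.normal(0, 5, img.shape)
edges = blur_min_edge(img, size=3, blur=3)
```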
Three-dimensional objects are now commonly used in a large number of applications, including games, mechanical engineering, archaeology, culture, and even medicine. As a result, researchers have started to investigate 3D shape descriptors that aim to encapsulate the important shape properties of 3D objects. This thesis presents new 3D shape representation methodologies for quantification, classification, and retrieval tasks that are flexible enough to be used in general applications, yet detailed enough to be useful in medical craniofacial dysmorphology studies. The methodologies begin by computing low-level features at each point of the 3D mesh and aggregating the features into histograms over mesh neighborhoods. Two different methodologies are defined. The first learns the characteristics of salient point histograms for each particular application and represents the points in a 2D spatial map based on a longitude-latitude transformation. The second represents the 3D objects by the global 2D histogram of the azimuth-elevation angles of the surface normals of the points on the 3D objects.
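As a rough sketch of the second methodology, the snippet below builds a global 2D histogram over the azimuth-elevation angles of surface normals. The bin count, angle conventions, and normalization are assumptions chosen for illustration rather than the thesis's exact parameters.

```python
import numpy as np

def azimuth_elevation_histogram(normals, bins=36):
    """2D histogram of surface-normal directions.

    normals: (N, 3) array of unit surface normals sampled on the mesh.
    Returns a (bins, bins) histogram over azimuth [-pi, pi] and
    elevation [-pi/2, pi/2], normalized to sum to 1.
    """
    nx, ny, nz = normals[:, 0], normals[:, 1], normals[:, 2]
    azimuth = np.arctan2(ny, nx)                    # angle in the xy-plane
    elevation = np.arcsin(np.clip(nz, -1.0, 1.0))   # angle above the xy-plane

    hist, _, _ = np.histogram2d(
        azimuth, elevation, bins=bins,
        range=[[-np.pi, np.pi], [-np.pi / 2, np.pi / 2]])
    return hist / max(hist.sum(), 1)

# Example with random unit normals standing in for a mesh's normals.
n = np.random.randn(5000, 3)
n /= np.linalg.norm(n, axis=1, keepdims=True)
descriptor = azimuth_elevation_histogram(n, bins=36)
```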
Four datasets, two craniofacial datasets and two general 3D object datasets, were used to develop and test the shape analysis methods presented in this thesis. Each dataset has different shape characteristics that help explore the different properties of the methodologies. Experimental results on classifying the craniofacial datasets show that our methodologies achieve higher classification accuracy than medical experts and existing state-of-the-art 3D descriptors. Retrieval and classification results on the general 3D objects show that our methodologies are comparable to existing view-based and feature-based descriptors and outperform them in some cases. Our methodology can also be used to speed up the most powerful general 3D object descriptor to date.
Objective. We propose a method for retrieving similar functional magnetic resonance imaging (fMRI) statistical images given a query fMRI statistical image. Method. Our method thresholds the voxels within those images and extracts spatially distinct regions from the voxels that remain. Each region is described by a feature vector containing the region centroid, the region area, the average activation value of all voxels within the region, the variance of those activation values, the average distance of each voxel in the region to the region's centroid, and the variance of those distances. The similarity between two images is obtained from the summed minimum distance (SMD) of their constituent feature vectors. Results and conclusion. Our method is sensitive to similarities in brain activation patterns among members of the same data set. Using a subset of the features, such as the centroid location and the average activation value (individually or in combination), maximized the sensitivity of our method. We also identified the similarity structure of the entire data set using those two features and the SMD.
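A minimal sketch of the summed minimum distance between two images' sets of region feature vectors is shown below. The Euclidean metric, the symmetric averaging of the two directions, and the feature dimensionality are assumptions made for illustration and may differ from the paper's exact definition.

```python
import numpy as np
from scipy.spatial.distance import cdist

def summed_minimum_distance(A, B):
    """Symmetric summed minimum distance between two sets of region features.

    A: (m, d) feature vectors for the regions of image 1.
    B: (n, d) feature vectors for the regions of image 2.
    Each region's vector can hold the centroid, area, mean and variance of
    activation, and the distance-to-centroid statistics described above.
    """
    D = cdist(A, B)                   # pairwise Euclidean distances
    a_to_b = D.min(axis=1).sum()      # each region in A to its nearest in B
    b_to_a = D.min(axis=0).sum()      # each region in B to its nearest in A
    return 0.5 * (a_to_b + b_to_a)

# Example: two images with 4 and 5 extracted regions, 8 features each.
img1 = np.random.rand(4, 8)
img2 = np.random.rand(5, 8)
print(summed_minimum_distance(img1, img2))
```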
Craniosynostosis is a serious and common pediatric disease caused by the premature fusion of sutures of the skull. Although studies have shown an increased risk of cognitive deficits in patients with isolated craniosynostosis, the causal basis for this association is still unclear. It is hypothesized that an abnormally shaped skull produces a secondary deformation of the brain that disrupts normal neuropsychological development. In this paper, we conduct a comparative analysis of our newly developed shape descriptors in an attempt to understand the impact of skull deformations on neurobehavior. In particular, we show that our scaphocephaly severity indices and symbolic shape signatures are predictive of mental ability and psychomotor functions, respectively, which suggests the possibility that secondary deformation could influence neurodevelopmental status.
Emotion recognition from vision has been closely tied to facial expressions, and emotions have, more often than not, been inferred from facial expressions alone. However, context, both environmental and social, plays an important role in emotion recognition but has so far not been widely incorporated. The perceived meaning of an emotion can change entirely from one setting to another if only facial expressions are taken into account. Moreover, no such study exists for the Indian context. To address this gap, we generate and introduce the Indian Contextual Emotion Recognition (ICER) dataset, built on the multi-ethnic Indian context. This paper summarises the Contextual Emotion Learning Challenge (CELC 2021), organized in conjunction with the 16th IEEE Conference on Automatic Face and Gesture Recognition (FG) 2021. We outline the tasks posed in the challenge, the novel dataset along with its challenges, and the evaluation method. Lastly, we conclude by discussing possible future directions.
In this work, we describe a novel symbolic representation of shapes for quantifying skull abnormalities in children with craniosynostosis. We show the efficacy of our work by demonstrating an application of this representation in shape-based retrieval of skull morphologies. This tool will enable correlation with potential pathogenesis and prognosis in order to enhance medical care.
Gaze calibration is common in traditional infrared oculographic eye tracking, but it has not been well studied in visible-light mobile/remote eye tracking. We developed a lightweight real-time gaze error estimator and analyzed calibration errors from two perspectives: facial-feature-based and Monte Carlo-based. Both methods correlated with gaze estimation errors, but the Monte Carlo estimates showed the stronger association. The facial-feature associations with gaze error were interpretable, relating movements of the face to the visibility of the eye. We highlight the degradation of gaze estimation quality in a sample of children with autism spectrum disorder (compared to typical adults) and note that calibration methods may reduce Euclidean gaze error by 10%.
We present an algorithm that takes a single frame of a person's face from a depth camera, e.g., Kinect, and produces a high-resolution 3D mesh of the input face. We leverage a dataset of 3D face meshes of 1204 distinct individuals, ranging in age from 3 to 40, captured in a neutral expression. We divide the input depth frame into semantically significant regions (eyes, nose, mouth, cheeks) and search the database for the best matching shape per region. We then combine the input depth frame with the matched database shapes into a single mesh, yielding a high-resolution shape of the input person. Our system is fully automatic and uses only depth data for matching, making it invariant to imaging conditions. We evaluate our results against ground truth shapes and compare to state-of-the-art shape estimation methods. We demonstrate the robustness of our local matching approach with high-quality reconstructions of faces that fall outside the dataset span, e.g., faces older than 40, facial expressions, and different ethnicities.
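To illustrate the per-region matching step, the sketch below finds, for each facial region, the database subject whose region descriptor is nearest to the query's. The descriptors, their dimensionality, and the Euclidean metric are placeholders assumed for this example; the subsequent blending of the matched regions with the input depth frame into a single mesh is not shown.

```python
import numpy as np
from scipy.spatial.distance import cdist

def match_regions(query_regions, database):
    """For each facial region, return the index of the database mesh whose
    region descriptor is closest to the query's, in the spirit of
    per-region matching.

    query_regions: {"eyes": (d,), "nose": (d,), ...} descriptors from the
                   input depth frame.
    database:      {"eyes": (num_subjects, d), ...} descriptors for every
                   subject in the mesh collection.
    """
    best = {}
    for region, q in query_regions.items():
        dists = cdist(q[None, :], database[region])[0]  # (num_subjects,)
        best[region] = int(np.argmin(dists))
    return best

# Example with random stand-in descriptors (32-d) for 1204 subjects.
regions = ["eyes", "nose", "mouth", "cheeks"]
db = {r: np.random.rand(1204, 32) for r in regions}
query = {r: np.random.rand(32) for r in regions}
print(match_regions(query, db))  # e.g. {"eyes": 417, "nose": 88, ...}
```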