Describes a CAD-model-based machine vision system for dimensional inspection of machined parts, with emphasis on the theory behind the system. The original contributions of the work are: the use of precise definitions of geometric tolerances suitable for use in image processing; the development of measurement algorithms corresponding directly to these definitions; the derivation of the uncertainties in the measurement tasks; and the use of this uncertainty information in the decision-making process. Experimental results have verified the uncertainty derivations statistically and shown that the error probabilities obtained by propagating uncertainties are lower than those obtainable without uncertainty propagation.
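As a hedged illustration of the decision step, the sketch below propagates a Gaussian measurement uncertainty into a pass/fail decision on a toleranced dimension; the tolerance band, uncertainty value, and acceptance threshold are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch: accept/reject a measured dimension against a tolerance band
# while propagating measurement uncertainty. All numbers are illustrative.
from math import erf, sqrt

def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def conformance_probability(measured: float, sigma: float,
                            lower: float, upper: float) -> float:
    """Probability that the true dimension lies inside [lower, upper],
    assuming zero-mean Gaussian measurement error with std sigma."""
    return (normal_cdf((upper - measured) / sigma)
            - normal_cdf((lower - measured) / sigma))

# Example: 10.00 mm nominal with +/-0.05 mm tolerance, measured at 10.03 mm
# with 0.02 mm standard uncertainty from the vision pipeline (assumed values).
p = conformance_probability(10.03, 0.02, 9.95, 10.05)
decision = "accept" if p >= 0.99 else "reject/re-measure"
print(f"P(in tolerance) = {p:.4f} -> {decision}")
```

The point of the propagation is visible here: a measurement near the tolerance boundary yields a low conformance probability even though the point estimate is in tolerance, so the error probability of the accept decision can be bounded explicitly.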
Edge grouping and object perception are unified procedures in perceptual organization, yet the computer vision literature treats them as independent tasks. In this paper, we argue that edge detection and object proposal generation should benefit one another. To achieve this, we go beyond bounding boxes and extract closed contours that represent the potential objects within them. A novel objectness metric is proposed to score and rank the proposal boxes by considering the sizes and edge intensities of the closed contours. To improve the edge detector given the top-down object proposals, we group local closed contours and construct global object hierarchies and segmentations. The edge detector is then retrained and enhanced using these hierarchical segmentations as additional feature channels. Our experiments show that closing the loop between edge detection and object proposals improves both tasks, demonstrating that unifying edges and object proposals is both valid and useful.
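A minimal sketch of what such a contour-based objectness score might look like, assuming contours are given as pixel polylines; the weighting of contour size against edge strength is our own illustrative choice, not the paper's exact metric.

```python
# Hedged sketch: score a proposal box by the closed contours it contains,
# combining each contour's mean edge intensity with its relative size.
import numpy as np

def objectness(box, contours, edge_map):
    """box: (x0, y0, x1, y1); contours: list of (N, 2) integer (x, y) arrays
    forming closed curves; edge_map: HxW array of edge strengths in [0, 1]."""
    x0, y0, x1, y1 = box
    box_area = max(1, (x1 - x0) * (y1 - y0))
    score = 0.0
    for c in contours:
        inside = (c[:, 0] >= x0) & (c[:, 0] < x1) & \
                 (c[:, 1] >= y0) & (c[:, 1] < y1)
        if inside.mean() < 0.9:        # contour must lie (mostly) within the box
            continue
        strength = edge_map[c[:, 1], c[:, 0]].mean()  # mean edge intensity on curve
        size = len(c) / np.sqrt(box_area)             # perimeter relative to box scale
        score += strength * size
    return score
```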
In this paper we describe an ontological scheme for representing anatomical entities that undergo morphological transformation and changes in phenotype during prenatal development. The scheme is a proposed component of the Anatomical Transformation Abstraction (ATA) of the Foundational Model of Anatomy (FMA) Ontology, created to provide an ontological framework for capturing knowledge about human development from the zygote to postnatal life. It first describes the structural properties of the anatomical entities that participate in human development and then enhances their description with developmental properties, such as temporal attributes and developmental processes. This approach facilitates the correlation and integration of the classical but static representation of embryology with the evolving concepts of developmental biology, which deals primarily with experimental data on the mechanisms of embryogenesis and organogenesis; such integration is important for describing and understanding the processes underlying structural malformations. In this study we focus on the development of the lips and the palate, in conjunction with our work on the pathogenesis and classification of cleft lip and palate (CL/P) in the FaceBase program. Our aim is to create the Craniofacial Human Development Ontology (CHDO) to support the Ontology of Craniofacial Development and Malformation (OCDM), which provides the infrastructure for integrating the multiple, disparate craniofacial data generated by FaceBase researchers.
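To make the layering of structural properties and developmental properties concrete, here is a minimal sketch in Python; the class names, attributes, and stage values are illustrative placeholders, not the FMA/CHDO schema.

```python
# Hedged sketch: an ATA-style entity encoded with structural relations first,
# then developmental properties (temporal attributes, processes) layered on.
from dataclasses import dataclass, field

@dataclass
class AnatomicalEntity:
    name: str
    part_of: list = field(default_factory=list)        # structural relations

@dataclass
class DevelopmentalEntity(AnatomicalEntity):
    stage_start: int = 0                               # temporal attributes
    stage_end: int = 0                                 # (e.g., Carnegie stages)
    derives_from: list = field(default_factory=list)   # transformation lineage
    processes: list = field(default_factory=list)      # developmental processes

# Illustrative instance only; values are placeholders, not curated ontology data.
palate = DevelopmentalEntity(
    name="secondary palate",
    derives_from=["palatal shelf (left)", "palatal shelf (right)"],
    processes=["elevation", "fusion"],
)
```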
Current work in object categorization discriminates among objects that typically possess gross, readily apparent differences. However, many applications require making much finer distinctions. We address an insect categorization problem so challenging that even trained human experts cannot readily categorize the insect images considered in this paper. The state of the art based on visual dictionaries yields mediocre results on this problem (16.1% error). Three possible explanations for this are that (a) the dictionaries are constructed without supervision, (b) the dictionaries lose the detailed information contained in each keypoint, and (c) these methods rely on hand-engineered decisions about dictionary size. This paper presents a novel, dictionary-free methodology. A random forest of trees is first trained to predict the class of an image from individual keypoint descriptors. A unique aspect of these trees is that they do not make decisions but instead merely record evidence, i.e., the number of descriptors from training examples of each category that reached each leaf of the tree. We provide a mathematical model showing that voting evidence is better than voting decisions. To categorize a new image, descriptors for all detected keypoints are "dropped" through the trees, and the evidence at each leaf is summed to obtain an overall evidence vector, which is then sent to a second-level classifier to make the categorization decision. We achieve excellent performance (6.4% error) on the 9-class STONEFLY9 data set. Our method also achieves an average AUC of 0.921 on PASCAL06 VOC, placing it fifth out of 21 methods reported in the literature and demonstrating that the method works well for generic object categorization as well.
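A hedged sketch of the evidence-voting pipeline using scikit-learn stand-ins: each tree's `predict_proba` returns the training-class frequencies of the leaf a descriptor lands in, which serves as a normalized stand-in for the paper's raw evidence counts; the choice of second-level classifier is our assumption.

```python
# Hedged sketch: sum leaf "evidence" over all keypoint descriptors of an image,
# then classify the image from its aggregated evidence vector.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

def image_evidence(forest, descriptors):
    """Sum per-leaf class distributions over all keypoint descriptors of one image."""
    per_tree = [tree.predict_proba(descriptors) for tree in forest.estimators_]
    return np.mean(per_tree, axis=0).sum(axis=0)  # evidence vector, length n_classes

# Training sketch (X_desc: one row per keypoint descriptor; y_desc: the class of
# the image each descriptor came from; all names are placeholders):
# forest = RandomForestClassifier(n_estimators=100).fit(X_desc, y_desc)
# E = np.stack([image_evidence(forest, d) for d in per_image_descriptors])
# stacked = LogisticRegression(max_iter=1000).fit(E, image_labels)
```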
The affine-transformation matching scheme proposed by Hummel and Wolfson (1988) is very efficient in a model-based matching system, both in terms of the computational complexity involved and in the simplicity of the method. This paper addresses the implementation of affine-invariant point matching, applied to the problem of recognizing and determining the pose of sheet metal parts. It points out errors that can occur with this method due to quantization, stability, symmetry, and noise. By starting from an explicit noise model, which the Hummel and Wolfson technique lacks, we derive an optimal approach that overcomes these problems. We show that results obtained with the new algorithm are clearly better than those from the original method.
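For context, a minimal sketch of the affine-invariant coordinates that underlie the Hummel-Wolfson scheme: each point is expressed in the frame of an ordered basis triplet, and these coordinates are unchanged by any affine map. The noise model itself is not sketched here.

```python
# Hedged sketch: affine coordinates of a point relative to a basis triplet,
# with a numerical check that they are invariant under an affine transform.
import numpy as np

def affine_coords(p, basis):
    """Return (alpha, beta) such that
    p = b0 + alpha*(b1 - b0) + beta*(b2 - b0) for basis triplet (b0, b1, b2)."""
    b0, b1, b2 = (np.asarray(b, dtype=float) for b in basis)
    M = np.column_stack([b1 - b0, b2 - b0])   # 2x2 basis matrix
    return np.linalg.solve(M, np.asarray(p, dtype=float) - b0)

# Invariance check under a random affine map x -> Ax + t:
rng = np.random.default_rng(0)
pts = rng.random((4, 2))
A = rng.random((2, 2)) + np.eye(2)
t = rng.random(2)
mapped = pts @ A.T + t
assert np.allclose(affine_coords(pts[3], pts[:3]),
                   affine_coords(mapped[3], mapped[:3]))
```

Quantizing these (alpha, beta) coordinates into hash-table bins is where the quantization, stability, and noise issues named in the abstract arise.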
We present an approach for identifying the most walkable direction for navigation using a hand-held camera. Our approach extracts semantically rich contextual information from the scene using a custom encoder-decoder architecture for semantic segmentation, and models the spatial and temporal behavior of objects in the scene using a spatio-temporal graph. The system learns to minimize a cost function over the spatial and temporal object attributes to identify the most walkable direction. We also construct a new annotated navigation dataset collected with a hand-held mobile camera in an unconstrained outdoor environment, which includes challenging settings such as highly dynamic scenes, occlusion between objects, and distortions. Our system achieves an accuracy of 84% in predicting a safe direction. We further show that our custom segmentation network is both fast and accurate, achieving mIoU (mean intersection over union) scores of 81 and 44.7 on the PASCAL VOC and PASCAL Context datasets, respectively, while running at about 21 frames per second.
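A hedged sketch of the direction-selection step, assuming the view is split into direction bins and each bin's cost mixes a non-walkable-pixel term from the segmentation with a time-to-collision penalty from tracked objects; the weights, bin layout, and class list are illustrative, not the paper's learned cost.

```python
# Hedged sketch: choose the direction bin with minimum combined cost.
import numpy as np

WALKABLE = {"sidewalk", "road", "path"}   # illustrative walkable classes

def best_direction(seg_labels, objects, n_bins=9, w_obj=0.5):
    """seg_labels: HxW array of class-name strings; objects: list of dicts with
    'bin' (direction bin index) and 'ttc' (time-to-collision in seconds)."""
    h, w = seg_labels.shape
    edges = np.linspace(0, w, n_bins + 1, dtype=int)
    cost = np.zeros(n_bins)
    for i in range(n_bins):
        col = seg_labels[h // 2:, edges[i]:edges[i + 1]]     # lower half of view
        cost[i] = 1.0 - np.isin(col, list(WALKABLE)).mean()  # non-walkable fraction
    for obj in objects:                                      # temporal penalty
        cost[obj["bin"]] += w_obj / max(obj["ttc"], 0.1)
    return int(np.argmin(cost))
```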
Search of discrete spaces is important in combinatorial optimization. Such problems arise in artificial intelligence, computer vision, operations research, and other areas. For realistic problems, the search spaces to be processed are usually huge, necessitating long computation times, pruning heuristics, or massively parallel processing. We present an algorithm that reduces the computation time for graph matching by employing both branch-and-bound pruning of the search tree and massively parallel search of the as-yet-unpruned portions of the space. Most research on parallel search has assumed that a multiple-instruction-stream/multiple-data-stream (MIMD) parallel computer is available. Since single-instruction-stream/multiple-data-stream (SIMD) computers are much less expensive than MIMD systems with equal numbers of processors, the question arises whether SIMD systems can efficiently handle state-space search problems. We demonstrate that the answer is yes and, in particular, that graph matching has a natural and efficient implementation on SIMD machines.
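A serial sketch of the branch-and-bound core, assuming binary adjacency matrices and a cost that counts model edges not preserved in the scene; the SIMD variant would expand the frontier of unpruned partial matches in lockstep, which this sketch omits.

```python
# Hedged sketch: branch-and-bound graph matching. Extend a partial assignment
# one model node at a time, pruning any branch whose accumulated cost already
# reaches the best complete match found so far.
def match(model_adj, scene_adj, partial=(), cost=0, best=(float("inf"), None)):
    if cost >= best[0]:                    # bound: prune dominated branches
        return best
    i = len(partial)                       # next model node to assign
    if i == len(model_adj):
        return (cost, partial)             # new best (cheaper, by the bound above)
    for s in range(len(scene_adj)):
        if s in partial:
            continue
        # penalty: model edges to already-assigned nodes missing in the scene
        penalty = sum(1 for j in range(i)
                      if model_adj[i][j] and not scene_adj[s][partial[j]])
        best = match(model_adj, scene_adj, partial + (s,), cost + penalty, best)
    return best

model = [[0, 1], [1, 0]]
scene = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(match(model, scene))   # -> (0, (0, 1)): a cost-0 embedding of the model
```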
Recent accelerations in multi-modal applications have been made possible by the plethora of image and text data available online. However, the scarcity of similar data in the medical field, specifically in histopathology, has slowed comparable progress. To enable similar representation learning for histopathology, we turn to YouTube, an untapped resource offering 1,087 hours of valuable educational histopathology videos from expert clinicians. From YouTube, we curate Quilt: a large-scale vision-language dataset consisting of 802,148 image and text pairs. Quilt was automatically curated using a mixture of models, including large language models, handcrafted algorithms, human knowledge databases, and automatic speech recognition. In comparison, the most comprehensive datasets curated for histopathology amass only around 200K samples. We combine Quilt with datasets from other sources, including Twitter, research papers, and the internet in general, to create an even larger dataset: Quilt-1M, with one million paired image-text samples, making it the largest vision-language histopathology dataset to date. We demonstrate the value of Quilt-1M by fine-tuning a pre-trained CLIP model. Our model outperforms state-of-the-art models on zero-shot and linear-probe classification of new pathology images across 13 diverse patch-level datasets spanning 8 different sub-pathologies, as well as on cross-modal retrieval tasks.
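As a hedged illustration of the zero-shot evaluation protocol, the sketch below runs CLIP-style zero-shot classification via Hugging Face transformers; the checkpoint name, prompts, labels, and image path are placeholders, not the released Quilt-1M model or the paper's exact prompt set.

```python
# Hedged sketch: zero-shot classification of a pathology patch with a
# CLIP-style model. The checkpoint below is a generic placeholder.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")      # placeholder
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["adenocarcinoma", "benign tissue", "squamous cell carcinoma"]  # illustrative
prompts = [f"a histopathology image of {c}" for c in labels]
image = Image.open("patch.png")                 # path to a pathology patch (placeholder)

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image   # image-text similarity scores
print(labels[logits.softmax(dim=-1).argmax().item()])
```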
We define a foundational model as an abstraction of a body of knowledge that explicitly declares the principles and concepts necessary for coherently and consistently modelling a knowledge domain. Principles for a foundational model of anatomy are defined and used to specify the components of such a model. These components include an anatomy ontology (Ao), an anatomical structural abstraction (ASA), an anatomical transformation abstraction (ATA) and metaknowledge (Mk), which comprises the rules for representing relationships in the other three components of the model. The foundational model Fm is therefore specified as the four-tuple Fm = (Ao, ASA, ATA, Mk). We hypothesize that this abstraction captures the information that is sufficient and necessary for describing the anatomy of any physical entity that constitutes the body, as well as that of the body itself.
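A minimal sketch of the four-tuple as a typed record; the component types are placeholders, since the actual components are rich knowledge structures rather than plain dictionaries.

```python
# Hedged sketch: Fm = (Ao, ASA, ATA, Mk) as a typed record with placeholder
# component types.
from typing import NamedTuple

class FoundationalModel(NamedTuple):
    Ao: dict    # anatomy ontology: term -> definitions and relations
    ASA: dict   # anatomical structural abstraction
    ATA: dict   # anatomical transformation abstraction
    Mk: list    # metaknowledge: rules constraining relations in Ao, ASA, ATA

Fm = FoundationalModel(Ao={}, ASA={}, ATA={}, Mk=[])
```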