Advances in the fusion of multi-sensor inputs have necessitated the creation of more sophisticated fused image assessment techniques. The current work extends previous studies investigating participant accuracy in tracking individuals in a video sequence. Participants were shown the visible and IR videos individually and the two video inputs side-by-side, as well as averaged, discrete wavelet transform, and dual-tree complex wavelet transform fused videos. Two scenarios were shown to participants: one featured a camouflaged man walking down a pathway through foliage and across a clearing; the other featured several individuals moving around the clearing. The side-by-side scanpath data were analysed by studying how often participants looked at the visible and infrared sides and how accurately they tracked the given target, and the results were compared with previously analysed data. The results of this study are discussed in the context of wider applications to image assessment, and the potential for modelling human scanpath performance.
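As an illustration of the side-of-display analysis described above, the sketch below computes the proportion of gaze samples falling on each half of a side-by-side display. The data layout (one gaze x-coordinate in pixels per sample) and the function name are assumptions for illustration, not the study's actual analysis pipeline.

```python
import numpy as np

def side_proportions(gaze_x, display_width, visible_on_left=True):
    """Proportion of gaze samples on the visible vs. infrared half of a
    side-by-side display. Assumes one gaze x-coordinate (pixels) per
    sample; off-screen samples should be filtered out beforehand."""
    gaze_x = np.asarray(gaze_x, dtype=float)
    left = float(np.mean(gaze_x < display_width / 2.0))
    right = 1.0 - left
    return (left, right) if visible_on_left else (right, left)

# Example with synthetic samples over a 1920-pixel-wide display.
rng = np.random.default_rng(0)
samples = rng.uniform(0, 1920, size=5000)
vis, ir = side_proportions(samples, display_width=1920)
print(f"visible side: {vis:.1%}, infrared side: {ir:.1%}")
```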
Image fusion is the process of combining images of differing modalities, such as visible and infrared (IR) images. Significant work has recently been carried out comparing methods of fused image assessment, with findings strongly suggesting that a task-centred approach would be beneficial to the assessment process. The current paper reports a pilot study analysing the eye movements of participants engaged in four tasks. The first and second tasks involved tracking a human figure wearing camouflage clothing walking through thick undergrowth at light and dark luminance levels, whilst the third and fourth tasks required tracking an individual in a crowd, again at two luminance levels. Participants were shown the original visible and IR sequences individually, as well as pixel-averaged, contrast pyramid, and dual-tree complex wavelet fused video sequences. They viewed each display and sequence three times so that inter-subject scanpath variability could be compared. This paper describes the initial analysis of the eye-tracking data gathered from the pilot study; these data were also compared with computational metric assessments of the image sequences.
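Of the fusion schemes listed above, pixel averaging is simple enough to sketch directly; the sketch below assumes co-registered greyscale frames supplied as NumPy arrays, and the function name is illustrative.

```python
import numpy as np

def average_fusion(visible, infrared):
    """Pixel-averaged fusion of two co-registered greyscale frames
    (uint8, identical shape). More sophisticated schemes such as the
    contrast pyramid or dual-tree complex wavelet transform fuse
    multiresolution coefficients rather than raw pixels."""
    vis = np.asarray(visible, dtype=np.float64)
    ir = np.asarray(infrared, dtype=np.float64)
    if vis.shape != ir.shape:
        raise ValueError("inputs must be co-registered and equally sized")
    return ((vis + ir) / 2.0).round().astype(np.uint8)
```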
Image fusion is finding increasing application in areas such as medical imaging, remote sensing, and military surveillance using sensor networks. Many of these applications demand highly compressed data combined with error-resilient coding due to the characteristics of the communication channel. In this respect, JPEG2000 has many advantages over previous image coding standards. This paper evaluates and compares quality metrics for lossy compression using JPEG2000. Three representative image fusion algorithms have been considered: simple averaging, contrast pyramid, and dual-tree complex wavelet transform based fusion. Numerous infrared and visible test images have been used. We compare these results with a psychophysical study in which participants were asked to perform specific tasks and assess image fusion quality. The results show that there is a correlation between most of the metrics and the psychophysical evaluation. They also indicate that selection of the correct fusion method has more impact on performance than the presence of compression.
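As a concrete example of a full-reference quality measure of the kind compared in such evaluations, the sketch below computes PSNR between a reference frame and its lossy-compressed counterpart. It assumes both frames have already been decoded into NumPy arrays, and it is a simple baseline stand-in rather than one of the fusion-specific metrics evaluated in the paper.

```python
import numpy as np

def psnr(reference, degraded, peak=255.0):
    """Peak signal-to-noise ratio (dB) between a reference frame and
    its lossy-compressed counterpart; higher is better. Frames are
    assumed to be equally sized arrays decoded elsewhere."""
    ref = np.asarray(reference, dtype=np.float64)
    deg = np.asarray(degraded, dtype=np.float64)
    mse = np.mean((ref - deg) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```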
The prevalence of image fusion - the fusing of images of different modalities, such as visible and infrared radiation - has increased the demand for accurate methods of image quality assessment. Two traditional methods of assessment are computational metrics and subjective quality ratings; we propose an alternative task-based method of image assessment, which represents a more accurate description of image 'quality' than subjective ratings. The current study used a signal detection paradigm, in which participants identified the presence or absence of a target in briefly presented images followed by an energy mask, and task performance was compared with computational metric results. In Experiment 1, 18 participants were presented with composites of fused infrared and visible light images, with a soldier either present or absent. There were two independent variables, each with three levels: image fusion method (averaging, contrast pyramid, dual-tree complex wavelet transform) and JPEG2000 compression (no compression, low, and high compression), in a repeated measures design. Images were blocked by fusion type, with compression level randomised within blocks. This process was repeated in Experiment 2, but with JPEG compression substituted for JPEG2000. The results showed a significant effect of fusion but not compression for JPEG2000 images, whilst JPEG images showed significant effects of both fusion and compression. The metric results for both JPEG and JPEG2000 showed similar trends, with the more advanced metrics matching the performance of the psychophysical tests more closely.
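For readers unfamiliar with the signal detection paradigm, the sketch below computes the standard sensitivity index d' from trial counts; the log-linear correction and the example counts are illustrative, not the study's data.

```python
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' for a yes/no detection task, with a
    log-linear correction so that hit or false-alarm rates of exactly
    0 or 1 do not produce infinite z-scores."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Example: 40 target-present and 40 target-absent trials.
print(d_prime(hits=33, misses=7, false_alarms=6, correct_rejections=34))
```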
Accurate quality assessment of fused images, such as combined visible and infrared radiation images, has become increasingly important with the rise in the use of image fusion systems. We bring together three approaches, applying two objective tasks (local target analysis and global target location) to two scenarios, together with subjective quality ratings and three computational metrics. Contrast pyramid, shift-invariant discrete wavelet transform, and dual-tree complex wavelet transform fusion are applied, along with several levels of JPEG2000 compression. The differing tasks are shown to vary in how well they differentiate among fusion methods, and future directions pertaining to the creation of task-specific metrics are explored.
In a virtual environment (VE), efficient techniques are often needed to economize on rendering computation without compromising the information transmitted. The reported experiments devise a functional fidelity metric by exploiting research on memory schemata. According to the proposed measure, similar information would be transmitted across synthetic and real-world scenes depicting a specific schema. This would ultimately indicate which areas in a VE could be rendered in lower quality without affecting information uptake. We examine whether computationally more expensive scenes of greater visual fidelity affect memory performance after exposure to immersive VEs, or whether they are merely more aesthetically pleasing than their diminished visual quality counterparts. Results indicate that memory schemata function in VEs similarly to the way they do in real-world environments. "High-level" visual cognition related to late visual processing is unaffected by ubiquitous graphics manipulations such as polygon count and depth of shadow rendering; "normal" cognition operates as long as the scenes look acceptably realistic. However, when the overall realism of the scene is greatly reduced, such as when rendered in wireframe, visual cognition becomes abnormal. Effects that distinguish schema-consistent from schema-inconsistent objects change because the whole scene now looks incongruent. We have shown that this effect is not due to a failure of basic recognition.
The increased interest in image fusion (combining images of two or more modalities, such as infrared and visible light radiation) has led to a need for accurate and reliable image assessment methods. Previous work has often relied upon subjective quality ratings combined with some form of computational metric analysis. However, we have shown in previous work that such methods do not correlate well with how people perform in actual tasks utilising fused images. The current study presents the novel use of an eye-tracking paradigm to record how accurately participants could track an individual in various fused video displays. Participants were asked to track a man in a camouflage outfit in various videos (the visible and infrared originals, a fused average of the inputs, and two different wavelet-based fused videos) whilst also carrying out a secondary button-press task. The results were analysed in two ways: first by calculating accuracy across the whole video, and then by dividing the video into three time sections based on video content. Although the pattern of results depends on the analysis, accuracy for the input videos was generally found to be significantly worse than that for the fused displays. In conclusion, both approaches have good potential as new fused video assessment methods, depending on the task being carried out.
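The two analyses described above can be sketched together, assuming time-aligned gaze and target coordinates; the array layout, function name, and section boundaries are assumptions for illustration rather than the study's actual code.

```python
import numpy as np

def tracking_error(gaze_xy, target_xy, boundaries=None):
    """Mean gaze-to-target distance in pixels, either across the whole
    video (boundaries=None) or per time section. gaze_xy and target_xy
    are (n_samples, 2) arrays aligned in time; boundaries are sample
    indices splitting the video (e.g. two indices for three sections)."""
    dist = np.linalg.norm(np.asarray(gaze_xy, dtype=float) -
                          np.asarray(target_xy, dtype=float), axis=1)
    if boundaries is None:
        return float(dist.mean())
    return [float(section.mean()) for section in np.split(dist, boundaries)]
```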
This paper investigates how object tracking performance is affected by the fusion quality of videos from visible (VIZ) and infrared (IR) surveillance cameras, as compared to tracking in single-modality videos. The videos have been fused using simple averaging and various multiresolution techniques. Tracking has been accomplished by means of a particle filter using colour and edge cues. The highest tracking accuracy has been obtained in IR sequences, whereas the VIZ video was affected by many artifacts and showed the worst tracking performance. Among the fused videos, the complex wavelet and averaging techniques offered the best tracking performance, comparable to that of IR. Thus, of all the methods investigated, the fused videos, containing complementary contextual information from both single-modality input videos, are the best source for further analysis by a human observer or a computer program.
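A minimal sketch of the bootstrap (SIR) particle-filter scheme underlying such a tracker is given below, assuming frames arrive as HxWx3 NumPy arrays. The function name, the single-pixel colour likelihood, and all parameter values are illustrative stand-ins for the paper's colour-and-edge cue model over a target region.

```python
import numpy as np

rng = np.random.default_rng(1)

def track_frame(particles, weights, frame, target_colour,
                motion_std=5.0, colour_std=20.0):
    """One step of a bootstrap particle filter over 2D image positions
    (x, y), scored by a single-pixel colour cue only."""
    h, w = frame.shape[:2]
    # Predict: random-walk motion model, clipped to the frame.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    particles[:, 0] = np.clip(particles[:, 0], 0, w - 1)
    particles[:, 1] = np.clip(particles[:, 1], 0, h - 1)
    # Update: Gaussian likelihood of the colour under each particle.
    cols = frame[particles[:, 1].astype(int), particles[:, 0].astype(int)]
    err = np.linalg.norm(cols.astype(float) - target_colour, axis=1)
    weights = weights * np.exp(-0.5 * (err / colour_std) ** 2) + 1e-12
    weights /= weights.sum()
    # Resample (multinomial; systematic resampling is the usual refinement).
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    particles = particles[idx]
    weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights, particles.mean(axis=0)

# Example: 500 particles initialised around a known starting position.
frame = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)
particles = np.array([320.0, 240.0]) + rng.normal(0, 10, (500, 2))
weights = np.full(500, 1.0 / 500)
particles, weights, estimate = track_frame(
    particles, weights, frame, target_colour=np.array([200.0, 60.0, 60.0]))
print("estimated target position:", estimate)
```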