Current models of eye movement control are derived from theories assuming serial processing of single items or from theories based on parallel processing of multiple items at a time. This controversy has persisted because most paradigms investigated so far generated data compatible with both serial and parallel models. Here, we study eye movements in a sequential scanning task, where stimulus n indicates the position of the next stimulus n + 1. We investigate whether eye movements are controlled by sequential attention shifts when the task requires serial order of processing. Our measures of distributed processing, in the form of parafoveal-on-foveal effects, long-range modulations of target selection, and skipping saccades, provide evidence against models strictly based on serial attention shifts. We conclude that our results lend support to parallel processing as a strategy for eye movement control.
Whenever eye movements are measured, a central part of the analysis has to do with where subjects fixate and why they fixated where they did. To a first approximation, a set of fixations can be viewed as a set of points in space; this implies that fixations are spatial data and that the analysis of fixation locations can be beneficially thought of as a spatial statistics problem. We argue that thinking of fixation locations as arising from point processes is a fruitful framework for eye-movement data, helping turn qualitative questions into quantitative ones. We provide a tutorial introduction to some of the main ideas of the field of spatial statistics, focusing especially on spatial Poisson processes. We show how point processes help relate image properties to fixation locations. In particular, we show how point processes naturally express the idea that image features' predictability for fixations may vary from one image to another. We review other methods of analysis used in the literature, show how they relate to point process theory, and argue that thinking in terms of point processes substantially extends the range of analyses that can be performed and clarifies their interpretation.
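A minimal sketch of the core idea, assuming a hypothetical pixel-wise feature map (e.g., local contrast) and fixation coordinates in pixel units; the log-linear intensity, parameter names, and grid approximation of the integral are illustrative and not the paper's implementation:

```python
import numpy as np

def poisson_loglik(fix_x, fix_y, feature_map, beta0, beta1, px_area=1.0):
    """Log-likelihood of fixations under an inhomogeneous spatial
    Poisson process with log-linear intensity
        lambda(x) = exp(beta0 + beta1 * feature(x)).
    The integral of the intensity over the image is approximated by a
    sum over pixels, each of area px_area."""
    log_lam = beta0 + beta1 * feature_map          # log-intensity per pixel
    # Sum of log-intensities at the observed fixation locations ...
    term1 = log_lam[fix_y, fix_x].sum()
    # ... minus the expected number of points (integral of the intensity).
    term2 = np.exp(log_lam).sum() * px_area
    return term1 - term2

# Toy usage with a random "feature" map and a handful of fixations.
rng = np.random.default_rng(0)
feature_map = rng.random((600, 800))
fix_x = rng.integers(0, 800, size=50)
fix_y = rng.integers(0, 600, size=50)
print(poisson_loglik(fix_x, fix_y, feature_map, beta0=-9.0, beta1=1.0))
```

Maximizing this log-likelihood over beta1 for each image separately is one way to express the idea that a feature's predictive value can differ between images.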
Visual long-term memory (VLTM) of complex visual stimuli such as photographs of scenes has previously been shown to significantly affect eye movements. This effect manifests as a decrease in saccade amplitude and an increase in fixation duration as images are shown repeatedly within one experimental session. VLTM is known to have a much longer temporal persistence, however. In a series of two experiments we investigate the transfer of the effects of VLTM on eye movements to longer time scales. The first experiment comprised three sessions spread over several days. In each session participants viewed and memorized a sequence of images. In Session 1 all presented images were unfamiliar to participants. Sessions 2 and 3 included (a) images familiar from Session 1, (b) semantically and structurally similar images, and (c) unfamiliar images. Participants showed the expected proficiency in recognizing images even days after exposure. However, using a linear mixed model approach, we found no evidence for an effect of image familiarity on eye movement measures such as fixation duration and saccade amplitude. The effect on target selection, as quantified by the likelihood of fixation locations given the empirical distribution of fixation locations, was only weakly significant. In a second experiment we reduced the time between sessions by conducting Sessions 1 and 2 on the same day and Session 3 on the following day. As in the first experiment, the results showed that the influence of VLTM on eye movements across sessions is weaker than when presentations occur within the same session. These results are compatible with the view that, while memory of an image remains intact over days, the immediate effect of VLTM on eye movement metrics decays much faster. Only hours after memorizing an image, scene exploration is primarily driven by the current visual input and is largely independent of VLTM.
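A sketch of the kind of linear mixed model analysis described, using the statsmodels formula interface; the file name and column names (duration, familiarity, subject) are hypothetical placeholders, and the exact model specification in the paper may differ:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per fixation, with the
# familiarity condition of the image and a subject identifier.
data = pd.read_csv("fixations.csv")  # columns: duration, familiarity, subject

# Linear mixed model: fixed effect of familiarity on log fixation
# duration, with a random intercept per subject.
model = smf.mixedlm("np.log(duration) ~ C(familiarity)", data,
                    groups=data["subject"])
result = model.fit()
print(result.summary())
```

Modeling log duration and including subjects as a grouping factor is a common way to account for the skew of duration distributions and for between-subject variability.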
Real-world scene perception is typically studied in the laboratory using static picture viewing with restrained head position. Consequently, the transfer of results obtained in this paradigm to real-world scenarios has been questioned. The advancement of mobile eye-trackers and the progress in image processing, however, permit a more natural experimental setup that, at the same time, maintains the high experimental control of the standard laboratory setting. We investigated eye movements while participants were standing in front of a projector screen and explored images under four specific task instructions. Eye movements were recorded with a mobile eye-tracking device and raw gaze data were transformed from head-centered into image-centered coordinates. We observed differences between tasks in temporal and spatial eye-movement parameters and found that the bias to fixate images near the center differed between tasks. Our results demonstrate that current mobile eye-tracking technology and a highly controlled design support the study of fine-scaled task dependencies in an experimental setting that permits more natural viewing behavior than the static picture viewing paradigm.
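A minimal sketch of the head-centered to image-centered transformation step, assuming the four corners of the projected image have already been detected in each scene-camera frame (the detection itself is omitted, and all coordinates below are made up for illustration):

```python
import numpy as np
import cv2

# Four corners of the projected image as detected in the scene-camera
# frame (pixels), ordered top-left, top-right, bottom-right, bottom-left.
corners_cam = np.float32([[212, 95], [1050, 88], [1063, 690], [203, 702]])

# The same corners in image coordinates (here: a 1200 x 900 picture).
corners_img = np.float32([[0, 0], [1200, 0], [1200, 900], [0, 900]])

# Homography mapping scene-camera coordinates to image coordinates.
H, _ = cv2.findHomography(corners_cam, corners_img)

# Raw gaze samples in scene-camera (head-centered) coordinates.
gaze_cam = np.float32([[640, 360], [700, 410]]).reshape(-1, 1, 2)

# Transform gaze into image-centered coordinates.
gaze_img = cv2.perspectiveTransform(gaze_cam, H).reshape(-1, 2)
print(gaze_img)
```

Because the homography is re-estimated per video frame, head movements of the standing participant are compensated before fixations are analyzed in image coordinates.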
Bottom-up and top-down, as well as low-level and high-level factors influence where we fixate when viewing natural scenes. However, the importance of each of these factors and how they interact remains a matter of debate. Here, we disentangle these factors by analysing their influence over time. For this purpose we develop a saliency model which is based on the internal representation of a recent early spatial vision model to measure the low-level bottom-up factor. To measure the influence of high-level bottom-up features, we use a recent DNN-based saliency model. To account for top-down influences, we evaluate the models on two large datasets with different tasks: first, a memorisation task and, second, a search task. Our results lend support to a separation of visual scene exploration into three phases: the first saccade, an initial guided exploration characterised by a gradual broadening of the fixation density, and a steady state which is reached after roughly 10 fixations. Saccade target selection during the initial exploration and in the steady state is related to similar areas of interest, which are better predicted when including high-level features. In the search dataset, fixation locations are determined predominantly by top-down processes. In contrast, the first fixation follows a different fixation density and contains a strong central fixation bias. Nonetheless, first fixations are guided strongly by image properties, and as early as 200 ms after image onset, fixations are better predicted by high-level information. We conclude that any low-level bottom-up factors are mainly limited to the generation of the first saccade. All saccades are better explained when high-level features are considered, and later this high-level bottom-up control can be overruled by top-down influences.
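A sketch of the evaluation logic behind such a time-resolved comparison: score each fixation by the log-density a model assigns to its location, then average by fixation rank to trace predictability over the course of a trial. Data structures and names are assumptions, not the paper's code, and real analyses would add proper baselines and cross-validation:

```python
import numpy as np

def loglik_by_rank(fixations, density_maps, max_rank=15):
    """Mean log-density of fixation locations as a function of
    fixation rank (1 = first fixation after image onset).

    fixations:    iterable of (image_id, rank, x, y) tuples
    density_maps: dict image_id -> 2-D array normalized to sum to 1
    """
    scores = {r: [] for r in range(1, max_rank + 1)}
    for image_id, rank, x, y in fixations:
        if rank > max_rank:
            continue
        p = density_maps[image_id][y, x]
        scores[rank].append(np.log2(p + 1e-20))   # bits; guard against p = 0
    return {r: np.mean(v) for r, v in scores.items() if v}
```

Comparing this curve for a low-level and a high-level model shows at which fixation rank, and hence how early, high-level information starts to dominate.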
Eye movements depend on cognitive processes related to visual information processing. Much has been learned about the spatial selection of fixation locations, while the principles governing the temporal control (fixation durations) are less clear. Here we review current theories for the control of fixation durations in tasks like visual search, scanning, scene perception, and reading and propose a new model for the control of fixation durations. We distinguish two local principles from one global principle of control. First, an autonomous saccade timer initiates saccades after random time intervals (Local-I). Second, foveal inhibition permits immediate prolongation of fixation durations by ongoing processing (Local-II). Third, saccade timing is adaptive, so that the mean timer value depends on task requirements and fixation history (Global). We demonstrate by numerical simulations that our model qualitatively reproduces patterns of mean fixation durations and fixation duration distributions observed in typical experiments. When combined with assumptions of saccade-target selection and oculomotor control, the model accounts for both temporal and spatial aspects of eye-movement control in two versions of a visual search task. We conclude that the model provides a promising framework for the control of fixation durations in saccadic tasks.
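A toy numerical sketch of the three control principles; the distributions, parameter values, and update rules below are chosen purely for illustration and are not the model's actual equations:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_fixation_durations(n_fix=10000, mean_timer=250.0,
                                inhibition_prob=0.3, inhibition_gain=1.5,
                                adapt_rate=0.05, difficulty=300.0):
    """Toy simulation of the three principles:
    Local-I:  an autonomous timer draws random inter-saccade intervals
              (here gamma-distributed around the current mean timer value).
    Local-II: on a fraction of fixations, ongoing foveal processing
              inhibits the timer and prolongs the fixation.
    Global:   the mean timer value adapts slowly toward the average
              processing demand ("difficulty") of the task."""
    durations = np.empty(n_fix)
    timer = mean_timer
    shape = 9.0  # gamma shape; controls the skew of the distribution
    for i in range(n_fix):
        d = rng.gamma(shape, timer / shape)           # Local-I: random interval
        if rng.random() < inhibition_prob:            # Local-II: inhibition
            d *= inhibition_gain
        timer += adapt_rate * (difficulty - timer)    # Global: slow adaptation
        durations[i] = d
    return durations

d = simulate_fixation_durations()
print(f"mean = {d.mean():.0f} ms, sd = {d.std():.0f} ms")
```

Even this caricature reproduces the qualitative signatures discussed in the abstract: right-skewed duration distributions, on-line prolongation by foveal processing, and a mean that tracks task demands.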
During scene perception our eyes generate complex sequences of fixations. Predictors of fixation locations are bottom-up factors like luminance contrast, top-down factors like viewing instruction, and systematic biases like the tendency to place fixations near the center of an image. However, comparatively little is known about the dynamics of scanpaths after experimental manipulation of specific fixation locations. Here we investigate the influence of initial fixation position on subsequent eye-movement behavior on an image. We presented 64 colored photographs to participants who started their scanpaths from one of two experimentally controlled positions in the right or left part of an image. Additionally, we computed the images' saliency maps and classified them as balanced images or images with high saliency values on either the left or right side of the picture. As a result of the starting-point manipulation, we found long transients of mean fixation position and a tendency to overshoot to the image side opposite the starting position. Possible mechanisms for the generation of this overshoot were investigated using numerical simulations of statistical and dynamical models. We conclude that inhibitory tagging is a viable mechanism for dynamical planning of scanpaths.
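A toy sketch of inhibitory tagging on a saliency map: each fixation deposits a decaying inhibition tag, and the next target maximizes salience minus accumulated inhibition. All parameters and the deterministic argmax selection are illustrative simplifications, not the simulation models of the paper:

```python
import numpy as np

def simulate_scanpath(salience, start, n_fix=10, tag_strength=1.0,
                      decay=0.8, sigma=30.0):
    """Toy scanpath model with inhibitory tagging.
    salience: 2-D map; start: (row, col) of the enforced first fixation.
    Each fixation adds a Gaussian inhibition tag of width sigma that
    decays by a constant factor per fixation."""
    h, w = salience.shape
    yy, xx = np.mgrid[0:h, 0:w]
    inhibition = np.zeros_like(salience)
    path = [start]
    for _ in range(n_fix - 1):
        r, c = path[-1]
        tag = tag_strength * np.exp(-((yy - r)**2 + (xx - c)**2)
                                    / (2 * sigma**2))
        inhibition = decay * inhibition + tag
        nxt = np.unravel_index(np.argmax(salience - inhibition), (h, w))
        path.append(nxt)
    return path
```

Because inhibition accumulates around the enforced starting region, such a mechanism naturally pushes gaze toward the opposite image side, producing the overshoot described above.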
Due to visual acuity limitations of the retina, we need to move our eyes when exploring a visual scene. As a result, we usually observe clusters of fixations in some parts of the image. Both bottom-up (e.g., salience) and top-down factors (e.g., gist of a scene) have been put forward to explain the generation of fixation clusters. Here, we show that target selection is not only a consequence of image properties but also depends on the size of the attentional window, a stimulus-independent mechanism. We demonstrate that the inhomogeneous pair correlation function (PCF) can be used to investigate distributions of fixation locations during single trials, independent of the inhomogeneity generated by images. Our results show that fixations cluster at short length scales (<3°) during single trials. The effect cannot be explained by the overall inhomogeneity of fixation locations generated across subjects. Presenting the same image twice augmented the effect. The PCF can be interpreted as an indicator of the size of the attentional window that decreases during reinspection of images. In general, the limited attentional window reinforces inspection of close fixation locations. We conclude that the PCF is a promising tool to investigate dynamics of target selection during individual trials.
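A simplified kernel estimator of the inhomogeneous PCF, assuming a pixel-wise intensity estimate evaluated at the fixation locations; edge corrections are ignored here, so this is a didactic approximation of the standard estimator rather than the exact procedure used in the study:

```python
import numpy as np
from scipy.spatial.distance import pdist

def pcf_inhom(points, lam_at_points, r_grid, bw, area):
    """Simplified inhomogeneous pair correlation function:
        g(r) ~ sum over pairs of k_bw(r - d_ij) / (lam_i * lam_j),
    normalized by 2*pi*r*area (no edge correction).
    points:        (n, 2) array of fixation coordinates (deg)
    lam_at_points: (n,) intensity estimate at each fixation
    r_grid:        distances at which to evaluate g; must be > 0"""
    d = pdist(points)                                 # pairwise distances, i < j
    lam_prod = pdist(lam_at_points[:, None],
                     lambda u, v: u[0] * v[0])        # lam_i * lam_j per pair
    g = np.empty_like(r_grid)
    for k, r in enumerate(r_grid):
        kern = (np.exp(-0.5 * ((r - d) / bw)**2)
                / (bw * np.sqrt(2 * np.pi)))          # Gaussian kernel
        g[k] = 2.0 * np.sum(kern / lam_prod) / (2 * np.pi * r * area)
    return g
```

Values of g(r) near 1 indicate that pair distances are fully explained by the image-driven inhomogeneity, while g(r) > 1 at short r signals the additional within-trial clustering reported above.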