Acquiring Visual-Linguistic Associations and Reasoning with Them

1992 
Human beings combine information of several kinds from a number of modalities in the course of everyday activities. Visual and linguistic information are two major types of such information; they seem to be used almost effortlessly in everyday activities that involve seeing, talking, acting, or a combination thereof. Visual-linguistic associations are central to the abilities of both children and adults to perform ordinary and not-so-ordinary tasks in the world. For example, when a person (A) tells another (B) "Turn on the light in the kitchen," it is assumed that A and B agree on what words such as "light," "kitchen," and "on" refer to. Further, B may have to find the light switch in the kitchen and turn it on. Sometimes B may respond with "The light is already on in the kitchen" or with "Which one of these switches is for the light?" Other examples may involve thought experiments and mental imagery. For instance, A may ask B "Is the living room in your new apartment bigger than this room?" and, before responding, B may have to visualize his new living room in his mind's eye and resort to some subtle reasoning (e.g., about the sizes of tables and chairs and the distances between them) to compare the size of the imagined room with that of the perceived room.

We take the view that one way to look upon mental representations is to assume that visual and linguistic information may be intertwined in the same memory structure, or, if they are processed into distinct memory structures, that there are rich links between them that facilitate going from one representation to the other as necessary. We have developed a computer simulation of such a model, in which inputs from separate visual and linguistic modalities are processed and then combined. Hierarchical representations of visual inputs and linguistic inputs are built and linked together; this makes it possible to reason with both representations and also grounds each representation in the other. For example, once the words "horse" and "striped" have been described to a person using visual inputs (pictures), simply being told that a "zebra is a striped horse" is often adequate without the person actually being shown one (example due to Harnad, 1990). To teach "striped" in the first place, we might say something like "striped means with long narrow ribbons more or less parallel to one another," but think of how much simpler and more informative it is to show a few pictures of striped objects. Similarly, we are able to visualize a "sphinx" or a "unicorn" quite well from verbal descriptions. Whenever we describe objects or narrate events, we are making use of the fact that words, or linguistic symbols, are grounded in their visual counterparts.

Central to all this is the issue of different kinds of representations and the efficiency associated with them. For our purposes, a representation can be thought of as a collection of percepts, compounds of percepts, and sets of mechanisms or operators that act on them to produce new percepts or symbols in a systematic fashion. For example, percepts such as "stem" and "round," sensed through the visual modality, may be acted upon by an operator that looks for their relative positions to produce the percept that stands for the shape of an apple. Such operators may often be specific to the particular percepts they act on. In many cases, pictorial percepts are far richer in meaning than verbal descriptions, and their use can greatly expedite recognition and learning.
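The idea of a representation as percepts, compounds of percepts, and operators that combine them, with cross-modal links grounding the visual and linguistic hierarchies in each other, can be made concrete with a small sketch. The Python sketch below is only an illustration of this idea under our own naming assumptions (the Percept class, the compose operator, and the link method are hypothetical, not the implementation described in the paper).

    # Minimal sketch, assuming a simple node-based encoding of percepts.
    # All names here (Percept, compose, link) are illustrative assumptions.
    from dataclasses import dataclass, field

    @dataclass
    class Percept:
        """A node in either the visual or the linguistic hierarchy."""
        name: str
        modality: str                          # "visual" or "linguistic"
        parts: list = field(default_factory=list)   # constituent percepts
        linked: list = field(default_factory=list)  # cross-modal links

        def link(self, other):
            """Ground this percept in a percept from the other modality."""
            self.linked.append(other)
            other.linked.append(self)

    def compose(name, modality, parts):
        """An 'operator': build a compound percept from constituent percepts."""
        return Percept(name=name, modality=modality, parts=parts)

    # Visual percepts "stem" and "round" are composed into an apple-shape
    # compound, which is then linked to the word "apple" in the linguistic
    # hierarchy, so that following the link grounds the word in its visual
    # counterpart.
    stem = Percept("stem", "visual")
    round_body = Percept("round", "visual")
    apple_shape = compose("apple-shape", "visual", [stem, round_body])

    word_apple = Percept("apple", "linguistic")
    apple_shape.link(word_apple)

    print([p.name for p in word_apple.linked])   # ['apple-shape']

In this sketch the operator is a generic composition function; in the model described above, operators may instead be specific to the particular percepts they act on (e.g., checking the relative positions of "stem" and "round" before producing the apple-shape percept).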