| The Role of Attention in Determining the Origin of the Scene-based Reference Frame
Sean M. Montgomery
Abstract |
|
| It has been shown that visual perception may
utilize multiple representations of space simultaneously. It is unclear what role spatial
attention plays in establishing and selecting among different frames. However, it is clear
that spatial attention can be deployed in different reference frames. For example,
previous work using a letter reflection task has found an attention-related rightward
spatial bias that occurs within an intermediate-scale environmental reference frame that
can be dissociated from gravity-defined upright. I ran two experiments to investigate
whether the origin of such a scene-based reference frame can be determined by the locus of
spatial attention. I used a letter reflection task within a display which dissociated
locus of attention from retinotopic fixation, head/body midline, and the center of the
display. In the first experiment I found an attention-centered rightward bias that was
accompanied by a Simon effect. In a second experiment in which eye movements were
controlled for, I confirmed the presence of this rightward bias around the locus of
attention and also found a much weaker rightward bias that occurred around the fixation
point. These findings indicate that the origin of the scene-based reference frame can be
determined by the locus of spatial attention. The nature of the reference frame around
fixation is unclear, but the presence of the spatial bias around fixation in the second
experiment suggests that the monitoring of eye movements led participants to engage more
than one frame of reference. |
|
Introduction |
|
| The world that surrounds us is an incredibly complex and
detailed place filled with objects of all shapes, sizes, and colors. Any given object has
an array of particular features associated with it and these features may have particular
features of their own (e.g., a face has features of eyes which in turn have features of
shape, size, and color). In order to interact with the world successfully, we must
integrate our sensory input to organize the features within an object and the objects
within larger contexts to create some sort of mental representation of how things are
spatially related to us and to the other things in the environment. It has been proposed
that this organization may be accomplished by creating multiple spatial reference frames
which entail one or more axes having orientations, scales, directions, and an origin.
These reference frames are thought to be employed in a hierarchical fashion, and the
degree to which a given frame is used is probably determined by the task at hand. For the purposes
of this discussion, I am going to talk mostly about the ways in which visual input serves
to set up spatial representations, but let it be clear that proprioceptive, auditory,
vestibular, and tactile cues can also contribute to internal spatial representations of
the world. There are a number of different spatial reference frames one could use to represent spatial relationships in the world. Some reference frames have been termed viewer-centered or egocentric reference frames because the orientation, origin, and directions are defined with respect to the orientation, origin, and directions of the viewer. One viewer-centered reference frame whose importance has been acknowledged for some time is the retinotopic frame. This reference frame is automatically created by the pattern of light activating the photoreceptors in the retina. This incoming information can then be adjusted for eye, head, and body movements to create a stable representation with respect to the viewer's head, body, or appendages (Colby, Duhamel & Goldberg, 1995). Depending on the frame that is employed, the orientation of the axes could be determined by the orientation of the head, body, or appendages and the directionality by the viewer's intrinsic right/left, up/down, and front/back. The origin of viewer-centered reference frames could plausibly be determined by retinotopic fixation or by the location of the viewer's head, body, or appendages. These viewer-centered representations could be especially useful for performing many visually directed actions. Another way one could spatially organize the world would be to use an external, environment-based reference frame. One such reference frame could be established with gravity defining its upright axis. This sort of "whole-world" reference frame will often employ three perpendicular axes, but could also employ only two axes if one was looking at objects on a wall or computer screen. In the case of three axes, the axes perpendicular to the gravity defined up/down axis could be established in a number of ways, and would probably depend on the task being performed and the salient features in the environment. 
Some possibilities might be the viewer's front/back and right/left, or the alignment of the walls in a room, or even North/South and East/West if those directions were salient. The origin of a whole-world reference frame would likely be determined by the environmental locus of retinotopic fixation, a salient feature in the environment such as a particular object, or possibly by the locus of spatial attention. Calvanio, Petrone, and Levine (1987) provided evidence for the employment of both a viewer-centered reference frame and an environment-centered reference frame in the same task. They did this by using a spatial bias shown by patients with a neuropsychological syndrome known as unilateral neglect syndrome. This syndrome occurs following damage to the parietal cortex (especially in the right hemisphere) causing a perceptual deficit in half of the patient's visual world (most often the left side). However, as implied by the name of the syndrome, it is not that the visual information isn't being received, but rather that it is in some way ignored or neglected. Patients often don't report having seen anything on their neglected side (lack of explicit awareness), but their performance on various tasks is altered in a way that indicates that some information has been processed (presence of implicit processing). For a long time it was assumed that this bias against the left side was due to the fact that input coming into the retina from the left visual field transfers to the right hemisphere in the brain; thus the damage to the right hemisphere caused a left hemifield perceptual deficit. However, this assumption was confounded by the fact the patients were tested in conditions where the retinotopically defined upright was aligned with the environment-centered upright defined by gravity. In the Calvanio et al. 
study, patients were placed on their sides so that the viewer-centered upright was out of alignment by 90 degrees clockwise or counterclockwise with respect to the gravity-defined environmental upright. Interestingly, the neglect was found to be related to both whole-world environmental coordinates and retinal coordinates. Another type of environment-based frame of reference which seems to be relevant to visual perception is an object-based reference frame. This reference frame is an environmental reference frame (because it is external to the viewer) that is established by an object and can be dissociated from the whole-world reference frame. For example, imagine a television. The up/down and left/right axes of a television are quite salient and may be maintained even if the television is rotated out of alignment with the viewer-centered and whole-world environment-centered reference frames. This idea of object-based processing has received much empirical support recently. Some powerful evidence comes from the neuropsychology literature on unilateral neglect patients. Tipper & Behrmann (1996) presented neglect patients with either a barbell stimulus or with two unconnected circles as seen in Figure 1. The circles or barbell then visibly rotated 180 degrees so that the circle or side of the barbell that was previously on the neglected side of space was now on the patient's good side, and that which was on the good side was now on the neglected side. Then, on two-thirds of the trials, a white spot appeared in one of the circles and the participants were told to respond when they saw the white spot. Remarkably, in the barbell condition these patients were much slower to respond to the side of the barbell that had initially been in the neglected hemifield, even though it was now in the good hemifield. Importantly, when the two disconnected circles were used this effect was not seen. 
This is presumably because, in the barbell display, a coherent object-based frame of reference is being established by the barbell and the patient's neglect is being distributed to the left side of this object-based reference frame (the left circle). When the barbell is rotated, the neglect rotates with the object frame, out of alignment with the viewer-centered and gravity-defined reference frames, to end up on the opposite side of the normally experienced neglect. This does not happen with the circles display because the circles don't form one coherent object, and so when the circles are rotated around, the neglect doesn't stay with one circle. |
|
![]() |
|
| Reuter-Lorenz, Drain, and Hardy-Morais (1996) used a spatial bias in normal participants to support the role of object-based processing. The participants' task was to detect the presence or absence of a gap in a box that was displayed in either the right or left visual field. The boxes were presented at eccentricities such that with a central fixation point it was possible to compare gap detections at the same location with respect to the environmental and viewer-centered reference frames but on different sides with respect to the object-based reference frame (see Figure 2A). It was found that under these conditions, people showed higher gap detection accuracy on the left side of objects presented in the right visual field and on the right side of objects presented in the left visual field. The authors then went on to run the same experiment but with two disconnected vertical lines, corresponding to the right and left sides of the square, instead of the square itself (see Figure 2B). Under these conditions, participants did not show the large difference in gap detection accuracy that was found for the box experiment. Presumably the difference in performance between these two experiments is attributable to the fact that there is an object-based bias in the gap detection paradigm that is not present when a coherent object is not formed. | |
![]() |
|
| Evidence regarding the neurophysiological substrate of an
object-based reference frame comes from a single unit recording study in Rhesus monkeys
(Olson & Gettner, 1995). In this study, the authors found neurons in supplementary eye
field that fired selectively to eye movements to the left side of an object independent of
whether the object appeared in the right or left hemifield. This shows a neural correlate
for this idea of object-based processing. The combined evidence presented thus far strongly suggests that we have the ability to employ several different frames of reference. The examples, however, have been primarily limited to the use of one reference frame at a time. Behrmann & Tipper (1999) explored whether multiple spatial reference frames can be employed simultaneously. In the first experiment, unilateral neglect patients were shown a barbell stimulus between two squares (Figure 3). On 2/3 of the trials a small detection target appeared. The target appeared in the center of one of the two squares or in the center of one of the two circles that made up the ends of the barbell. There were two conditions: a static condition and a moving condition. In the static condition, all stimuli stayed in their original position and participants responded to target appearance. In the moving condition, the barbell rotated as in Tipper & Behrmann (1996), so that the right side of the barbell moved into the left visual field and the left side of the barbell moved into the right visual field. Again, participants responded to target appearance. The squares in this experiment established right and left locations of a stable "background" frame of reference, while the circles on the barbell established right and left sides of an object-based frame of reference which rotated out of alignment with the background frame on some trials and stayed in alignment on others. The results replicated the authors' earlier findings showing responses to be slower to targets appearing in the side of the barbell that was initially on the left of the display, regardless of which side it ended on. Additionally, the results also showed responses to targets appearing in the left square to be slower than responses to the right square, regardless of whether the barbell rotated or not. 
This gives strong support to the idea that participants were simultaneously forming two spatial codes; one with respect to the object-based coordinates of the barbell and one with respect to the background frame of reference. It isn't clear whether the background frame in this case is an egocentric frame or an environment-centered frame, but the data strongly support that two frames of reference are being employed simultaneously. The authors went on in another experiment to demonstrate that the two reference frames could be differentially weighted using a block of trials where the target was more likely to appear in one of the circles of the barbell and a block of trials where the target was more likely to appear in one of the squares. |
|
![]() |
|
| Most objects in the world are not nearly so simple as squares and barbells. As was briefly mentioned early in this discussion, most objects have intricate features (with those features having features as well). However, what defines an object and what defines its features may be determined by the task at hand. For example, if the task were to name the common living room appliance in Figure 4, the T.V. would be the object having features including: the antennae, the word on the screen, the on and off buttons, etc. Interestingly, if the task were to name the word on the T.V. screen, the object of focus would be the word on the screen with letters constituting its features. However, the more global context created by the T.V. is not irrelevant, and is in fact necessary to perform the task. Without using the reference frame established by the T.V., it isn't clear whether the word on the screen says "MOM" or "WOW". Within the global context of the T.V. screen, however, the word is probably automatically read by most readers as "WOW". | |
![]() ![]() |
|
| Some investigators have coined the term "scene" to
refer to a frame of reference that is established by a global object or context which
influences the perception of a more local, task-relevant object (Rhodes & Robertson,
submitted; Robertson, 1995). Using this terminology, the T.V. in Figure 4 establishes a
scene which influences the processing of objects within it, namely the ambiguous word.
The scene, the object, and the features are defined operationally in a hierarchical
manner, depending on task relevance. In order to study scene-based reference frames, some researchers have used a letter reflection task (Rhodes & Robertson, submitted; Robertson, 1995). Letter reflection tasks have been used for a long time to study mental rotation, and in early studies by Cooper and others, it was found that the time it takes to identify the reflection of a letter is linearly related to the angle through which it has to be rotated to appear upright (Cooper & Shepard, 1973). For example, a letter was presented to participants either mirror-reflected or normally oriented around its vertical axis and at some angle off of the viewer and gravitational upright. In this task participants report rotating the letter from its presented orientation into an upright orientation to compare the unknown handedness of the letter to the known handedness of an environmental or viewer frame. This report is confirmed by the fact that reaction times vary linearly with the degree to which the letter is rotated away from upright. According to Hinton & Parsons (1981), this finding is due to the fact that our ordinary mental representation of a letter doesn't intrinsically include its handedness. It is not necessary for the ordinary recognition purposes required in reading to know which side of a letter faces which direction. It is enough to merely identify the organization of curves and lines and edges to recognize a particular letter. In order to identify a letter as being mirror-reflected or normally oriented, one must be able to define the handedness of the object-based frame of the letter by its relation to a frame of known handedness. 
This process can happen in one of two ways: the letter can be mentally rotated to upright and compared with the viewer or environment-based frame of known handedness, or a frame of reference of known handedness could be rotated away from upright to the orientation of the letter to establish a frame of known handedness around the letter. Robertson, Palmer, and Gomez (1987) did an experiment investigating which of these two possibilities occurs. In one experiment, a pattern of letters appeared in the center of a computer screen oriented at some angle (a) with respect to the viewer and gravity defined upright, and participants responded by indicating whether the letters were presented mirror-reflected or normal. Following this prime, a target pattern of letters was presented in the same manner at another angle (b) and participants again responded to indicate the reflection of the stimulus. Based on the above possibilities for how participants perform this rotated letter reflection task, there are two separate predictions for their performance on this task. If participants are mentally rotating the prime letter to viewer/gravity determined upright to make the discrimination, then the response time for the target discrimination should be linearly related to the second angle (b). However, if participants rotate a frame from upright to the orientation of the prime and maintain this frame, then the participants have a frame of known handedness at this angle a in addition to the upright frame. The handedness of the target could then be determined by mentally rotating the target to the orientation of the prime. By this hypothesis, if the angle of the prime (a) was 45 degrees, and the angle of the target (b) was 60 degrees, the response time should correspond to 15 degrees of mental rotation instead of 60 degrees. 
The results from this experiment show that at least for a portion of the trials, a reference frame was being rotated from upright to establish a scene around the prime. Another study run by Robertson (1995) supports the notion that the letter reflection task can induce scene-based processing. In this study a four letter configuration appeared on a computer screen either upright or rotated +90 degrees, or -90 degrees out of alignment with viewer/gravity determined upright (see Figure 5). The configuration appeared with the letters normally oriented or mirror reflected about their intrinsic vertical axis and participants responded by indicating their reflection. A letter then appeared in a mirror-reversed or normal orientation on either the left or the right side of the scene established by the prime and participants indicated its orientation. Note that in the 0 degrees condition, the scene-based reference frame is in alignment with viewer and whole-world environment-based frames. In the +90 degrees condition the left and right sides of the scene are up and down, respectively, in viewer-centered and whole-world frames, while in the -90 degrees condition left and right sides of the scene are down and up, respectively, in viewer-centered and whole-world frames. In this paradigm, Robertson found that participants were faster to respond to letters presented on the right side of the scene. This rightward bias rotated with the scene, indicating that the bias occurred in scene-based coordinates as opposed to viewer-centered or gravity defined coordinates. Although this rightward bias may be used to track scene-based processing, it is not necessary that a rightward bias appear whenever scene-based processing is occurring. In fact some recent experiments reported by Davis (1998) demonstrate that when target locations also vary on the up/down axis, one obtains a scene-based upward bias. |
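The two predictions for the prime/target reflection task described above can be made concrete with a small sketch. It is only an illustration of the logic, not the analysis used by Robertson, Palmer, and Gomez (1987); the function names and the `frame_maintained` flag are my own assumptions.

```python
def angular_distance(a_deg, b_deg):
    """Smallest rotation, in degrees, between two orientations."""
    diff = abs(a_deg - b_deg) % 360
    return min(diff, 360 - diff)


def predicted_rotation(prime_deg, target_deg, frame_maintained):
    """Degrees of mental rotation predicted for the target letter.

    frame_maintained is a hypothetical flag: True means a frame of known
    handedness was left at the prime's orientation and is reused.
    """
    if frame_maintained:
        # The target is compared with the frame left at the prime's angle (a).
        return angular_distance(target_deg, prime_deg)
    # The target is rotated all the way back to viewer/gravity upright (0 degrees).
    return angular_distance(target_deg, 0)


# The example from the text: prime at 45 degrees, target at 60 degrees.
print(predicted_rotation(45, 60, frame_maintained=False))  # 60
print(predicted_rotation(45, 60, frame_maintained=True))   # 15
```

Because response time is assumed to grow linearly with the rotation angle, the two hypotheses predict different response times for the same target whenever the prime is not upright.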
|
![]() ![]() |
|
| Rhodes and Robertson (submitted) recently explored whether spatial attention can be deployed in a scene, using a slightly different method to establish the scene. Initially the letter "A" appeared at the center of the screen along with two vertical columns of "A"s and "V"s located to either side (Figure 6). Both the central "A" and the two laterally positioned columns then visibly rotated either 0 degrees, 90 degrees clockwise, or 90 degrees counterclockwise. After rotation an "arrowperson" (a stick-figure with an arrow for arms pointing to his left or right) appeared at the center of the screen and pointed with 75% accuracy to the location of the subsequent target. The arrowperson appeared in the same up/down orientation as the rotated scene and so pointed to the scene-based left or scene-based right. The authors also used two stimulus onset asynchronies (SOAs) between the appearance of the arrowperson and the appearance of the target letter. The results showed a significant side × validity interaction, in which responses on validly cued trials were faster to right side targets than left side targets and responses on invalidly cued trials were faster to left side targets. That the rightward bias rotated with the scene and appeared only on validly cued trials suggests that attention can indeed be deployed in scene-based coordinates and that the rightward bias may in some cases be created by an unequal distribution of attention on the right and left sides of the scene. | |
![]() |
|
| As previously discussed, a reference frame is defined by a set
of coordinate axes having orientations, scales, directions, and an origin. It would be
interesting to know how the origin of the scene-based reference frame is determined. Using
the rightward bias obtained in the letter reflection task as an index of scene-based
processing, this could be determined by defining with respect to what spatial location the
bias is rightward. Some likely possibilities are the locations of retinotopic fixation, of
head or body midline, of an object or feature in the display, and of the locus of
attention. The fact that the rightward bias has an attentional component suggests that the
origin may be defined by the locus of attention. This possibility is supported by other
studies. Some of this support has come from studying a woman with a peculiar visual localization deficit (McCloskey & Palmer, 1996; McCloskey & Rapp, submitted; McCloskey et al., 1995). The woman, A. H., shows normal visual object recognition, but is severely impaired in tasks which require localization of visual stimuli. In one experiment, A. H. closed her eyes and experimenters placed an object in front of her. When she opened her eyes, her task was to make a ballistic reach (i.e., a reach without changing direction mid-movement) toward the object that was placed in front of her. When the target object was directly in front of her, A. H. reached accurately. However, when the object was placed on her right or left side, she reached to the wrong side of the table on two-thirds of the trials. A. H.'s impairment also occurred on the up/down axis. The authors argued that the impairment involves some level of visual perception, because A. H. had no trouble pointing accurately to the source of a sound or to which side of her body had been touched. They also argued that the deficit is not a visual-motor disconnection, because A. H. was also wrong about one-third of the time when a "right" or "left" verbal report was used to indicate on which side of a computer screen an X was presented. Most importantly, the errors exhibited by A. H. were highly systematic, in that they were reflections across a central axis. For example, in the above reaching task, false reaches to stimuli 60 degrees to the right always involved reaching 60 degrees to the left. In thinking about an underlying system that could support this type of deficit, the authors posit a coordinate system with perpendicular axes. The location of a target is represented on each axis by a direction and distance (in arbitrary units) from the origin. For example, imagine a clock centered on an ordinary Cartesian coordinate system. 
The "5" would have a positive (rightward) direction and a distance of about 1 (proportional to cos(π/3)) on the X axis and would have a negative (downward) direction and a distance of about 2 (proportional to sin(π/3)) on the Y axis. A. H.'s errors could reasonably be due to a misrepresentation of the direction of the target object, while the distance from the origin is represented accurately. Thus the systematic nature of A. H.'s errors allows one to determine where the origin of her functional reference frame is assigned. McCloskey and Rapp (submitted) ran a series of experiments to decipher what exactly defines the origin of the reference frame within which A. H.'s errors are reflected. In the first experiment the authors dissociated body and head midline, center of the display screen, and retinotopic fixation from each other. A. H. performed a task whereby an X was presented on a computer screen at a left, center, or right target location (Figure 7A). After stimulus offset, she touched the screen where the X had appeared. In the fixate right condition, A. H. was seated so that her body and head midline were at a left intermediate (LI) position and her fixation point was at a right intermediate (RI) position. Eye movements were monitored to ensure that she didn't break fixation. Figure 7B shows the response pattern that would support retinotopic fixation as defining the origin around which A. H.'s reflection occurs. Figure 7C shows the response pattern that would be expected if body or head midline defined the origin. Figure 7D depicts the response pattern that would be expected if midline of the display screen defined the origin. Errors occurred in the pattern predicted by Figure 7B, but not those predicted by 7C or 7D. This indicated that neither body/head midline, nor center of the display served to establish the origin of A. H.'s reflection errors. 
Before concluding that it was retinotopic fixation that defined the origin of the reference frame(s), another possibility was considered; namely that A. H.'s locus of attention was serving to define the origin. It was likely that in this task, the locus of attention was directed to the point of retinotopic fixation. |
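The coordinate account above (direction misrepresented, distance preserved) can be sketched in a few lines. This is a minimal illustration under my own assumptions: the clock radius of 2 is chosen only so that the "5" lands near the "about 1" and "about 2" values in the text, and the screen coordinates for the fixate right condition are hypothetical, not values reported by McCloskey and Rapp.

```python
import math


def clock_position(hour, radius=2.0):
    """(x, y) of an hour mark on a clock centered on the origin (12 points up)."""
    angle = math.pi / 2 - hour * (2 * math.pi / 12)
    return (radius * math.cos(angle), radius * math.sin(angle))


def reflect(position, origin):
    """Mirror a position on one axis across an origin.

    The direction from the origin flips; the distance from it is kept.
    """
    return 2 * origin - position


x, y = clock_position(5)
print(round(x, 2), round(y, 2))  # 1.0 -1.73

# Hypothetical horizontal layout: left = -4, LI = -2, center = 0, RI = 2, right = 4.
candidate_origins = {
    "retinotopic fixation (RI)": 2.0,
    "body/head midline (LI)": -2.0,
    "display center": 0.0,
}
target_x = 4.0  # an X shown at the right target location
for name, origin in candidate_origins.items():
    print(name, "->", reflect(target_x, origin))
```

With these assumed coordinates, a target at the right location would be mislocalized to the center if fixation defines the origin, to the left location if the display center does, and off the display entirely if body/head midline does, which is why the pattern of errors can discriminate among the candidate origins.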
|
![]() |
|
| In order to disentangle these possibilities, McCloskey and Rapp ran another experiment where retinotopic fixation was held at the center of the display and locus of attention was maintained at an intermediate right or intermediate left location (Figure 8). An X then appeared at one of four locations: far left, near left, near right, or far right, and A. H. responded as before. Eye movements were monitored to ensure retinotopic fixation was held at the center location. During one block of trials A. H. directed spatial attention to the left intermediate location and on the other block she directed her attention to the right intermediate location. To ensure that attention was being focused at the proper location, 0, 1, 2, or 3 dots appeared sequentially inside the proper square before and during the target presentation. After the target disappeared, A. H. responded first by indicating the number of dots and then by touching the location where the target X appeared. Results from this second experiment showed both errors across the locus of attention and across fixation. This indicates A. H.'s errors are taking place in a spatial reference frame where attention can determine the origin. It isn't entirely clear whether retinotopic fixation is playing a role in determining the origin, because it is possible that on the trials where a fixation-related error was made, attention was focused at fixation. This is supported by the fact that all fixation-related errors were made on zero dot trials. | |
![]() |
|
| In another experiment the authors repeated the second
experiment using a vertically oriented rather than a horizontally oriented display and
found similar findings. In a fourth experiment the authors repeated the second experiment,
but had no boxes at the locations where attention was to be maintained. Again nearly all
errors occurred around the locus of attention, indicating that the locus of spatial
attention alone (with no object at the location) can serve to establish the origin of the
frame. Other support for the hypothesis that a frame of reference can be centered on the locus of spatial attention comes from work on the Simon effect (Simon, 1969). The Simon effect is a stimulus-response (S-R) compatibility effect that can occur even when the location of a stimulus is task irrelevant. Specifically, responses are faster if the spatial location of the task relevant stimulus and response side are compatible than if they are incompatible. If, for example, an X or an O is presented on the right or left side of a screen, and the participants' task is to respond to Xs with a left key press and Os with a right key press, participants' responses will be faster to Xs than Os presented on the left side and faster to Os than Xs presented on the right side. Hommel (1995) suggests that the effect is caused by an automatic transfer of the spatial location of the stimulus to a response selection phase, serving as an orienting response (for another account see Stoffer, 1994). A natural question to ask is with respect to what reference frame or frames is the stimulus location encoded and further, what serves to define the origin of the frame(s). Exploring this question, Hommel and Lippa (1995) did a study in which participants responded with different hands depending on the brightness of a circle that appeared on a screen. The circles appeared on a digital image of Marilyn Monroe's face, covering either the right or left eye. The face appeared on the screen at one of five orientations: -90 degrees, -45 degrees, 0, +45 degrees, and +90 degrees. A significant Simon effect occurred with respect to the object-based coordinates of the face. Interestingly, there was a numerical decrease in the compatibility effect as the orientation of the face differed more from upright, though this interaction was not reliable. This shows that the Simon effect can occur in a frame other than the viewer and whole-world frames if the competing frame is made salient. 
Also the effect may be reduced when frames are out of alignment. To ascertain how the origin of the frame in which the Simon effect occurs is determined, Nicoletti and Umiltà (1989) ran a series of experiments in which they dissociated retinotopic fixation, body midline, and display organization from locus of attention. In the first experiment they replicated the Simon effect with targets appearing at six possible locations -- three to the right and three to the left of fixation. A target, either a square or a rectangle, appeared inside one of the vertical rectangles as seen in Figure 8, and participants responded with one hand if the target was a square and the other if the target was a rectangle. Consistent with the Simon effect, compatible RTs were faster than incompatible RTs. In a second experiment, participants performed the same task, but the fixation was held at one of two locations to the right or left of all the boxes, rather than in the middle (Figure 8B). The Simon effect was centered around the center of the display, indicating that it was not retinotopic fixation that was determining the origin. |
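The logic of asking which origin the Simon effect is coded against can be sketched as follows. This is an illustrative sketch only: the positions are hypothetical units rather than the actual display geometry, and `relative_side` and `is_compatible` are names introduced here for the example.

```python
def relative_side(stimulus_x, origin_x):
    """Code a stimulus as left or right of a candidate origin."""
    if stimulus_x < origin_x:
        return "left"
    if stimulus_x > origin_x:
        return "right"
    return "center"


def is_compatible(stimulus_x, origin_x, response_side):
    """True when the stimulus side, relative to the origin, matches the response side."""
    return relative_side(stimulus_x, origin_x) == response_side


# Hypothetical layout: six target boxes, with fixation held to the left of all of them.
TARGET_XS = (-3, -2, -1, 1, 2, 3)
DISPLAY_CENTER_X = 0
FAR_LEFT_FIXATION_X = -4

# For a right-hand response, list compatibility under each candidate origin.
for x in TARGET_XS:
    print(
        x,
        is_compatible(x, DISPLAY_CENTER_X, "right"),
        is_compatible(x, FAR_LEFT_FIXATION_X, "right"),
    )
```

Under a fixation-centered origin every target in this layout is coded as "right", so fixation alone could not produce different compatibility for the left and right boxes; an origin at the display center splits the boxes three and three, which is the pattern the second experiment is described as showing.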
|
![]() ![]() |
|
| In the next three experiments, the location of spatial
attention was dissociated from fixation. The authors had participants fixate on the far
lateral crosses. On each trial, 500 ms before a target appeared, a small solid square
precue appeared in one of the spaces between two adjacent rectangles (Figure 8C). Subjects
were instructed that the target would appear in one of the two rectangles on either side
of the precue, and to focus their attention on this cue when it appeared. The authors made
sure participants were directing their attention to the cued locations by including no-go
trials in which the square had a small strip missing down the center and participants were
not to respond. The Simon effect occurred with respect to the location of the precue, but
not with respect to either the center of the display or fixation, providing strong support
for an attentional hypothesis. Next, Nicoletti and Umiltà strengthened their claim by
using a display that might lead participants to organize the rectangles into two groups on
each side of the center square. Specifically a large gap was inserted between the third
and fourth target location rectangles (Figure 8D). Despite this, participants showed a
Simon effect with respect to the location of the precue. In a final experiment, purely
endogenous cues indicated the location to which attention should be focused. The numbers
1-5 occupied the spaces between the rectangles, and the cue number appeared below the
fixation cross (Figure 8E). In addition, participants' eye movements were monitored in
this experiment to be sure there were no eye movements to the cued location. Again the
Simon effect occurred with respect to the cued location. The Nicoletti and Umiltà study provides strong support for the claim that the locus of attention determines the origin of the reference frame used in the task. Thus, it may be possible under some circumstances to use the Simon effect as an index of where spatial attention is being directed. Hommel (1993) argues that attention alone cannot determine the origin, but that attention must be attached to something in the visual display. This argument cannot be refuted with Nicoletti and Umiltà's data, as attention was always directed to an object in a location. However, the McCloskey and Rapp data show that attention alone can serve to establish the origin of a spatial reference frame. Given these findings, I performed two experiments to test the hypothesis that the locus of attention serves to establish the origin of the scene-based reference frame. I dissociated the locus of attention from retinotopic fixation, body/head midline, and the center of the display, and used the scene-based rightward bias to explore what determined the origin of the reference frame. Specifically, I used a display with a cross at the center, one cross to each side, and four possible letter locations, as seen in Figure 9. In each trial, a yellow square appeared on one of the three crosses quasi-randomly, which indicated that the following letter would appear in the letter location either immediately to the right or immediately to the left of the cross with the yellow square. The yellow square presumably drew attention exogenously by its appearance and endogenously by the fact that focusing on the cue should facilitate performance. |
|
![]() |
|
| To examine whether the scene-based rightward bias was
determined by retinotopic fixation or locus of attention, I excluded the far letter
locations. The reason for this exclusion was two-fold. The first reason is that these
locations were a greater distance from the fovea and thus lower visual acuity would
confound the results. Additionally, with the far locations included, there is an unequal
contribution of cue-based sides in the calculation of fixation-based effects.
Specifically, there are two cue-based left locations on the fixation-based left side, and
two cue-based right locations on the fixation-based right side. This unequal distribution
confounds the fixation-based effects with cue-based effects. Some specific predictions follow. If the origin of the scene-based reference frame is determined by the locus of attention, the scene-based rightward bias should occur with respect to the cued locations; namely, RTs to letters in the cue-based right locations should be faster than RTs to letters in the cue-based left locations. However, assuming attention is being successfully drawn to each cross by the yellow cue, there should not be a rightward bias with respect to the locus of fixation. In this task subjects responded with their right finger to normally oriented letters and with their left finger to reflected letters. Thus letter reflection can be used to reveal the Simon effect. According to Nicoletti and Umiltà's findings, if attention is being drawn to the cue successfully, there should be a Simon effect around the cue and no Simon effect around fixation. This will be manifested in faster responses in compatible conditions (when normal letters appear in cue-based right locations or when reflected letters appear in the cue-based left locations) than in incompatible conditions (when normal letters appear in cue-based left locations or when reflected letters appear in the cue-based right locations). |
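The coding scheme behind these predictions can be made concrete with a short sketch. It is only an illustration, not the analysis code used in the study: the helper `code_trial` and the dictionary layout are hypothetical, while the positions (crosses at -4.8, 0, and +4.8 degrees, letters 2.4 degrees to either side of the cued cross) and the key mapping (right key for normal, left key for reflected) come from the text. It shows why dropping the far locations balances cue-based side against fixation-based side.

```python
from collections import Counter
from itertools import product

# Horizontal positions in degrees of visual angle, as described in the text.
CUE_POSITIONS = {"left": -4.8, "center": 0.0, "right": 4.8}
LETTER_OFFSET = 2.4  # letters appear this far to the left or right of the cued cross


def code_trial(cue, cue_side, reflection):
    """Derive position, fixation-based side, and compatibility (hypothetical helper)."""
    offset = LETTER_OFFSET if cue_side == "right" else -LETTER_OFFSET
    position = round(CUE_POSITIONS[cue] + offset, 1)
    fixation_side = "right" if position > 0 else "left"
    # Right key for normal letters, left key for reflected letters.
    response_side = "right" if reflection == "normal" else "left"
    return {
        "position": position,
        "far": abs(position) > LETTER_OFFSET,  # the -7.2 and +7.2 degree locations
        "fixation_side": fixation_side,
        "cue_compatible": response_side == cue_side,
        "fixation_compatible": response_side == fixation_side,
    }


trials = [
    dict(cue=c, cue_side=s, reflection=r, **code_trial(c, s, r))
    for c, s, r in product(CUE_POSITIONS, ("left", "right"), ("normal", "reflected"))
]
near = [t for t in trials if not t["far"]]

# Count trial types per (fixation-based side, cue-based side) combination.
with_far = Counter((t["fixation_side"], t["cue_side"]) for t in trials)
without_far = Counter((t["fixation_side"], t["cue_side"]) for t in near)
print("far included:", dict(with_far))
print("far excluded:", dict(without_far))
```

With the far locations included, each fixation-based side contains twice as many trial types from one cue-based side as from the other; with them excluded, the four combinations are equally represented.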
|
Experiment One |
|
| Method: Participants: Thirty-one Reed College students were recruited to participate in this experiment in return for a psychology department lottery ticket. Participants were asked to fill out a questionnaire concerning handedness, color vision, previous head trauma, and medication, i.e., factors which might influence visual attention. Of these 31 participants, 9 were rejected based on the presence of too many total errors. The a priori rejection criteria were set to maintain a 90% overall accuracy and to maintain 24 trials (out of 30) in each cell. Of the remaining 22 participants, 6 were left-handed and 16 were right-handed. Only the right-handed data will be reported here. Of these 16, 8 were female and 8 male, and the mean age was 21. Apparatus and Stimuli: Stimuli were generated on an Apple color high-resolution RGB monitor (MO 401) run by a Macintosh IIx operating on system 6.03. MacStimulus, a graphical program developed by a Reed alumnus, was the program used to present stimuli and to record response identity and RTs (to ms accuracy). Stimuli were presented on a dark-gray background of a 640 by 380 pixel screen. The stimuli consisted of: a white cross subtending 1 degree horizontally and vertically centered on the screen, another white cross 4.8 degrees left of center, and another 4.8 degrees right of center, a small red square subtending 0.15 degrees which appeared at the center of the center cross to help participants maintain fixation, a slightly larger yellow square of 0.4 degrees which appeared at the center of one of the three crosses to draw attention, and the white letters E, B, or G, in Geneva bold size 36, which appeared normal or mirror reflected at -7.2 degrees, -2.4 degrees, 2.4 degrees, or 7.2 degrees. Procedure: Testing took place in an Industrial Acoustic Environments Co. Inc., testing chamber with the lights dimmed.
Participants were positioned 60 cm away from the computer screen and were instructed to align their head, body, and seat level, so that their eyes were directly opposite the center of the monitor. Half the participants used the index finger, middle finger, and thumb of their right hand; the other half used the same fingers on their left hand. Each participant ran in one session which included a practice period and 5 blocks of 72 trials each, with 2-minute breaks in between. Participants were allowed to practice until they felt comfortable with the task, with an upper limit of 1 block of 72 trials. Each trial in a block began with the red fixation spot appearing at the center of the center cross. Participants were instructed to fixate on this spot when it appeared and maintain fixation until the end of the trial when it disappeared. Four hundred fifty ms after the appearance of the fixation spot, the yellow square appeared at the center of one of the crosses. Participants were told that the target letter would always appear on the right or left side of the cross with the yellow square. Participants responded to the yellow square (without moving their eyes) by hitting the space bar with their thumb. When participants responded to the yellow square, the square disappeared for 15 ms and then reappeared. Two hundred twenty-five ms after their response on the space bar, one of the three letters appeared for 150 ms, either normal or reflected, in the space immediately to the left or immediately to the right of the cross with the yellow square. Participants were instructed to press the "k" key (right key) if the letter was normal and the "j" key (left key) if the letter was reflected. After their response the red fixation spot disappeared for an inter-trial interval of 1125 ms. No feedback regarding the response was given on-line.
Each block of 72 trials consisted of 2 each of the 36 possible trial types quasi-randomly ordered: 3 possible cue locations (left, center, right) determined by the yellow square, 2 possible target sides (left, right) for the appearance of the letter with respect to the cue, 3 possible target letters (E, G, K), and 2 possible letter reflections. The trial order of each block was viewed prior to running any participants to examine whether the cue location or target letter location was the same on several consecutive trials. Any block that had several consecutive similar trials was replaced by a new block of quasi-randomly ordered trials. Each block of trials took approximately 5 min. Data Analysis: Trials in which yellow square detection was faster than 150 ms or slower than 1500 ms were thrown out (0.9% of the trials were thrown out in this manner). Trials with incorrect responses to the target letter reflection were also thrown out (4.2% of the trials were thrown out in this manner). For the letter reflection responses, a mean and standard deviation RT was calculated for each cell of each participant's data (n ≥ 24). Any RT outside three standard deviations of the corresponding cell mean was considered an outlier and thrown out (1.4% of the trials were thrown out in this manner). The combination of these rejection criteria eliminated 6.4% of the total number of trials. RTs to targets appearing in the far locations were excluded from data analysis (although they are shown in Figure 10). Mean RTs were analyzed in a repeated measures ANOVA with responding hand as the between-participants variable and three within-participants variables: fixation-based side (left, right), cue-based side (right, left), and reflection (normal, reflected). Results and Discussion: RTs organized by display location can be seen in Figure 10.
There was a significant main effect of cue-based side, F(1,15)=9.56, p<.01, because RTs to targets that appeared to the right of the cue (526 ms) were faster than RTs to targets that appeared to the left of the cue (544 ms). Importantly, there was not a significant main effect of fixation-based side, F(1,15)=.181, p>.67. These findings suggest that the origin of the scene-based reference frame is determined by locus of spatial attention. As was predicted by the locus of attention account, the scene-based rightward bias was found with respect to the cue, but not with respect to fixation. |
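The trial-rejection steps described under Data Analysis can be sketched as follows. This is a minimal illustration under stated assumptions, not the original analysis: the function names and the dictionary keys (`cue_rt`, `correct`, `cell`, `rt`) are hypothetical, and the text does not say whether the sample or population standard deviation was used or whether trimming was repeated, so the sketch uses the sample standard deviation and a single pass.

```python
from statistics import mean, stdev


def trim_cell(rts, n_sd=3.0):
    """Drop RTs more than n_sd standard deviations from the cell mean (one pass).

    Assumes at least two RTs in the cell and uses the sample standard deviation.
    """
    m, sd = mean(rts), stdev(rts)
    return [rt for rt in rts if abs(rt - m) <= n_sd * sd]


def clean_trials(trials, min_cue_rt=150, max_cue_rt=1500):
    """Apply the rejection criteria in order and return trimmed RTs per cell.

    `trials` is a list of dicts with hypothetical keys:
    cue_rt (ms), correct (bool), cell (any hashable label), rt (ms).
    """
    # Steps 1 and 2: cue-detection RT within bounds and a correct reflection response.
    kept = [
        t for t in trials
        if min_cue_rt <= t["cue_rt"] <= max_cue_rt and t["correct"]
    ]
    # Step 3: group the remaining RTs by cell and trim outliers within each cell.
    cells = {}
    for t in kept:
        cells.setdefault(t["cell"], []).append(t["rt"])
    return {cell: trim_cell(rts) for cell, rts in cells.items()}
```

Cell means for the ANOVA would then be computed from the trimmed lists, after excluding the far target locations.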
|
![]() |
|
| Furthermore, there was a significant cue-based side x
reflection interaction, F(1,15)=14.5, p<.005. This effect was in accord with a Simon
effect occurring around the cue, with RTs in compatible conditions (normal letter on the
right side of the cue, reflected letter on the left side of the cue) faster (523 ms) than
in incompatible conditions (547 ms). There was not a significant fixation-based side x
reflection interaction, F(1,15)=.0004, p>.98, showing no Simon effect around fixation.
These findings support the assumption that participants moved their attention to the cue,
because Nicoletti and Umiltà (1989) showed the Simon effect to occur around the locus of
spatial attention. There was also a barely significant cue-based side x reflection x hand interaction, F(1,15)=5.16, p<.05, wherein the Simon effect was smaller for the right than for the left hand, and the Simon effect was larger when the cue-based side and hand were incongruent than when they were congruent. The smaller Simon effect for the right hand could be due to the fact that all the participants were right-handed and thus may have been more adept at overcoming compatibility issues with their right hand. The fact that the Simon effect was larger when the cue-based side and hand were incongruent could have been caused by an added incompatibility in response selection. There was also an interaction of fixation-based side x cue-based side that was nearly significant, F(1,15)=4.02, p<.065, showing a greater difference between RTs for targets appearing on the left and right cue-based sides to the left of fixation (31 ms) than to the right of fixation (5 ms). A paired t-test showed that on the fixation-based left side, RTs to targets appearing on the two cue-based sides were significantly different, t(15)=-3.9, p<.005, but on the fixation-based right side, RTs to targets appearing on the two cue-based sides were not significantly different, t(15)=-.572, p>.57. If one thinks about scene-based reference frames as a variant of object-based reference frames, this finding is similar to the object-based bias found by Reuter-Lorenz et al. (1996) superimposed on the attention-centered scene-based rightward bias. Reuter-Lorenz et al. found barely significant facilitation of detection on the right side relative to the left side of objects to the left of fixation, and on the left side relative to the right side of objects to the right of fixation. A similar effect is shown here. The most striking result is that RTs to targets on the right side of the left cue are faster than those to targets on the right side of the center cue.
This is difficult to explain by either a fixation-based or a cue-based rightward bias, but is comparable to the Reuter-Lorenz effect. These data may serve to highlight important ways in which the scene-based reference frame is similar to the object-based reference frame. Although these data support the hypothesis that scene-based processing occurs around the locus of attention, there is a possible confound if participants were overtly moving their eyes to the cue rather than staying fixated at the center. If subjects were moving their eyes to the cue, the rightward bias may have been centered around fixation rather than around the locus of attention. Since there was no hint of a fixation-based rightward bias and there was a strong cue-based rightward bias, nearly all of the participants would have to have moved their eyes to the cue on nearly all of the trials. Although this seems unlikely, a second experiment was run in which participants' eye movements were monitored. If participants weren't moving their eyes in the first experiment, the pattern of RTs in the second experiment should replicate that obtained in the first experiment. |
|
Experiment Two |
|
| Methods: Participants: Seventeen right-handed Reed College students were run in the second experiment. One was rejected based on the presence of too many total errors. The a priori rejection criterion was set to 90% overall accuracy. Of the remaining 16 participants, 9 were female and 7 male, and the mean age was 20. Apparatus and Stimuli: An ISCAN RK-464 infrared eye-tracking system was used in conjunction with a PC running ISCAN software. In addition to the stimuli used previously, three white question marks in Geneva bold size 36 replaced the crosses at the end of each trial. Procedure: Participants sat with their head in a chin rest positioned 60 cm away from the computer screen so that their eyes were directly opposite the center of the monitor. The eye-tracker was positioned in front of the stimulus monitor below the line of sight so vision of the monitor was not obstructed. Each participant ran in one session which included one practice block and 5 test blocks of 72 trials with 2-minute breaks in between. At the end of the 5th block, a preliminary analysis of the data was run to determine if there were fewer than 24/30 trials in any cell, based on rejections for fixation breaks and incorrect responses. If there were fewer than 24 trials in any cell, up to two additional blocks were run in whole and half block segments to maintain 24 trials per cell. Each trial in a block began with the appearance of the three crosses and the red fixation spot at the center of the center cross. Participants were instructed to fixate on this spot when it appeared and maintain fixation until the end of the trial when the crosses were replaced by the question marks. Participants were told that if they broke fixation they would hear a beep created by the experimenter. Six hundred to nine hundred ms (determined randomly) after the appearance of the fixation spot, the yellow square appeared at the center of one of the crosses.
Participants were told that the subsequent letter would always appear on the left or right of the cross with the yellow square. Participants responded to the yellow square (without moving their eyes) by hitting the space bar with their thumb. When participants responded to the yellow square, the square disappeared for 15 ms and then reappeared. One hundred sixty-five ms after their response on the space bar, one of the three letters flashed for 150 ms, either normal or reflected, in the space immediately to the left or immediately to the right of the cross with the yellow square. Participants were instructed to press the "k" key (right key) if the letter was normal and the "j" key (left key) if the letter was reflected. After their response the crosses were replaced by question marks for a 1500 ms period in which participants were instructed to press the space bar if they had heard a beep during the immediately preceding trial. Each block of trials lasted approximately 7 minutes. Eye position was calculated by ISCAN software. The calculated eye position was superimposed on an image of the stimulus computer's display which was reproduced on a 10 inch video monitor. Calibration was performed by taking an eye position measurement while participants fixated at each of five different locations on the stimulus monitor. Determining exact eye position was difficult. The position calculated by the ISCAN software seemed to be modulated to some degree for some participants by the size of the pupil. Most of this variability occurred on an axis 45 degrees clockwise of vertical. To accommodate this variability, the experimenter monitored the calculated eye position on-line and watched for both gross position offsets (>3.5 degrees), and large (>1.5 degrees) and quick relative eye position changes with emphasis on changes that occurred temporally locked to the appearance of the stimuli and changes toward the location of the cue.
If a "break fixation" occurred, the experimenter generated a beep in the participant's room, and the participant entered the break fixation by pressing the space bar when the question marks came on the screen. Data Analysis: Trials were eliminated from analysis for the following reasons: participants broke fixation (2.2% of the trials were thrown out in this manner), response to the yellow square was faster than 150 ms or slower than 1500 ms (0.6% of the trials were thrown out in this manner), response to the letter reflection was incorrect (3.4% of the trials were thrown out in this manner), or RTs to the letter reflection task were greater than 3 standard deviations from the cell mean (1.6% of the trials were thrown out in this manner). Based on these criteria, 7.6% of the total number of trials were rejected. Statistical analysis was identical to the first experiment. Results: A schematic diagram of the RTs can be seen in Figure 11. There was a highly significant main effect of cue-based side, F(1,15)=71.2, p<.001, in which RTs to letters on the right side of the cue (492 ms) were faster than those to letters on the left side (515 ms). In contrast to the first experiment, there was also a significant main effect of fixation-based side, F(1,15)=5.96, p<.05, with responses to letters on the right side of fixation (497 ms) faster than responses to letters on the left side of fixation (510 ms). Thus, there was a highly significant rightward bias with respect to the cue, along with a weaker rightward bias with respect to fixation. |
|
![]() |
|
| There was a significant cue-based side x reflection
interaction, F(1,15)=31.8, p<.001. This effect was in accord with a Simon effect
occurring about the cue, showing responses on compatible trials (normal letter on the
right side of the cue, reflected letter on the left side of the cue) to be faster (489 ms) than
those on incompatible trials (518 ms). There was not a significant fixation-based side x
reflection interaction, F(1,15)=.779, p>.39, showing no Simon effect around fixation.
The fact that the Simon effect occurred around the cue and not around fixation suggests
that attention was drawn to the cue. There was a significant main effect of reflection, F(1,15)=14, p<.005, with responses to normally oriented letters (488 ms) faster than responses to reflected letters (520 ms). This is not a surprising or uncommon effect (Robertson, 1995). Finally, there was a significant fixation-based side x cue-based side interaction, F(1,15)=7.17, p<0.05. The effect showed a greater difference between RTs for targets appearing on the left and right cue-based side to the left of fixation (28 ms) than to the right of fixation (18 ms). This is the same effect that was equated to the Reuter-Lorenz object-based effect in experiment one. Since in this second experiment there is such a large cue-based rightward bias, the difference between RTs to targets appearing on the right and left cue-based sides was significant on both sides of fixation (both ps<.001). |
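As a reading aid, the effect sizes implied by the reported cell means of the two experiments can be tabulated with a few lines of arithmetic. This is only a sketch: the dictionary layout and labels are my own, the values are the rounded means quoted in the Results sections, and the differences therefore carry rounding error.

```python
# Mean RTs in ms as (slower condition, faster condition), copied from the reported results.
reported = {
    "Experiment 1": {
        "cue-based rightward bias": (544, 526),       # left of cue, right of cue
        "cue-based Simon effect": (547, 523),         # incompatible, compatible
    },
    "Experiment 2": {
        "cue-based rightward bias": (515, 492),       # left of cue, right of cue
        "fixation-based rightward bias": (510, 497),  # left of fixation, right of fixation
        "cue-based Simon effect": (518, 489),         # incompatible, compatible
        "reflection effect": (520, 488),              # reflected, normal
    },
}

# Effect size = slower mean minus faster mean, per experiment and effect.
effects = {
    experiment: {name: slower - faster for name, (slower, faster) in pairs.items()}
    for experiment, pairs in reported.items()
}

for experiment, sizes in effects.items():
    for name, ms in sizes.items():
        print(f"{experiment}: {name} = {ms} ms")
```

On these rounded means, the cue-based rightward bias is 18 ms in the first experiment and 23 ms in the second, and the cue-based Simon effect is 24 ms and 29 ms respectively.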
|
General Discussion |
|
| Previous research has shown a rightward bias in the letter
reflection task that occurs in scene-based coordinates. The present experiments used a
letter reflection task and found a rightward bias that was oriented around a cue that was
dissociated from fixation. Nicoletti and Umiltà (1989) convincingly demonstrated that
the Simon effect can be oriented around the locus of spatial attention. Confirming that
the locus of attention was directed to the cued location, both of the present experiments
showed a Simon effect oriented around the cue and not around fixation. Taken together
these provide evidence that the locus of spatial attention is at least one possible
determinant of the origin of the scene-based reference frame. The first experiment found a rightward bias centered only around the cue, and not around the fixation spot. In the second experiment there was a very strong rightward bias with respect to the cue, and also a weaker rightward bias with respect to fixation. The major difference between the two experiments was that in the second experiment feedback about eye movements was given and trials containing eye movements were eliminated. Since the experiments were procedurally similar in most other respects, it seems likely that the difference in results from the two experiments was due to some aspect of the eye monitoring. It is possible that there was a fixation-based rightward bias in the first experiment, but because this effect was weaker than the cue-based effect, it was not seen because sometimes fixation was directed at the cue. However, one would then expect that the cue-based rightward bias would be enhanced in the first experiment, because both spatial attention and retinotopic fixation were directed at the cue. In contrast, the cue-based rightward bias was considerably stronger in the second experiment. It is also possible that the on-line monitoring of eye movements led participants in the second experiment to devote more resources to the task than participants in the first experiment because they knew that they had to stay fixated and that their performance was being monitored to ensure that they did so. This account is supported by a number of measures. The mean RT for the second experiment (503 ms) was faster than the mean RT for the first experiment (535 ms). The error rate in the letter reflection task was also lower in the second experiment (3.4%) than in the first experiment (greater than 4.2%) and many more participants were eliminated from the first experiment (9 participants) on the basis of too many incorrect responses than from the second experiment (1 rejection). 
Additionally, the second experiment showed a slightly larger cue-based Simon effect, as well as a stronger Reuter-Lorenz object-based effect. There were also significant main effects of fixation-based side and of letter reflection where none were seen in the first experiment. In a repeated measures ANOVA of the RTs from both experiments with experiment as a between-subjects variable, there was no main effect of experiment, F(1,31)=1.38, p>.25, and no variables interacted with experiment (although the interaction of reflection x experiment approached significance, F(1,31)=3.56, p<.07). Participants may have used the additional resources to employ more than one reference frame at the same time: one around the fixation spot in order to stay fixated and another around the cue to perform the letter reflection task. Numerous researchers, including Behrmann and Tipper (1999), have shown that it is possible to employ more than one reference frame at a time. The presence of two rightward biases, one around the fixation spot and one around the cue, suggests that there were indeed two frames being employed. One might guess that there are two scene-based reference frames being employed since there are two rightward biases. If this is the case there are two possibilities: the origin of the scene-based reference frame can be determined by either retinotopic fixation or locus of attention, or attention can be divided between the fixation spot and the cue to establish a scene-based frame around each. The latter seems implausible because, although a rightward bias appears in both frames, a Simon effect does not. If both frames are scene-based and the locus of attention determines the origin of each, it isn't clear what differs between the two frames to lead to a Simon effect in one and not in the other. Thus, if both of the frames employed in the second experiment are scene-based, it's likely that one is determined by the locus of spatial attention and the other by fixation.
Rhodes & Robertson (submitted) and some unpublished work by Davis (1999) show biases in scene-based coordinates which don't interact with cue validity and therefore seem not to be attention-related. Another possibility is that a scene-based frame of reference is established around the cue in performing the letter reflection task and another (possibly viewer-centered) reference frame is established around the fixation spot to help maintain fixation. This is certainly a viable possibility given that neglect has been shown to have rightward biases that occur in a number of different reference frames (Behrmann & Tipper, 1999). The current study strongly indicates that the origin of the scene-based reference frame can be determined by the locus of spatial attention. This was shown by a rightward bias, which has previously been found to accompany scene-based processing (Robertson, 1995), that occurred around the locus of attention. It is not clear whether the rightward bias with respect to the fixation spot is due to a scene-based reference frame centered around fixation or to a rightward bias occurring in another reference frame. This issue may be decided by experimental designs that rotate the display and use attention cueing. The results of this study are additionally interesting because of the persistent effect mimicking the Reuter-Lorenz object-based bias. This result may serve to highlight the ways in which the scene-based reference frame is like an object-based frame. It is possible that similar machinery is used to employ both in a hierarchical fashion. |
|
|
References |
|
|
|
![]() |
|