The McGurk effect is a perceptual phenomenon that demonstrates an interaction between hearing and vision in speech perception.
The illusion occurs when the auditory component of one sound is paired
with the visual component of another sound, leading to the perception of
a third sound. The visual information a person gets from watching someone speak changes the way they hear the sound.
If a person receives poor-quality auditory information but good-quality visual information, they may be more likely to experience the McGurk effect.
Integration abilities for audio and visual information may also
influence whether a person will experience the effect. People who are
better at sensory integration have been shown to be more susceptible to the effect. People are affected by the McGurk effect to different degrees depending on a range of factors, including brain damage and other disorders.
Background
The effect was first described by Harry McGurk and John MacDonald in a paper titled "Hearing Lips and Seeing Voices", published in Nature on 23 December 1976.
The effect was discovered by accident while McGurk and his research assistant, MacDonald, were studying how infants perceive language at different developmental stages. They asked a technician to dub a video with a phoneme different from the one spoken. When the video was played back, both researchers heard a third phoneme rather than the one spoken or mouthed in the video.
This effect may be experienced when a video of one phoneme's
production is dubbed with a sound-recording of a different phoneme
being spoken. Often, the perceived phoneme is a third, intermediate
phoneme. As an example, the syllables /ba-ba/ are spoken over the lip
movements of /ga-ga/, and the perception is of /da-da/. McGurk and
MacDonald originally believed that this resulted from the common phonetic and visual properties of /b/ and /g/.
Two types of illusion in response to incongruent audiovisual stimuli have been observed: fusions ('ba' auditory and 'ga' visual produce 'da') and combinations ('ga' auditory and 'ba' visual produce 'bga'). These responses reflect the brain's effort to provide conscious perception with its best guess about the incoming information. The information coming from the eyes and ears is contradictory, and in this case the visual information has the greater effect, producing the fusion and combination responses.
Vision is the primary sense for humans, but speech perception is multimodal, which means that it involves information from more than one sensory modality, in particular, audition and vision.
The McGurk effect arises during phonetic processing because the
integration of audio and visual information happens early in speech
perception.
The McGurk effect is very robust; that is, knowledge about it seems to
have little effect on one's perception of it. This is different from
certain optical illusions,
which break down once one 'sees through' them. Some people, including those who have been researching the phenomenon for more than twenty years, experience the effect even when they are aware that it is taking place. Apart from people who can identify most of what is being said from speech-reading alone, most people are quite limited in their ability to identify speech from visual-only signals.
A more extensive phenomenon is the ability of visual speech to increase
the intelligibility of heard speech in a noisy environment.
Visible speech can also alter the perception of perfectly audible
speech sounds when the visual speech stimuli are mismatched with the
auditory speech. Speech perception is normally thought of as an auditory process, but the use of visual information is immediate, automatic, and to a large degree unconscious; despite what is widely accepted as true, speech is therefore not only something we hear. Speech is perceived by all of the senses working together (seeing, touching, and listening to a face move), and the brain is often unaware of the separate sensory contributions of what it perceives.
Therefore, when it comes to recognizing speech the brain cannot
differentiate whether it is seeing or hearing the incoming information.
The effect has also been examined in relation to witness testimony. Wareham and Wright's 2005 study showed that inconsistent visual information can change the perception of spoken utterances, suggesting that the McGurk effect may influence everyday perception in many ways.
The effect is not limited to syllables: it can occur in whole words and can affect daily interactions without people being aware of it. Research in this area can address not only theoretical questions but also has therapeutic and diagnostic relevance for people with disorders involving the audiovisual integration of speech cues.
Internal factors
Damage
Both hemispheres of the brain make a contribution to the McGurk effect.
They work together to integrate speech information that is received
through the auditory and visual senses. A McGurk response is more likely
to occur in right-handed individuals for whom the face has privileged
access to the right hemisphere and words to the left hemisphere. In people who have undergone a callosotomy, the McGurk effect is still present but significantly slower. In people with lesions to the left hemisphere of the brain, visual features often play a critical role in speech and language therapy; such people show a greater McGurk effect than normal controls, and visual information strongly influences their speech perception.
Susceptibility to the McGurk illusion is absent when left-hemisphere damage has produced a deficit in visual segmental speech perception.
People with right-hemisphere damage show impairment on both visual-only and audiovisual integration tasks, although they can still integrate the information well enough to produce a McGurk effect. Integration appears only when visual stimuli are used to improve performance on an auditory signal that is impoverished but audible. A McGurk effect is therefore exhibited in people with damage to the right hemisphere of the brain, but the effect is not as strong as in a normal group.
Disorders
Dyslexia
Dyslexic individuals exhibit a smaller McGurk effect than normal readers of the same chronological age, but the same effect as younger readers matched for reading level. Dyslexics differ particularly in combination responses, not fusion responses. The smaller McGurk effect may be due to the difficulties dyslexics have in perceiving and producing consonant clusters.
Specific language impairment
Children with specific language impairment show a significantly weaker McGurk effect than the average child. They use less visual information in speech perception, or attend less to articulatory gestures, but have no trouble perceiving auditory-only cues.
Autism spectrum disorders
Children with autism spectrum disorders (ASD) show a significantly weaker McGurk effect than children without.
However, if the stimulus was nonhuman (for example bouncing a tennis
ball to the sound of a bouncing beach ball) then they scored similarly
to children without ASD.
Younger children with ASD show a greatly reduced McGurk effect; however, this difference diminishes with age, and as individuals grow up the effect they show approaches that of people without ASD.
It has been suggested that the weakened McGurk effect seen in people
with ASD is due to deficits in identifying both the auditory and visual
components of speech rather than in the integration of said components
(although distinguishing speech components as speech components may be
isomorphic to integrating them).
Language-learning disabilities
Adults with language-learning disabilities exhibit a much smaller McGurk effect than other adults and are less influenced by visual input. People with poor language skills thus produce a smaller McGurk effect. One proposed reason for the smaller effect in this population is uncoupled activity between anterior and posterior regions of the brain, or between the left and right hemispheres. A cerebellar or basal ganglia etiology is also possible.
Alzheimer’s disease
In patients with Alzheimer's disease (AD), the McGurk effect is smaller than in those without. A reduced size of the corpus callosum often produces a process of hemispheric disconnection. Visual stimuli have less influence in patients with AD, which accounts for the weaker McGurk effect.
Schizophrenia
The McGurk effect is not as pronounced in schizophrenic individuals as in non-schizophrenic individuals, although the difference is not significant in adults. Schizophrenia slows the development of audiovisual integration and prevents it from reaching its developmental peak, but no degradation is observed. Schizophrenics are more likely to rely on auditory than visual cues in speech perception.
Aphasia
People with aphasia show impaired perception of speech in all conditions (visual-only, auditory-only, and audiovisual) and therefore exhibit a small McGurk effect. Aphasics have the greatest difficulty in the visual-only condition, showing that they rely more on auditory stimuli in speech perception.
External factors
Cross-dubbing
Discrepancy in vowel category significantly reduces the magnitude of the McGurk effect for fusion responses. Auditory /a/ tokens dubbed onto visual /i/ articulations are more compatible than the reverse. This may be because /a/ has a wide range of articulatory configurations whereas /i/ is more limited, making it easier for subjects to detect discrepancies in the stimuli. /i/ vowel contexts produce the strongest effect, /a/ a moderate effect, and /u/ almost no effect.
Mouth visibility
The McGurk effect is stronger when the right side of the speaker's mouth (on the viewer's left) is visible. People tend to get more visual information from the right side of a speaker's mouth than from the left side or even the whole mouth. This relates to the hemispheric attention factors discussed above.
Visual distractors
The McGurk effect is weaker when there is a visual distractor present that the listener is attending to. Visual attention modulates audiovisual speech perception.
Another form of distraction is movement of the speaker. A stronger
McGurk effect is elicited if the speaker's face/head is motionless,
rather than moving.
Syllable structure
A strong McGurk effect can be seen for click-vowel syllables compared to weak effects for isolated clicks. This shows that the McGurk effect can happen in a non-speech environment. Phonological significance is not a necessary condition for a McGurk effect to occur; however, it does increase the strength of the effect.
Gender
Females
show a stronger McGurk effect than males. Women show significantly greater visual influence on auditory speech than men for brief visual stimuli, but no difference is apparent for full stimuli.
Another aspect of gender is the use of male versus female faces and voices as stimuli. There is no difference in the strength of the McGurk effect between the two, and if a male face is dubbed with a female voice, or vice versa, the strength of the effect is still unchanged.
Knowing that the voice heard is different from the face seen, even when they are of different genders, does not eliminate the McGurk effect.
Familiarity
Subjects
who are familiar with the faces of the speakers are less susceptible to
the McGurk effect than those who are unfamiliar with the faces of the
speakers. Voice familiarity, on the other hand, makes no difference.
Expectation
Semantic congruency has a significant impact on the McGurk illusion. The effect is experienced more often and rated as clearer in the semantically congruent condition than in the incongruent condition. When a person expects a certain visual or auditory stimulus based on the semantic information leading up to it, the McGurk effect is greatly increased.
Self influence
The McGurk effect can be observed when the listener is also the speaker or articulator. When a person looks at themselves in a mirror and articulates the visual stimuli while listening to a different auditory stimulus, a strong McGurk effect can be observed.
In the other condition, where the listener speaks auditory stimuli
softly while watching another person articulate the conflicting visual
gestures, a McGurk effect can still be seen, although it is weaker.
Temporal synchrony
Temporal synchrony is not necessary for the McGurk effect to be present. Subjects remain strongly susceptible to the effect even when the auditory stimulus lags the visual stimulus by 180 milliseconds, the point at which the effect begins to weaken. There is less tolerance for asynchrony when the auditory stimulus precedes the visual one: to produce a significant weakening of the effect, the auditory stimulus has to precede the visual stimulus by 60 milliseconds, or lag it by 240 milliseconds.
Physical task diversion
The McGurk effect was greatly reduced when attention was diverted to a tactile task (touching something).
Touch is a sensory perception like vision and audition, therefore
increasing attention to touch decreases the attention to auditory and
visual senses.
Gaze
The eyes do not need to fixate in order to integrate audio and visual information in speech perception. There is no difference in the McGurk effect wherever the listener focuses on the speaker's face, but the effect does not appear if the listener focuses beyond the speaker's face. For the McGurk effect to become insignificant, the listener's gaze must deviate from the speaker's mouth by at least 60 degrees.
Other languages
Listeners of all languages rely to some extent on visual information in speech perception, but the strength of the McGurk effect varies between languages. Dutch, English, Spanish, German, Italian, and Turkish listeners experience a robust McGurk effect, while it is weaker for Japanese and Chinese listeners.
Most research on the McGurk effect between languages has been conducted
between English and Japanese. There is a smaller McGurk effect in
Japanese listeners than in English listeners. The cultural practice of face avoidance among Japanese people may influence the McGurk effect, as may the tonal and syllabic structures of the language.
This could also be why Chinese listeners are less susceptible to visual
cues, and similar to Japanese, produce a smaller effect than English
listeners.
Studies have also shown that Japanese listeners do not show a
developmental increase in visual influence after the age of six, as
English children do.
Japanese listeners are better able than English listeners to identify an incompatibility between the visual and auditory stimuli. This may be related to the absence of consonant clusters in Japanese.
In noisy environments where speech is unintelligible, however, people
of all languages resort to using visual stimuli and are then equally
subject to the McGurk effect. The McGurk effect works with speech perceivers of every language for which it has been tested.
Hearing impairment
Experiments have been conducted with hard-of-hearing individuals and with individuals who have cochlear implants. These individuals tend to weight visual information from speech more heavily than auditory information. Compared with normal-hearing individuals, however, their responses do not differ unless the stimulus contains more than one syllable, such as a word.
In the McGurk experiment, cochlear-implant users produce the same responses as normal-hearing individuals when an auditory bilabial stimulus is dubbed onto a visual velar stimulus. However, when an auditory dental stimulus is dubbed onto a visual bilabial stimulus, the responses are quite different. The McGurk effect is still present in individuals with impaired hearing or cochlear implants, although it differs in some respects.
Infants
By measuring an infant's attention to certain audiovisual stimuli, a response that is consistent with the McGurk effect can be recorded.
From just minutes to a couple of days old, infants can imitate adult
facial movements, and within weeks of birth, infants can recognize lip
movements and speech sounds. At this point, the integration of audio and visual information can happen, but not at a proficient level. The first evidence of the McGurk effect can be seen at four months of age; however, more evidence is found for 5-month-olds. Through the process of habituating
an infant to a certain stimulus and then changing the stimulus (or part
of it, such as ba-voiced/va-visual to da-voiced/va-visual), a response
that simulates the McGurk effect becomes apparent.
The strength of the McGurk effect displays a developmental pattern that
increases throughout childhood and extends into adulthood.