Theory & Methods


From this page you can access more thorough theoretical and methodological explanations of each of the presented experiments, as well as useful links to other labs and descriptions of related topics.


McGurk Effect

Theory

The McGurk effect (McGurk & MacDonald, 1976) is elicited by presenting incongruent audio-visual speech tokens. Though asked to report what is heard, observers will often report either the visual speech or some emergent combination of the audio and visual segments. For example, participants will tend to report hearing the syllable /va/ when presented with a visual /va/ and an audio /ba/; participants will report hearing the syllable /la/ or /da/ when presented with a visual /da/ and an audio /ba/. As you may be able to glean from the demonstration, the strongest incongruencies are between audio /ba/ - visual /va/ and audio /ba/ - visual /da/. There are a few theoretical explanations of why we perceive McGurk syllables in this way.

Modularity theory (Fodor, 1985; Liberman & Mattingly, 1985) describes speech and all speech-related activity as occurring within an anatomically and functionally distinct unit whose sole purpose is to decipher and generate speech. This module has several characteristics. First, it is cognitively impenetrable: any brain activity - thoughts, perceptions, sensations, unconscious activity - that is not related to speech is unable to affect speech-related activity. This cognitive impenetrability is what causes one to hear the syllable /va/ despite the knowledge that the syllable /ba/ is being spoken. Second, a module should be ecologically relevant; in other words, modular processes must be highly important to the survival of our species, and the ability to use speech has clearly been an important part of human survival. Third, a module should be anatomically distinct, and in fact there are areas of the brain that seem to be devoted entirely to speech (e.g., Broca's area and Wernicke's area - see Friederici, 1993; Ross, 1984). Fourth, processes within the module should be automatic. Automaticity is evidenced in the McGurk effect by the singular percept that is experienced despite audio-visual discrepancies. A perceiver could, in principle, experience the audio and the visual separately, be unsure what sound is being produced, or simply ignore the divergent visual information and report what is heard; yet none of these possibilities happens. Perhaps our speech module automatically takes in everything that comes its way and outputs the combined percept we experience. Finally, a module will use all relevant information: it cannot ignore visual speech if it is present. The perceiver does not even have to know he or she is looking at speech (e.g., Rosenblum & Saldana, 1996); if speech information is present, it will be integrated into the speech percept.

Motor Theory (Liberman & Mattingly, 1985) is subsumed within modularity theory. According to motor theory, speech is disambiguated by comparing it to the production process. That is, upon sensing speech the perceiver determines how she might make the sound herself. Using her own vocal tract as a guide for production, a perceiver can comprehend the incoming speech. This theory relates closely to the McGurk effect, in which a perceiver has visual speech-production information and auditory speech sounds. The perceiver determines how a given sound is produced and integrates that information with the way it appears to be vocalized. The final percept represents the sound one would hear if attempting to physically copy the vocalized sound. As this theory is an extension of modularity theory, all the same axioms apply (i.e., cognitive impenetrability, anatomical distinction, ecological relevance, use of all relevant information, automaticity).

The Fuzzy Logical Model of Perception (FLMP; Massaro, 1985; 1998) suggests that perceptual input is disambiguated in three steps: the features are evaluated, then integrated together, and finally compared to stored representations. One way to approach this model is to consider each step as a question: Features (1) what are the parts? Integration (2) what goes together? Representations (3) what are these parts most similar to (in memory)? FLMP assigns every incoming feature a continuous degree of support for each candidate percept and then computes the closest match among its stored representations. The McGurk syllables are combined in step 2, so they can be processed as a unit in step 3.
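To make the integration and decision steps concrete, here is a minimal sketch of FLMP's multiplicative integration and relative-goodness decision rule for an audio /ba/ paired with a visual /da/. All of the support values below are made-up illustrative numbers, not parameters from Massaro's work; in the model itself they are estimated from behavioral data.

    # Minimal FLMP-style sketch: audio /ba/ dubbed onto a visual /da/.
    # All support values are hypothetical illustrations.

    # Step 1 (features): fuzzy truth values (0-1) describing how well
    # each modality supports each candidate syllable.
    audio_support = {"ba": 0.80, "da": 0.30}   # the sound resembles /ba/
    visual_support = {"ba": 0.05, "da": 0.90}  # the lips resemble /da/

    # Step 2 (integration): multiply audio and visual support for each
    # candidate, so both modalities jointly constrain the percept.
    combined = {syl: audio_support[syl] * visual_support[syl]
                for syl in audio_support}

    # Step 3 (representations): normalize the combined support into
    # response probabilities over the candidate set.
    total = sum(combined.values())
    response_prob = {syl: score / total for syl, score in combined.items()}

    print(response_prob)  # {'ba': ~0.13, 'da': ~0.87}

On these invented numbers the visually driven /da/ dominates the response probabilities, mirroring the report pattern described above.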

In the Direct-Realist Approach (Fowler, 1986; Fowler & Rosenblum, 1991), all the media affected by speech (e.g., light, sound) are structured in a way that is specific to the speech event. This specificity between the speech and the media allows a perceiver to detect an event without any additional processing - in contrast to motor theory or FLMP. The theory suggests that our perceptual system has evolved to detect these invariant relations between an event and the resultant media. As such, a perceiver detects an invariant of a particular syllable directly, however it is instantiated by the speaker. In the McGurk effect, the invariant of a particular syllable is determined by the relationship of the audio and visual information, and how they jointly structure the media to produce an invariant.

Methods

These McGurk stimuli were made by videotaping actors speaking a bevy of syllables (e.g., /ba/, /va/, /da/, /tha/, /ma/, /ka/). Simultaneous video and audio recordings were made for each syllable. These recordings were sampled onto our computer via a VCR input and edited in Adobe Premiere, where the syllables were mixed and matched into congruent and incongruent tokens. The difficulty in this process is matching the onset and offset times between an actor's lips and a voice. Another concern is making the visual and audio tokens easily discernible. For example, a speaker may have clear lip movements but a garbled voice; in such a case we may take the voice of one speaker and match it to the face of a different speaker. In effect, this causes some of the congruent tokens (e.g., audio /ba/ - visual /ba/) to have incongruent actors. The differences between speakers, and the variation between utterances of the same syllable by the same speaker, must all be accounted for before presentation. The final presentation tape combines the clearest-speaking actors, with the clearest voices, during their clearest performances. By doing this we maximize the McGurk effect across perceivers. However, the McGurk effect is strong enough to work even under poor conditions (we have effectively created a McGurk effect with live people, one standing in front of the other producing the appropriate movement or sound).
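For readers who want to assemble similar tokens digitally, here is a hypothetical sketch of the dubbing and alignment step using the moviepy Python library. The file names, timestamps, and library choice are all assumptions for illustration; the original stimuli were cut by hand in Adobe Premiere.

    # Hypothetical sketch: dub an audio /ba/ onto a visual /va/, assuming
    # moviepy 1.x and consonant-release times measured by hand beforehand.
    from moviepy.editor import VideoFileClip, AudioFileClip

    visual = VideoFileClip("visual_va.mov")  # placeholder file names
    audio = AudioFileClip("audio_ba.wav")

    # Release times (seconds) located by stepping through video frames
    # and the audio waveform; illustrative values only, with the audio
    # release assumed to occur later than the visual one.
    visual_release = 0.48
    audio_release = 0.52

    # Trim the front of the audio so its release lines up with the
    # visible lip release, then lay it under the video track.
    trimmed = audio.subclip(audio_release - visual_release)
    dubbed = visual.set_audio(trimmed)
    dubbed.write_videofile("mcgurk_audio_ba_visual_va.mp4")

The hard part, as the paragraph above notes, is choosing the alignment points so that the dubbed voice plausibly belongs to the visible articulation.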


Face Kinematics

Theory

Under Construction

Methods

Under Construction


Inverted Speech

Theory

Under Construction

Methods

Under Construction


Links

Under Construction
