The Implicit Social Code of Speech Prosody | Part 1

Research credits: Emmanuel Ponsot, Juan José Burred, Pascal Belin, Jean-Julien Aucouturier | Duke University. Michel Belyk and Steve Brown | Department of Psychology, Neuroscience & Behavior, McMaster University, Hamilton, ON, Canada

The term “prosody” comes from the Greek word prosodia, meaning “sung to music.” Speech prosody, therefore, refers to the song-like modulations that accompany speech. There are two types of speech prosody: affective prosody, which refers to pitch modulations that reflect the speaker’s psychological or emotional state (Fairbanks and Pronovost 1938), and linguistic prosody, which conveys the syntactic structure of the utterance. Both rely on a set of acoustic cues related to pitch, loudness, tempo, and voice quality.

Human listeners are very adept at forming high-level social representations of each other based on even the briefest utterances. Vocal pitch is widely recognized as the auditory dimension that conveys the most information about a speaker’s traits, emotional state, and attitude. While past research has primarily looked at the influence of mean pitch, little is known about how finely tuned pitch trajectories around the mean influence human social judgments. This post sheds some light on the subject.

The researchers cited above developed a voice processing algorithm able to manipulate the temporal pitch dynamics of arbitrary recorded voices in a way that is both fully parametric and realistic. They used this algorithm to generate thousands of novel, natural-sounding variants of the same word utterance, each with a randomly manipulated speech contour.
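To make the idea of a “randomly manipulated speech contour” concrete, here is a minimal sketch of how such contours could be generated. It is not the researchers’ algorithm: the number of breakpoints, the 70-cent standard deviation, the linear interpolation, and every function name below are assumptions chosen for illustration, and the sketch only produces the pitch-shift curve, not the audio transformation itself.

```python
import random

def random_pitch_contour(n_points=7, sd_cents=70.0, rng=None):
    """Draw one random pitch-shift value (in cents) per breakpoint.

    n_points and sd_cents are illustrative assumptions.
    """
    rng = rng or random.Random()
    return [rng.gauss(0.0, sd_cents) for _ in range(n_points)]

def interpolate(breakpoints, duration_s, t):
    """Linearly interpolate the pitch shift (cents) at time t (seconds)."""
    n = len(breakpoints)
    if n == 1:
        return breakpoints[0]
    # Map t onto the breakpoint axis, clamping to the utterance duration.
    pos = min(max(t / duration_s, 0.0), 1.0) * (n - 1)
    i = min(int(pos), n - 2)
    frac = pos - i
    return breakpoints[i] * (1 - frac) + breakpoints[i + 1] * frac

def cents_to_ratio(cents):
    """Convert a shift in cents to a frequency ratio (1200 cents = one octave)."""
    return 2.0 ** (cents / 1200.0)

if __name__ == "__main__":
    contour = random_pitch_contour(rng=random.Random(0))
    # Pitch multiplier halfway through a hypothetical 0.5 s utterance.
    print(cents_to_ratio(interpolate(contour, 0.5, 0.25)))
```

Each call to random_pitch_contour yields a different variant of the same utterance, which is what lets thousands of stimuli be generated from a single recording.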

For each of the manipulated stimuli (initially, the word “Hello”), human listeners were asked to evaluate the speaker’s social state. Then, using the psychophysical technique of reverse correlation, the researchers determined the mental representation of the speech prosody type that drives such judgments.
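The logic of reverse correlation can be illustrated with a small simulation. This is a sketch under stated assumptions, not the study’s procedure: it assumes a two-alternative task in which a simulated listener picks, from each pair of random contours, the one that better matches a hidden internal template, and it estimates that template as the mean of the chosen contours minus the mean of the rejected ones. The template values, trial count, and function names are hypothetical.

```python
import random

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def reverse_correlation(template, n_trials=2000, sd=70.0, seed=0):
    """Estimate a listener's internal template from simulated pairwise choices."""
    rng = random.Random(seed)
    n_points = len(template)
    chosen_sum = [0.0] * n_points
    rejected_sum = [0.0] * n_points
    for _ in range(n_trials):
        a = [rng.gauss(0.0, sd) for _ in range(n_points)]
        b = [rng.gauss(0.0, sd) for _ in range(n_points)]
        # Simulated listener: choose the contour closer to the hidden template.
        if dot(a, template) >= dot(b, template):
            chosen, rejected = a, b
        else:
            chosen, rejected = b, a
        for i in range(n_points):
            chosen_sum[i] += chosen[i]
            rejected_sum[i] += rejected[i]
    # Kernel: average chosen contour minus average rejected contour.
    return [(c - r) / n_trials for c, r in zip(chosen_sum, rejected_sum)]

if __name__ == "__main__":
    falling = [1.0, 0.5, 0.0, -0.5, -1.0, -1.0, -1.0]  # hypothetical template
    print([round(k, 1) for k in reverse_correlation(falling)])
```

Because the random contours are uncorrelated with one another, any systematic shape in the recovered kernel must come from the listener’s choices, which is why the technique can expose a mental prototype without asking listeners to describe it.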

Results: Derivation of Dominance and Trustworthiness Prototypes

The researchers first analyzed how participants’ judgments varied with the mean pitch of the manipulated utterances. Perceived dominance was negatively related to mean pitch, and trustworthiness was positively related to mean pitch, though to a lesser extent.

In addition to mean pitch, dynamic pitch contours were analyzed using reverse correlation. This analysis showed that dominance judgments were driven by pitch prototypes with a gradual pitch decrease on both syllables, while trustworthiness judgments were driven by an upward inflection of the pitch on the second (closing) syllable only.


The research found that male and female listeners judged dominance and trustworthiness in a similar fashion. There was a small difference between the prototypes obtained for the male versus the female voice for dominance. Because this difference was only visible on a single time segment, it is likely explained by intra-syllabic loudness contour differences between the two voices.

A second experiment tested the generality of these prototypes across words and speakers by applying them, their opposite patterns, or their mean values to new recordings of “hello” as well as to a variety of other two-syllable words recorded by new speakers.

Two new groups of participants were presented with these new voices in random order (in terms of content and speaker) and rated them for perceived dominance and trustworthiness. As predicted for dominance, applying the original prototype to novel utterances significantly increased their perceived dominance, whereas applying the opposite pattern significantly decreased it, both for “hello” and for novel words. Dominance prototypes flattened to their mean pitch value also led to a strong increase in perceived dominance, but this increase was significantly smaller than for the original prototypes, showing that the prototypes did not reduce to a simple mean pitch effect.

Applying the trustworthiness prototype and its opposite pattern, in turn, significantly decreased and increased perceived dominance, respectively. These effects were significantly smaller than those induced by the corresponding dominance prototypes, however, showing that the two prototypes did not simply oppose one another.
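The three conditions compared in the second experiment (the prototype, its opposite pattern, and its mean value) can be sketched as simple transformations of a pitch-shift contour. This is an illustrative reading, not the study’s code: it assumes the “opposite” pattern is a sign flip of the whole contour and that the “mean value” condition replaces every point with the contour’s average; the prototype numbers and function names are hypothetical.

```python
def opposite(prototype):
    """Opposite pattern: flip the sign of every pitch shift (assumption)."""
    return [-x for x in prototype]

def flattened(prototype):
    """Mean-value condition: keep the average shift, remove the dynamics."""
    mean = sum(prototype) / len(prototype)
    return [mean] * len(prototype)

def apply_contour(base_f0_hz, shifts_cents):
    """Apply per-point shifts (cents) to a baseline pitch track (Hz)."""
    return [f * 2.0 ** (c / 1200.0) for f, c in zip(base_f0_hz, shifts_cents)]

if __name__ == "__main__":
    dominance = [-20.0, -40.0, -60.0, -80.0]   # hypothetical falling prototype
    base = [120.0, 120.0, 120.0, 120.0]        # hypothetical flat pitch track
    for name, contour in [("prototype", dominance),
                          ("opposite", opposite(dominance)),
                          ("flattened", flattened(dominance))]:
        print(name, [round(f, 1) for f in apply_contour(base, contour)])
```

Comparing the prototype against its flattened version is what separates the effect of the contour’s shape from the effect of its mean pitch alone.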


For trustworthiness, applying the original prototype or the opposite pattern also increased or decreased trustworthiness as predicted, but the effect was significant only for the opposite pattern. These effects were observable on new recordings of the word “hello” but not on other two-syllable words. In every other tested condition (mean values and dominance filters), perceived trustworthiness decreased. Further analyses revealed that, unlike the relationship between mean pitch and dominance, the relationship between mean pitch and trustworthiness was nonlinear, with reduced rather than increased ratings at high mean pitch levels. Also, reverse-correlation analysis on data from the second experiment suggested that the shape of the trustworthiness prototypes differed between words.

Finally, neither experiment found any effects of, or interactions between, listener and stimulus gender (all P > 0.05), suggesting that male and female listeners use similar strategies to process social dominance and trustworthiness in both male and female voices.

This study demonstrates that social judgments of dominance and trustworthiness based on spoken utterances are driven by robust mental prototypes of pitch contours, using a code that is identical across sender and observer gender. It also shows that prosodic mental representations such as these can be uncovered with a technique combining pitch manipulations and psychophysical reverse correlation.


The mental representations found for dominant prosody, which combine lower mean pitch with a decreasing dynamical pattern, are consistent with previous research showing that people’s judgments of dominance can be affected by average pitch and pitch variability.

Trustworthy prosodic prototypes, which combine a moderate increase of mean pitch with an upward dynamical pattern, are consistent with findings that high pitch, as well as slow speech rate and smiling voice, increase trusting behaviors toward the speaker (perhaps by signaling that the speaker is nonthreatening).

Beyond mean pitch, the temporal dynamics of the patterns found here were also consistent with previous discoveries of associations between general pitch variations and personality or attitudinal impressions, e.g., falling pitch in assertive utterances or rising pitch in affiliatory infant-directed speech. Still, these results show that mental representations for a speaker’s dominance or trustworthiness should be described in much finer temporal terms than a general rising or falling pitch variation.

What’s next?

Pitch variations around the mean play a considerable role in how we perceive the people around us, but they are not the only element of expressive prosody. Other important elements include loudness, speech rate, and timbre changes (human “tone”), all of which can be inhibited, mitigated, or amplified through other social cues and signals surrounding the speaker. More about that in our next post, so stay tuned.



Tech Entrepreneur, co-founder of SubStrata Technologies
