Using linguistic acoustic analysis to understand what sarcasm is and how humans detect it.

When it comes to sarcasm or verbal irony, the intended meaning can be difficult to discern. With the rise of deadpan humor in particular, the lines between jokes and sincere statements of truth, or legitimate commissive speech acts, have blurred. In practice, a systematic, rule-based phonetic guide to sarcasm would help not only in interpreting everyday speech but also in forensics. For example, how does a defense attorney argue that a client was "only kidding" when making a threat? Although the FBI has issued statements saying that all hoax threats are felonies, in practice some courts rule in favor of the defendant (Bowdich; Rushe). The Supreme Court has set precedent on the importance of intent. It is understood that "vocal cues" or "ironic tones" can help listeners discern a speaker's intended meaning, but without specific and commonly accepted phonetic rules, these vague notions are of little practical use.


As Patricia Rockwell explains in her study of the vocal cues of sarcasm, intentionally flouting Grice's "quality maxim" requires listeners to "be aware of more than mere literal content" (Rockwell). To complicate matters, irony is often communicated best through nonverbal expression. Some researchers have even suggested that vocal sarcasm is merely a by-product of the physiology of negative arousal, specifically a tightening of the jaw and a downward pull of the vocal muscles. The reverse is also possible: the vocal aspects of sarcasm may require the facial and jaw muscles to move accordingly.

Because adults are more adept than children at recognizing vocal irony, consider the following scenario. A child overhears an adult sarcastically say, "Oh my God, I'm gonna kill him!" The adult happens to be an employee of the child's parent. The child tells the parent, who fires the employee for "threatening their life." Fortunately, the workplace records the building 24 hours a day in case of emergencies. Although the scenario is hypothetical, it again shows that understanding the vocal cues behind sarcasm would be extremely helpful in arguing against a potentially wrongful termination.

Rockwell conducted a study to describe these unwritten conventions of vocal irony in more detail, specifically the vocal or phonetic cues behind sarcasm. She hypothesized that listeners would be able to discriminate between "non-sarcasm and sarcasm using only vocal features," with sarcasm marked by slower tempo, greater intensity, lower pitch, less pitch variation, less resonance, and less precise articulation (Rockwell). Using twelve volunteer speakers, ranging from actors to radio announcers, she professionally recorded three utterances, each intended to convey a particular meaning. Two of these were sarcastic and one was not. The first two stimuli were presented with contextual scenarios, while the third contained only the phrase intended to be ironic, removing any element of context. Rockwell examined tempo, intensity (amplitude), intensity variation (shimmer), pitch, pitch variation (jitter), resonance, and articulation. The vocal features were then coded, with reliability coefficients derived from the Spearman-Brown effective reliability formula. One hundred twenty-seven participants then listened to the recordings and rated each on a five-point Likert-type scale, with 5 being the most sarcastic.
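The essay does not spell out the reliability formula. In its standard form, the Spearman-Brown formula gives the effective reliability R of the pooled judgements of n coders whose mean pairwise correlation is r̄: R = n·r̄ / (1 + (n − 1)·r̄). The Python sketch below computes it under that assumption. The function names are hypothetical, and the coder scores are invented for illustration; they are not Rockwell's data.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_interrater_r(ratings):
    """Average Pearson r over every pair of coders.

    `ratings` holds one list per coder, each scoring the same recordings
    in the same order.
    """
    return mean(pearson_r(a, b) for a, b in combinations(ratings, 2))


def effective_reliability(n_coders, mean_r):
    """Spearman-Brown: reliability of the pooled judgement of n coders."""
    return n_coders * mean_r / (1 + (n_coders - 1) * mean_r)


# Hypothetical pitch-level codes from three coders for five recordings.
coders = [
    [2, 3, 4, 4, 5],
    [1, 3, 4, 5, 5],
    [2, 2, 4, 4, 5],
]
r_bar = mean_interrater_r(coders)
R = effective_reliability(len(coders), r_bar)
print(f"mean r = {r_bar:.2f}, effective reliability R = {R:.2f}")
```

Because R grows with the number of coders whenever their scores agree positively, the pooled coding is more dependable than the judgement of any single coder.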

Her results indicated that sarcasm follows a clear and predictable acoustic pattern. Sarcasm is marked by a "slower tempo, lower pitch level, and greater intensity than non sarcasm" (Rockwell). Crucially, sarcasm could be conveyed without context, as many previous studies have suggested. Contrary to her hypotheses, there was no systematic variation in articulation or resonance. This may be, admittedly, because all of her speakers worked in professional speaking or performing fields. A voice that is simply lower, slower, and louder than what is perceived as normal can effectively convey sarcasm.

This study showed a predictable, systematic way of decoding sarcasm through acoustic phonetic analysis. However, these acoustic features are easily altered by factors other than sarcasm. A temporarily lowered pitch could have many causes, such as depression or even seasonal allergies. Slower speech could be unintentional or the result of fatigue. Louder speech could result from excitement or from situational conditions, such as speaking over background noise. It is the combination of these three features that seems to indicate sarcasm reliably.

Because these acoustic features can arise in circumstances other than sarcasm, this study should be applied only as a baseline analysis of speech. It should not serve as the sole analysis, especially in the absence of context. Alongside other forensic analysis, these findings help in weighing the possibility of sarcasm, since the combined presence of lower pitch, greater volume, and slower speech could indicate some form of vocal irony. If the goal is to prove a speaker's intentions, such an analysis could at least point investigators in the right direction.
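The baseline screening described above can be sketched in Python. This is a hypothetical illustration, not Rockwell's procedure: it assumes that mean pitch, speech rate, and mean intensity have already been measured (for example with an acoustic analysis tool such as Praat) for both the utterance in question and a neutral sample from the same speaker. The class, function names, and thresholds are placeholders invented for this example, not published values. It flags an utterance only when all three cues co-occur, because each cue on its own has other explanations.

```python
from dataclasses import dataclass


@dataclass
class Prosody:
    """Summary measures for one stretch of speech (hypothetical fields)."""
    mean_f0_hz: float         # average pitch
    syllables_per_sec: float  # tempo
    mean_intensity_db: float  # loudness


# Illustrative thresholds only; these are NOT values reported by Rockwell.
PITCH_DROP = 0.05        # at least 5% lower mean pitch than baseline
TEMPO_DROP = 0.10        # at least 10% slower than baseline
INTENSITY_RISE_DB = 2.0  # at least 2 dB louder than baseline


def sarcasm_cues(utterance: Prosody, baseline: Prosody) -> dict:
    """Compare an utterance with the same speaker's neutral baseline."""
    return {
        "lower": utterance.mean_f0_hz <= baseline.mean_f0_hz * (1 - PITCH_DROP),
        "slower": utterance.syllables_per_sec
        <= baseline.syllables_per_sec * (1 - TEMPO_DROP),
        "louder": utterance.mean_intensity_db
        >= baseline.mean_intensity_db + INTENSITY_RISE_DB,
    }


def consistent_with_sarcasm(utterance: Prosody, baseline: Prosody) -> bool:
    """True only when all three cues co-occur; never proof of intent."""
    return all(sarcasm_cues(utterance, baseline).values())


baseline = Prosody(mean_f0_hz=180.0, syllables_per_sec=5.0, mean_intensity_db=65.0)
utterance = Prosody(mean_f0_hz=160.0, syllables_per_sec=4.0, mean_intensity_db=69.0)
print(sarcasm_cues(utterance, baseline))
print(consistent_with_sarcasm(utterance, baseline))
```

A tired speaker who is merely slower (for example `Prosody(180.0, 4.0, 65.0)`) trips only the tempo cue and is not flagged, which mirrors the point that no single feature is enough.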

As an extension of verbal irony, some people display fake smiles when lying or delivering sarcasm. According to Véronique Aubergé and Marie Cathiard's findings on whether listeners can hear the prosody of a smile, the specific features of the intended manipulation, whether verbal irony or audible amusement while speaking with a smile, cannot be distinctly localized. Although Rockwell found three features present in sarcasm and Aubergé and Cathiard observed higher F2 frequencies and longer final-syllable durations, these features are not exclusive to verbal irony or amusement. In amusement, higher F2 frequencies, lower F3, and higher mean intensity were clear indicators that the speaker was smiling, but these results could be highly culture-dependent (Aubergé & Cathiard). All of the subjects were French, and the way amusement is expressed in other languages and cultures could differ drastically.

In both studies, the findings were conclusive only to a limited degree: given a particular set of conditions or acoustic features, one cannot say for certain that the speaker was expressing or intending some manipulation, whether amusement or irony. As with many acoustic analyses of the voice, such as spectrogram comparisons, the results should be used only to supplement other evidence. With the rise of social media and people broadcasting their every passing thought, civil and criminal cases involving threats made online and on video will likely increase, requiring lawyers to prove intent.

