Going to the Movies? Expect to See White.
Film dialogue from 2,000 screenplays, broken down by race and age.
(NOTE: This is a replica of Pudding.cool's gender & film dialogue analysis, with a newly compiled dataset. Visualization and theme/style are unchanged from Matt Daniels' and Hannah Anderson's original piece, with the exception of a scatterplot of film earnings that I produced myself. The Pudding had no involvement in the development or publication of this specific project.)
*Domestic gross over $45M, inflation-adjusted. Using IMDb box office figures, 2,500 films have hit this threshold.
Screenplay Dialogue, Broken Down by Race
2,000 Screenplays: Dialogue Broken Down by Race
Only High-Grossing Films: Ranked in the Top 2,500 by US Box Office*
Of 24 Disney animated features, only five films had scripts where nonwhite actors spoke more than 60% of the dialogue, and that’s likely an overestimate*. Movies like Toy Story and Monsters, Inc., by contrast, clocked in with nearly 95% of dialogue spoken by white actors, even though most of their characters have no ethnicity at all (i.e. toys and monsters). Generations of children grow up on such films and rarely hear Indian, East Asian, Latino, or Middle Eastern accents as a result. By the time they reach adulthood, their picture of such populations may rest on problematic comedic stereotypes like The Simpsons’ Apu.
Once these children graduate to live-action films, they receive no course correction. Across the 2,000 screenplays in the database, roughly 50%* of characters with more than 100 words of dialogue were white (meaning, here, strictly Caucasian; fair-skinned Asians or Latinos counted as nonwhite in the analysis). And yet despite this parity (an overrepresentation of nonwhite characters relative to the U.S. population), only 28% of films in the dataset had over 50% of dialogue spoken by nonwhite characters.
2,000 Screenplays: Dialogue Broken Down by Race
But I see errors in there. Like Frozen — those characters aren’t all nonwhite!
As Matt Daniels and Hannah Anderson mentioned in their gender-based film dialogue analysis for The Pudding, these findings aren’t definitive: a script can center on a character who doesn’t talk much (as in Mulan), scripts can change between page and screen, and in some cases I simply indexed the wrong actor (sorry, Elsa... more on that in a moment). But again, as Daniels and Anderson emphasized, the findings are directionally accurate: across so many screenplays, any trends that emerge should be robust to quiet leads or minor database errors.
But for your sanity, let’s take a second and talk about...
Methodology
Daniels and Anderson mapped characters with at least 100 words of dialogue in each of the 2,000 screenplays in their database to the respective actors’ IMDb pages. Any character who fell short of this 100-word threshold was ignored, so some films shown with 100% white or nonwhite dialogue may in reality contain less than 100% one way or the other.
*Character race was quantified by running each actor’s IMDb photo through facial recognition software. Scans returning Caucasian with 85% confidence or greater were classified as white. All other characters were classified as nonwhite. Roughly 5,500 of 20,000+ actors could not be classified, and were assigned nonwhite values so that any error would work against, rather than toward, the effect being measured. This strategy explains why “The Little Mermaid” gets a dialogue breakdown of over 60% nonwhite, or why roughly 50% of characters in the dataset are nonwhite. Reality is almost surely less equitable.
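The classification rule described above can be sketched in a few lines. This is a minimal illustration, not the code used for the analysis: the function names, the tuple layout, and the label string "caucasian" are assumptions made for the example. Only the 85% confidence cutoff, the 100-word minimum, and the nonwhite default for unclassified actors come from the text.

```python
from typing import List, Optional, Tuple

WHITE_CONFIDENCE_THRESHOLD = 0.85  # cutoff stated in the methodology
MIN_WORDS = 100                    # characters below this are ignored

# Hypothetical record layout: (words_spoken, predicted_label, confidence).
# An actor the software could not classify is (words, None, None).
Character = Tuple[int, Optional[str], Optional[float]]


def classify(label: Optional[str], confidence: Optional[float]) -> str:
    """Return "white" only for a confident Caucasian scan; everything else,
    including unclassified actors, defaults to "nonwhite"."""
    if (
        label == "caucasian"
        and confidence is not None
        and confidence >= WHITE_CONFIDENCE_THRESHOLD
    ):
        return "white"
    return "nonwhite"


def nonwhite_share(characters: List[Character]) -> Optional[float]:
    """Fraction of counted dialogue spoken by characters classified nonwhite.
    Returns None when no character reaches the word minimum."""
    counted = [
        (words, classify(label, conf))
        for words, label, conf in characters
        if words >= MIN_WORDS
    ]
    total = sum(words for words, _ in counted)
    if total == 0:
        return None
    nonwhite = sum(words for words, race in counted if race == "nonwhite")
    return nonwhite / total
```

Because every unclassified actor lands in the nonwhite bucket, a film whose cast the software mostly fails to recognize will show a high nonwhite share regardless of who is actually on screen, which is the overestimate the asterisks refer to.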
Over 90% of each screenplay’s dialogue is categorized by race, and apart from Tyler Perry films, no subset I could isolate over-indexes nonwhite. In addition, many of the individual films that over-index nonwhite directly concern the nonwhite experience (e.g. Boyz N The Hood, Beasts of No Nation, The Color Purple, Fruitvale Station). Compare that with, say, a science fiction film like Blade Runner, which has no content-driven reason to over-index in one direction or the other.
How many screenplays have nonwhite characters as leads?
One reason for this disparity is that only 32% of films in the dataset (again, likely an overestimate; see Rocky V) had nonwhite leads (i.e. nonwhite characters with the most dialogue). The heroes of our stories (the secret agents, the detectives, the wizards) are overwhelmingly white. Relegating nonwhite characters to supporting roles like this restricts the imagination of the nonwhite audiences who watch them, emphasizing repeatedly to whites and nonwhites alike that the ideal they should aspire to is a man, and he’s white.
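For concreteness, here is a rough sketch of how a lead could be identified and the share of nonwhite leads computed under the definition above (the character with the most dialogue). The data layout and function names are hypothetical, chosen only for illustration, and ties are broken by whichever character appears first.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (character_name, words_spoken, race),
# where race is "white" or "nonwhite".
CastMember = Tuple[str, int, str]


def lead_race(cast: List[CastMember]) -> str:
    """Race of the lead, defined as the character with the most words."""
    return max(cast, key=lambda member: member[1])[2]


def share_nonwhite_leads(films: Dict[str, List[CastMember]]) -> float:
    """Fraction of films whose lead is classified nonwhite.
    Films with an empty cast list are skipped."""
    leads = [lead_race(cast) for cast in films.values() if cast]
    if not leads:
        raise ValueError("no films with at least one character")
    return sum(race == "nonwhite" for race in leads) / len(leads)
```

Note that this definition looks only at word counts, so a film centered on a quiet protagonist would have its most talkative supporting character counted as the lead.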
Ignoring, for a moment, the insidious ways in which economics and racism intersect, could this phenomenon be driven by the market, rather than racial biases? This was the most frequent objection leveled against this analysis when I began presenting it to friends and classmates. There are simply so many more whites in the country, they argue, that it makes sense for studios to appeal to a wider audience in the interest of profit.
But this reasoning doesn’t hold. The graph below, which shows percentage of dialogue spoken by white actors against US Gross for each film, suggests no correlation at all between whiteness and profit, hinting instead at one between equality and profit.
Domestic Gross by Percent of Dialogue Spoken by White Actors
Among 2,000 Screenplays, all genres, all years
The graph’s trendline emphasizes this non-relationship: as percentage of dialogue spoken by white actors increases, US Gross tends, on average, to remain steady. In other words, the two variables are unrelated. Movies with predominantly nonwhite-spoken dialogue perform just as poorly or successfully as movies with predominantly white-spoken dialogue.
After zooming in to more clearly examine the bulk of the films in the set, it appears US gross does increase, however minutely, with percentage of dialogue spoken by white actors, on average. But this result is likely driven by a high density of extremely low-grossing films miscalculated as majority nonwhite in the analysis (remember, unrecognized faces were tagged nonwhite, and low-grossing films will inevitably skew toward casts of small-fry actors).
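A trendline of this kind is commonly fit by ordinary least squares. The article does not say how the chart’s line was computed, so the sketch below is an assumption about the method, and the function name is invented for the example. The slope it returns is the average change in US gross for each additional percentage point of dialogue spoken by white actors; a slope near zero corresponds to the flat line described above.

```python
from typing import Sequence, Tuple


def fit_trendline(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept.

    xs: percent of dialogue spoken by white actors, one value per film
    ys: US gross for the same films, in the same order
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

A least-squares line is sensitive to dense clusters of points, so a large group of low-grossing films mislabeled as majority nonwhite could by itself tilt the slope slightly upward, as the zoomed-in view suggests.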
Age in Hollywood: White characters vs. Nonwhite characters
Daniels and Anderson also compiled and made public age data for each of the characters in the database (i.e., the age of the actor when they played the role).
Percent of Dialogue by Actors’ Age
Among 2,000 Screenplays, all genres, all years
Unlike their findings for gender, however, whites and nonwhites had nearly identical age distributions in the industry, with a consistent gap in the number of words spoken in each age group. This 1) isolates gender as the culprit behind age-based differences in character dialogue, and 2) suggests no improvement over time for nonwhite actors hoping to outlive the bias against them. In other words, veteran nonwhite actors are just as likely to be out-spoken by white actors at the end of their careers as they were at the beginning.
Here’s another look at the same data, but for every age:
Some last minute housekeeping...
Data for this analysis was compiled algorithmically, which is to say that the race of each actor was classified by a machine. Aside from the error rates that accompany such processes (and can skew results), this means that the actors included in the analysis did not self-identify. For many, this likely represents an unconscionable theft of agency, and for those of you reading this far down who feel that way, I hear you.
But I would argue that in an analysis like this, it’s the perception of an actor or actress that matters. Rachel Dolezal can claim blackness all she wants; she does not and will not experience systemic racism the same way a black woman might. This plays out less offensively with actors/actresses like Alexis Bledel. Look her up: her father is Argentinian and she identifies as Latina, but one would have understandable difficulty discerning that from her appearance. In a sense, the races assigned to actors in the analysis are not a stand-in for actual race, but for perceived race, a quantity a machine can much more safely identify.
Poke around below to explore the whole dataset of characters and films by film title:
All Films’ Dialogue, by Cast Member and Race