By Ramiro Gálvez, PhD student in computer science at the Computer Science Department, FCEyN, Universidad de Buenos Aires, Valeria Tiffenberg, Developer, and Edgar Altszyler, Postdoctoral fellow, Applied Artificial Intelligence Lab. Originally published at VoxEU
The belief that men possess greater cognitive abilities than women is a longstanding and well-documented stereotype, with studies showing that both boys and girls as young as six can view ‘brilliance’ as a predominantly male trait. This column explores the contribution of the film industry in the West to perpetuating this stereotype. An analysis of over 10,000 film transcripts reveals the persistent presence of the ‘brilliance = male’ stereotype over the past half a century, including in movies specifically aimed at children.
A particularly longstanding, prevalent and well-documented stereotype is the belief that men possess higher-level cognitive abilities than women (Broverman et al. 1970, Williams and Best 1982, Kirkcaldy et al. 2007, Upson and Friedman 2012). This ‘brilliance = males’ stereotype has even been shown to be endorsed by both boys and girls as young as six (Cvencek 2011, Bian 2017) and is believed to be a factor driving the under-representation of women in science, particularly in the STEM fields (Nosek et al. 2009, Leslie 2015, Smyth and Nosek 2015, Storage et al. 2016, Reuben 2017). Although there is a consensus that this stereotype has strong cultural roots, studies on its perpetuation usually centre on the analysis of cultural behaviours such as differential guidance provided by parents to their offspring during shared scientific thinking (Crowley 2001) or differential guidance given by science teachers to students according to their gender (Shumow and Schmidt 2013). Notably, there is a dearth of large-scale studies focusing on the presence of this stereotype in mainstream cultural products.
In a recent paper, we study the presence of the ‘brilliance = males’ stereotype in a collection of over 10,000 movie transcripts covering half a century of film history in the Western world (Gálvez et al. 2018). As stereotypes are, in part, a collection of associations that link a group to a set of descriptive characteristics (Gaertner and McLaughlin 1983), we use natural language processing techniques to quantify associations between gender-related words and high-level cognitive ability-related words in films. In doing so, a strong focus is placed on analysing the presence of these associations in films aimed at children.
Materials and Methods
We began data collection by downloading from IMDb a series of lists containing the 1,000 top grossing titles in the US for every year from 1967 up to and including 2016. Then, for each title in these lists, metadata was downloaded. With this data in hand, we filtered out all titles which were not movies (such as TV series), which did not include English among the languages spoken in them, or in whose production none of the US, UK, Canada, or Australia was involved. Finally, for each movie in the resulting set, its most frequently downloaded English subtitle was obtained from OpenSubtitles. This resulted in our final sample of 11,550 film subtitles spanning half a century.
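The three filtering rules described above can be sketched in a few lines of Python. This is only an illustration: the `Title` record and its field names (`kind`, `languages`, `countries`) are hypothetical and do not reflect the actual IMDb metadata format or the code used in the study.

```python
from dataclasses import dataclass

# Production countries named in the text.
PRODUCERS = {"US", "UK", "Canada", "Australia"}

@dataclass
class Title:
    # Hypothetical metadata record; field names are assumptions for this sketch.
    name: str
    kind: str        # e.g. "movie" or "tv series"
    languages: set   # languages spoken in the title
    countries: set   # countries involved in the production

def keep(title):
    """Apply the three filters: is a movie, includes English, has a listed producer country."""
    return (
        title.kind == "movie"
        and "English" in title.languages
        and bool(title.countries & PRODUCERS)
    )

titles = [
    Title("Example A", "movie", {"English"}, {"US"}),
    Title("Example B", "tv series", {"English"}, {"UK"}),
    Title("Example C", "movie", {"French"}, {"France", "Canada"}),
]
kept = [t.name for t in titles if keep(t)]
print(kept)
```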
Figure 1A details the number of films analysed for successive ten-year periods (1967-1976, 1977-1986, …, 2007-2016), for the full-sample and for a sub-sample which contains only films belonging to the family and/or animation genres (family/animation sub-sample). Figure 1B shows the evolution of the ratio between the number of appearances of male pronouns (he, his, him, himself) relative to the number of appearances of female pronouns (she, hers, her, herself). In line with previous research on books (Twenge et al. 2012), in films this ratio has experienced a reduction since the mid-1960s – a phenomenon associated with an improvement in women’s status (Twenge et al. 2012) – but has been consistently less favourable towards female pronouns in the family/animation sub-sample when compared to the full sample.
Figure 1 Film frequencies and gender pronouns ratios for successive ten-year periods
Notes: (A) Number of films analysed for successive ten-year periods (1967-1976, 1977-1986, …, 2007-2016), for the full-sample and the family/animation sub-sample. (B) Evolution of the ratio between the number of appearances of male pronouns relative to the number of appearances of female pronouns, for the full-sample and the family/animation sub-sample. In both panels tendencies are estimated through LOESS regressions.
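The pronoun ratio shown in panel B is simple to compute for any block of subtitle text. The following is a minimal sketch using the two pronoun lists given above; the regular-expression tokenizer and the handling of texts without female pronouns are assumptions made here, not details taken from the study.

```python
import re
from collections import Counter

# Pronoun lists as given in the text.
MALE = {"he", "his", "him", "himself"}
FEMALE = {"she", "hers", "her", "herself"}

def pronoun_ratio(text):
    """Return (# male pronouns) / (# female pronouns) in a block of text."""
    # Assumed tokenizer: lowercase, keep runs of letters and apostrophes.
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    male = sum(counts[w] for w in MALE)
    female = sum(counts[w] for w in FEMALE)
    if female == 0:
        # The ratio is undefined or unbounded without female pronouns.
        return float("inf") if male else float("nan")
    return male / female

sample = "He said his plan was clever. She told him that her idea was better."
ratio = pronoun_ratio(sample)
print(ratio)  # 3 male pronouns over 2 female pronouns
```

A ratio above 1 means male pronouns appear more often than female ones; computing it per ten-year period would give the raw points behind a trend like the one in panel B.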
To estimate word associations between gender-related words and high-level cognitive ability-related words, we compute positive pointwise mutual information (PPMI) scores (Martin and Jurafsky 2009) between gender pronouns and high-level cognitive ability-related words (e.g. genius, intelligent, clever). PPMI is a metric designed to capture how much more often than chance two words co-occur (higher values meaning higher associations), and is commonly used for measuring associations between words and concepts. PPMI estimates rely on values contained in a co-occurrence matrix, which presents the number of times a word appears in the context of another one (each row representing a target word and each column representing a context word). Figure 2 contains a snippet illustrating how, given subtitle data, we built these matrices.
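As a rough illustration of the metric, PPMI for a target word w and a context word c is the larger of zero and the base-2 logarithm of P(w, c) / (P(w) P(c)), with all probabilities estimated from the co-occurrence counts. The sketch below follows that textbook definition; the toy counts are invented for illustration (only the eleven for smart and him echoes the example in Figure 2) and the function is not the authors' implementation.

```python
import math

def ppmi(cooc):
    """cooc maps target -> {context: count}; returns the same shape with PPMI scores."""
    total = sum(sum(row.values()) for row in cooc.values())
    row_tot = {w: sum(row.values()) for w, row in cooc.items()}
    col_tot = {}
    for row in cooc.values():
        for c, n in row.items():
            col_tot[c] = col_tot.get(c, 0) + n
    scores = {}
    for w, row in cooc.items():
        scores[w] = {}
        for c, n in row.items():
            if n == 0:
                scores[w][c] = 0.0  # no co-occurrence: PPMI is clipped to zero
                continue
            # PMI = log2( P(w, c) / (P(w) * P(c)) ), written in terms of raw counts.
            pmi = math.log2(n * total / (row_tot[w] * col_tot[c]))
            scores[w][c] = max(0.0, pmi)  # keep only positive associations
    return scores

# Toy counts, invented for illustration.
toy = {
    "him": {"smart": 11, "kind": 9},
    "her": {"smart": 4, "kind": 16},
}
scores = ppmi(toy)
print(scores["him"]["smart"], scores["her"]["smart"])
```

In this toy table smart co-occurs with him more often than chance would predict, so its score is positive, while the her and smart pair falls below chance and is clipped to zero.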
Figure 2 Co-occurrence matrix construction
Notes: Given a target word in a SubRip file (him in the illustration), all neighbouring/context frames are identified. Which frames constitute the neighbourhood depends on the size of a time window (Δt), which we set equal to 30 seconds. The text contained in all context frames is cleaned, tokenized and lemmatized. Then, the number of appearances of every context token is added to the relevant cell in the co-occurrence matrix under construction. The process is repeated for every word in every subtitle under analysis. A co-occurrence matrix presents the number of times a word appears in the context of another word (for example, and simply as an illustration, according to this figure smart appears eleven times in the context of him), and it serves as input for PPMI and statistical significance estimates.
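The construction described in these notes can be sketched as follows. Several details are assumptions made for the sake of a runnable example: frames are given as hypothetical (start time in seconds, text) pairs rather than parsed from a SubRip file, a frame counts as a neighbour if it starts within Δt of the target's frame, the target's own frame is included in its context, and lemmatization is omitted.

```python
import re
from collections import defaultdict

DELTA_T = 30.0  # window size in seconds, as stated in the notes

def tokenize(text):
    # Clean and tokenize; lemmatization is left out of this sketch.
    return re.findall(r"[a-z']+", text.lower())

def build_cooccurrence(frames, delta_t=DELTA_T):
    """frames: list of (start_seconds, text). Returns target -> {context: count}."""
    cooc = defaultdict(lambda: defaultdict(int))
    tokens = [(start, tokenize(text)) for start, text in frames]
    for i, (t_i, own) in enumerate(tokens):
        # Assumption: neighbours are the other frames starting within delta_t.
        neighbours = [
            tok
            for j, (t_j, toks) in enumerate(tokens)
            if j != i and abs(t_j - t_i) <= delta_t
            for tok in toks
        ]
        for p, target in enumerate(own):
            # Context = rest of the target's own frame plus all neighbouring frames.
            context = own[:p] + own[p + 1:] + neighbours
            for ctx in context:
                cooc[target][ctx] += 1
    return cooc

frames = [
    (0.0, "I met him yesterday."),
    (12.0, "He is smart, really smart."),
    (50.0, "She left early."),
]
cooc = build_cooccurrence(frames)
print(cooc["him"]["smart"])
```

The pairwise scan over frames is quadratic and only suitable for a toy input; since frames are ordered in time, a sliding window would be the natural choice for full-length subtitles.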
Figure 3 quantifies associations between gender pronouns and words depicting high-level cognitive ability. Figure 3A presents estimates considering all movies from 2000 up to and including 2016, for the full sample and the family/animation sub-sample. Estimates indicate that associations of male pronouns with high-level cognitive ability-related words are higher than the associations female pronouns have with high-level cognitive ability-related words. This pattern is present in both the full sample and the family/animation sub-sample. Figure 3B explores the dynamics of these differences through time. Results from the full sample of movies reveal that differences in associations have been steady at least for half a century, with no evidence of convergence in the trends. Results from the family/animation sub-sample show that differences have also been prevalent in this set of films, although estimates are less stable (we attribute this to the fact that sample sizes for every ten-year period of the family/animation sub-sample are much smaller than their full sample counterparts, see Figure 1A). Overall, our estimates suggest that, at an aggregate level, the ‘brilliance = males’ stereotype is effectively present in films and that movies specifically aimed at children contain this stereotypical association (which we believe contributes to its early adoption). Moreover, this pattern seems to have been quite persistent for the last 50 years.
Figure 3 Word associations between gender pronouns and high-level cognitive ability-related words
Notes: (A) Estimated association between gender pronouns and high-level cognitive ability related words when films from 2000 up to and including 2016 are analysed, for the full-sample (n = 2,902) and the family/animation sub-sample (n = 242). Asterisks indicate the results of Fisher’s exact tests on the underlying contingency tables: *** significant at the 1% level. (B) Time evolution of the estimated associations taking as input sets of films belonging to successive ten-year periods (1967-1976, 1977-1986, …, 2007-2016). Tendencies are estimated through LOESS regressions. Grey areas indicate that, according to Fisher’s exact tests on the underlying contingency tables, differences are not significant at the 5% level.
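The significance markers above come from Fisher's exact tests on 2x2 contingency tables. The column does not spell out how those tables are laid out, so the sketch below only shows the test itself on a generic table; one plausible layout (an assumption, not the authors' specification) would put male and female pronouns in the rows and cognitive ability-related versus other context words in the columns.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more likely than the observed one
    # (a small tolerance guards against floating-point ties).
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9)
    )

# Small generic table, used only to exercise the function.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(p_value)
```

For real analyses an established implementation such as `scipy.stats.fisher_exact` would be the safer choice; the pure-Python version is shown only to make the computation explicit.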
The film industry in the Western world has been the subject of controversy in recent times regarding gender equality. Controversies range from the existence of a strong gender pay gap (actors being paid considerably more than actresses) to allegations of widespread prevalence of sexual assault and harassment. Our results suggest that gender inequality is also considerably strong in the contents of its films. Given that stereotypes regarding intelligence have been found to shape intellectual identity and academic performance (Steele 1997), the need to proactively address their presence in films is evident.
See original post for references