Identifying Missing Component in the Bechdel Test Using Principal Component Analysis Method
Much has been said about the rationale and significance of the Bechdel score. It became a digital sensation in 2013, when Swedish cinemas began displaying a film's Bechdel test result alongside its rating. The test has drawn criticism from experts and the film fraternity over its use as a measure of female presence in a movie. Critics argue that the score is oversimplified and that its underlying criteria are inadequate: to pass, a film must include 1) at least two women, 2) who have at least one dialogue, 3) about something other than a man. In this research, we consider additional parameters that capture how women are represented in film, such as the number of female dialogues in a movie, the dialogue genre, and the part-of-speech tags in the dialogue. These parameters are missing from the existing criteria used to calculate the Bechdel score. We analyze 342 movie scripts to test the hypothesis that these extra parameters, taken together with the current Bechdel criteria, are significant in calculating a female representation score. The results of Principal Component Analysis indicate that female dialogue content is a key component and should be considered when measuring the representation of women in a work of fiction.
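As a rough illustration of how Principal Component Analysis can show which parameters dominate a component, the following is a minimal sketch and not the pipeline used in this research. The feature names (`bechdel_pass`, `female_dialogue_count`, `female_dialogue_sentiment`) and the six toy rows are hypothetical and invented purely for illustration; the first principal component is found by power iteration on the correlation matrix, which is one simple way to compute it.

```python
import math

# Hypothetical per-movie features (toy values, NOT data from the study).
FEATURES = ["bechdel_pass", "female_dialogue_count", "female_dialogue_sentiment"]
ROWS = [
    [1, 120, 0.30],
    [0, 15, -0.10],
    [1, 200, 0.45],
    [0, 40, 0.05],
    [1, 90, 0.20],
    [0, 25, -0.05],
]


def standardize(rows):
    """Center each column and scale it to unit sample variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(d)
    ]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def covariance(z):
    """Sample covariance of already-centered data (a correlation matrix here)."""
    n, d = len(z), len(z[0])
    return [
        [sum(r[i] * r[j] for r in z) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_component(cov, iters=500):
    """Power iteration: returns the leading eigenvector and its eigenvalue."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cov_v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * cov_v[i] for i in range(d))
    return v, eigenvalue


cov = covariance(standardize(ROWS))
loadings, eigenvalue = first_component(cov)

# Share of total variance captured by the first component.
explained_ratio = eigenvalue / sum(cov[i][i] for i in range(len(cov)))

for name, weight in zip(FEATURES, loadings):
    print(f"{name}: loading {weight:.3f}")
print(f"explained variance ratio: {explained_ratio:.3f}")
```

Features with large absolute loadings on a component that explains much of the variance are the ones that contribute most to it; in practice a library routine such as scikit-learn's `PCA` would normally replace the hand-written power iteration.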