Predicting Personality Scores from Parliamentary Speeches

Series

Research Master Defense
Speakers

Paul Stroet
Field

Data Science and Econometrics

Location

Online
Date and time

August 30, 2021
09:00 - 10:00

This paper presents a novel method for predicting the personality scores of political elites that circumvents survey-based measurement by measuring personality traits from text. The novelty lies in machine-encoding features from texts by means of latent Dirichlet allocation (LDA), rather than relying on manually encoded features such as LIWC, MRC, and prosodic features. The features extracted by LDA provide linguistic cues of personality traits and capture the political portfolios of the different MPs, thereby accounting for variation in text across political domains. Next, the challenges of current predictive modeling approaches are addressed by developing a predictive model that automatically detects interaction effects and is flexible enough to accommodate the complex structure of text data. Support vector machines, currently the most successful approach to predicting personality scores, serve as the benchmark for assessing predictive models that are believed to have superior statistical properties. Indeed, random forests and neural networks fed with the machine-encoded input data predict personality scores more accurately than support vector machines fed with the LIWC features. The data consist of personality scores of Belgian MPs, and the text data are web-scraped from parliamentary speeches.
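
The feature-encoding step described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the toy speeches, the function name lda_gibbs, and all parameter values are assumptions made for illustration, and a simple collapsed Gibbs sampler stands in for whatever LDA estimation the paper actually uses. The sketch turns each tokenized speech into a vector of topic proportions, which could then serve as input features for a regressor such as a random forest (that step is omitted here).

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-document topic proportions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]              # tokens per (document, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]   # tokens per (topic, word)
    topic_total = [0] * n_topics                            # tokens per topic
    assignments = []                                        # topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    # Resample each token's topic conditional on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed topic proportions per document: the machine-encoded features.
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha)
         for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]


# Hypothetical, already tokenized speeches (not data from the paper).
speeches = [
    "budget tax deficit tax spending budget".split(),
    "hospital care nurses care patients hospital".split(),
    "tax budget spending deficit tax".split(),
    "patients nurses hospital care care".split(),
]

features = lda_gibbs(speeches, n_topics=2)
for row in features:
    print([round(p, 3) for p in row])
```

Each row of features is a probability vector over topics, so a speech is represented by a fixed-length numeric vector regardless of its length. In practice a library implementation of LDA and a much larger corpus would be used.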

Keywords: Automated Content Analysis; Topic Modeling; Web-Scraping