Red
White
Blue
RUN
Debate.promises

Vector Graphics, Oil on Canvas

Contributors: VanderDrift, Shaull






The Presidential Debates Seen Through a Prism of Art and Natural Language Processing

This work applies computational linguistics to transcripts of recent debates involving Hillary Clinton and Donald Trump.

Politicians have a tendency to repeat core concepts. I used natural language processing algorithms to determine which phrases each candidate uses most frequently, focusing on three-, five- and seven-word phrases.

Using the Natural Language Toolkit (NLTK), an open-source Python library (1), I parsed over half a million words: 218,265 from Donald Trump and 308,319 from Hillary Clinton.

I then fed the results into vector graphics and animation tools to represent the data in an artistic form. Press “run” above to interact with the art.

If you're interested in the code and larger data sets, see below.

N-grams from the Presidential Debates: 3-, 5- and 7-word phrases

Hillary Clinton

Phrase | Number of Times Used
we have to | 114
a lot of | 101
i want to | 82
to try to | 73
we need to | 66
i think that | 51
and i think | 47
i think its | 46
were going to | 45
we have a | 41
we have a lot of | 10
barriers that stand in the way | 9
do everything I can to | 9
that stand in the way | 9
at the end of the | 8
stand in the way of | 8
with a path to citizenship | 7
immigration reform with a path | 7
barriers that stand in the way of | 8
comprehensive immigration reform with a path to | 7
that stand in the way of people | 5
and i will do everything i can | 5
have a lot of work to do | 4
no bank is too big to fail | 4
to extend the social security trust fund | 4
and thats what i will do as | 4
we have a lot of work to | 4
up to his or her godgiven potential | 4
chance to live up to his or | 4

Donald Trump

Phrase | Number of Times Used
were going to | 85
we have to | 78
a lot of | 65
by the way | 78
going to be | 48
let me just | 41
you look at | 39
first of all | 39
you have to | 36
going to have | 33
let me just tell you | 25
and i will tell you | 9
if you look at the | 9
that i can tell you | 9
were going to have a | 7
going to bring jobs back | 7
going to be able to | 7
i will tell you this | 6
have to get rid of | 6
tens of thousands of people | 6
we have no idea who they are | 4
he beats the rest of the field | 4
i beat hillary clinton in many polls | 3
see what happens at the end of | 3
have a country or we dont have | 3
ive hired tens of thousands of people | 3
we should have gotten rid of the | 3
lets see what happens at the end | 3
youre going to destabilize the middle east | 3
im going to bring jobs back from | 3

If you would like the full data set, please email us.

The dataset

The University of California, Santa Barbara provides transcripts of the main primary debates. To build each candidate's corpus, we extract only that candidate's remarks from the transcripts.
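
The extraction step described above might look something like the following. This is a minimal sketch, not the project's actual code: it assumes each speaker turn begins on a new line with an upper-case label such as `CLINTON:`, which may not match the real transcripts. The `extract_speaker` helper, the `TURN_LABEL` pattern, and the sample text are all made up for illustration.

```python
import re

# Assumed format: each turn starts a line with an upper-case speaker label
# followed by a colon, e.g. "CLINTON: ...". Real transcripts may differ.
TURN_LABEL = re.compile(r"^([A-Z][A-Z .'-]*):\s*", re.MULTILINE)


def extract_speaker(transcript, speaker):
    """Return everything said by `speaker`, joined into one string."""
    # With one capture group, re.split yields
    # [preamble, label1, text1, label2, text2, ...].
    parts = TURN_LABEL.split(transcript)
    turns = [
        text.strip()
        for label, text in zip(parts[1::2], parts[2::2])
        if label == speaker
    ]
    return " ".join(turns)


# Made-up sample transcript, for illustration only.
sample = (
    "MODERATOR: Secretary Clinton, your opening statement.\n"
    "CLINTON: Thank you. We have to work together.\n"
    "TRUMP: We are going to win.\n"
    "CLINTON: I want to add one thing.\n"
)

clinton_text = extract_speaker(sample, "CLINTON")
print(clinton_text)
```

The resulting string for each candidate could then be saved to a file such as the `all.txt` read by the script further down.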

Python has a great natural language processing library called the Natural Language Toolkit (NLTK). It was developed by Steven Bird and Edward Loper in the Department of Computer and Information Science at the University of Pennsylvania. (1)

              from collections import Counter
              import string

              from nltk.tokenize import TweetTokenizer
              from nltk.util import ngrams

              # Open and read the file, and make all text lowercase.
              with open("debates/hill/all.txt", "r") as f:
                  text = f.read().lower()

              # Remove all punctuation from the text.
              text = "".join(ch for ch in text if ch not in string.punctuation)

              # Tokenize the text. TweetTokenizer keeps a contraction such as
              # "didn't" as one token instead of splitting it into 'did', "n't".
              tknzr = TweetTokenizer()
              tokens = tknzr.tokenize(text)

              # Build the three-, five- and seven-word phrases.
              trigrams = ngrams(tokens, 3)
              fivegrams = ngrams(tokens, 5)
              sevengrams = ngrams(tokens, 7)

              # Output the count of every three-, five- and seven-word phrase.
              print(Counter(trigrams))
              print(Counter(fivegrams))
              print(Counter(sevengrams))
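
Printing a whole `Counter` dumps every phrase, including the many that occur only once. To get a short ranked list like the tables above, `Counter.most_common` can be used. The sketch below is an illustration only: `top_phrases` is a hypothetical helper, the sample tokens are made up, and a small `zip`-based sliding window stands in for `nltk.util.ngrams` so that the example runs without NLTK installed.

```python
from collections import Counter


def top_phrases(tokens, n, k=10):
    """Return the k most frequent n-word phrases as (phrase, count) pairs."""
    # Slide a window of n tokens across the list; this stands in for
    # nltk.util.ngrams in this sketch.
    windows = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(window) for window in windows)
    return counts.most_common(k)


# Made-up sample text, already lowercased and stripped of punctuation.
tokens = "we have to act and we have to act now because we have to".split()

for phrase, count in top_phrases(tokens, 3, k=2):
    print(phrase, count)
```

Calling the same helper with `n` set to 5 or 7 would give the longer phrases.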
          
(1) Bird, Steven, Edward Loper and Ewan Klein (2009), Natural Language Processing with Python. O’Reilly Media Inc.