The Presidential Debates Seen Through a Prism of Art and Natural Language Processing
In this work, I apply computational linguistics to transcripts of recent debates involving Hillary Clinton and Donald Trump.
Politicians tend to repeat core concepts. Using natural language processing, I determined which phrases each candidate used most frequently, focusing on three-, five- and seven-word phrases.
I parsed over half a million words: 218,265 from Donald Trump and 308,319 from Hillary Clinton, using an open-source Python library, the Natural Language Toolkit (NLTK) (1).
I then feed the results into vector graphics and animation tools to represent the data in artistic form. Press “run” above to interact with the art.
If you're interested in the code and larger data sets, see below.
N-grams from the presidential debates: 3-, 5- and 7-word phrases.
Hillary Clinton
Phrase | Number of Times Used |
---|---|
we have to | 114 |
a lot of | 101 |
i want to | 82 |
to try to | 73 |
we need to | 66 |
i think that | 51 |
and i think | 47 |
i think its | 46 |
were going to | 45 |
we have a | 41 |
we have a lot of | 10 |
barriers that stand in the way | 9 |
do everything I can to | 9 |
that stand in the way | 9 |
at the end of the | 8 |
stand in the way of | 8 |
with a path to citizenship | 7 |
immigration reform with a path | 7 |
barriers that stand in the way of | 8 |
comprehensive immigration reform with a path to | 7 |
that stand in the way of people | 5 |
and i will do everything i can | 5 |
have a lot of work to do | 4 |
no bank is too big to fail | 4 |
to extend the social security trust fund | 4 |
and thats what i will do as | 4 |
we have a lot of work to | 4 |
up to his or her godgiven potential | 4 |
chance to live up to his or | 4 |
Donald Trump
Phrase | Number of Times Used |
---|---|
were going to | 85 |
we have to | 78 |
by the way | 78 |
a lot of | 65 |
going to be | 48 |
let me just | 41 |
you look at | 39 |
first of all | 39 |
you have to | 36 |
going to have | 33 |
let me just tell you | 25 |
and i will tell you | 9 |
if you look at the | 9 |
that i can tell you | 9 |
were going to have a | 7 |
going to bring jobs back | 7 |
going to be able to | 7 |
i will tell you this | 6 |
have to get rid of | 6 |
tens of thousands of people | 6 |
we have no idea who they are | 4 |
he beats the rest of the field | 4 |
i beat hillary clinton in many polls | 3 |
see what happens at the end of | 3 |
have a country or we dont have | 3 |
ive hired tens of thousands of people | 3 |
we should have gotten rid of the | 3 |
lets see what happens at the end | 3 |
youre going to destabilize the middle east | 3 |
im going to bring jobs back from | 3 |
The dataset
The University of California, Santa Barbara provides transcripts of the main primary debates. To build each candidate's corpus, I extracted that candidate's remarks from each transcript.
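The extraction step above can be sketched with the standard library alone. This is an illustrative sketch, not the script actually used: it assumes the common transcript convention where each turn begins with the speaker's surname in capitals (e.g. "CLINTON: ..."), and the function name `extract_speaker` is hypothetical.

```python
import re

def extract_speaker(transcript, speaker):
    """Collect all remarks by `speaker` from a speaker-labeled transcript."""
    turns = []
    current = None
    for line in transcript.splitlines():
        m = re.match(r"^([A-Z]+):\s*(.*)", line)
        if m:
            # A new labeled turn; remember whose it is.
            current = m.group(1)
            if current == speaker:
                turns.append(m.group(2))
        elif current == speaker and line.strip():
            # Unlabeled line continuing the current speaker's turn.
            turns.append(line.strip())
    return " ".join(turns)

transcript = """CLINTON: We have to build an economy.
TRUMP: We have to bring jobs back.
CLINTON: I want to invest in the middle class."""

print(extract_speaker(transcript, "CLINTON"))
```

Concatenating the turns for one speaker across all transcripts yields that candidate's corpus.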
Python has a great natural language library called the Natural Language Toolkit (NLTK). It was developed by Steven Bird and Edward Loper in the Department of Computer and Information Science at the University of Pennsylvania (1).
```python
import nltk
from nltk.tokenize import TweetTokenizer
from nltk.util import ngrams
from collections import Counter
import string

# Open and read the file, lowercasing all text.
text = open("debates/hill/all.txt", "r").read().lower()

# Remove all punctuation from the text.
text = "".join([ch for ch in text if ch not in string.punctuation])

# TweetTokenizer does not split contractions into two parts
# (e.g. "didn't" stays whole rather than becoming "did" and "n't").
tknzr = TweetTokenizer()
tokens = tknzr.tokenize(text)

# Build three-, five- and seven-word phrases.
trigrams = ngrams(tokens, 3)
fivegrams = ngrams(tokens, 5)
sevengrams = ngrams(tokens, 7)

# Output the phrase counts.
print(Counter(trigrams))
print(Counter(fivegrams))
print(Counter(sevengrams))
```

(1) Bird, Steven, Edward Loper and Ewan Klein (2009), Natural Language Processing with Python. O’Reilly Media Inc.
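For readers without NLTK installed, the counting step itself can be reproduced with the standard library alone. A minimal sketch (the helper `count_ngrams` is hypothetical, not part of the script above):

```python
from collections import Counter

def count_ngrams(text, n):
    """Count n-word phrases in lowercased, punctuation-free text."""
    tokens = text.lower().split()
    # Slide a window of n tokens across the list; zip stops at the end.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)

sample = "we have to do it we have to try"
counts = count_ngrams(sample, 3)
print(counts.most_common(2))
```

Tables like the ones above are then just the `most_common` entries for n = 3, 5 and 7.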