Math 445 Lab 8: Predicting Presidential elections with Monte Carlo methods

Necessary Matlab concepts

  • rand random number generation
  • if-else statements
  • for loops
  • histogram plots

Background

Statistician Nate Silver made a name for himself in the 2008 Presidential election by correctly predicting the Presidential election outcomes for 49 of 50 states and all 35 Senate races. He used purely statistical methods applied to polling data. His statistical models had two parts. First, he developed models of the bias of individual pollsters based on past elections and used this to form composite, unbiased models of the aggregate polling data in each state. Second, he ran a large number of computer-simulated elections based on the unbiased composite state-by-state poll data and their margins of error. An estimate of the likelihood of either candidate winning the Presidential election was then given by the fraction of simulated elections that that candidate won.

In the 2012 elections Silver's projections received an enormous amount of attention, and quite a bit of criticism, too. Political pundits derided his work as meaningless number crunching and his 2008 results as lucky. But this time, Silver predicted the Presidential election correctly in all 50 states, and 31 of 33 Senate elections.

Further reading on Nate Silver, fivethirtyeight.com, the mathematics of election prediction, and the Electoral College system:

This lab

In this lab we will simulate the 2012 Presidential election based on last-minute polling data. Essentially, we're pretending that it's Election Day 2012 and we're trying to figure out the odds that Obama or Romney will be declared the winner after the polls close. The first part of Silver's model (constructing unbiased composite polls) is too complicated for us to do here, so we'll just do the second part (running a bunch of simulated elections based on the unbiased poll data).

Problem 1: Predicting the 2012 New Hampshire Presidential election

Just prior to the 2012 Presidential election, polls showed that 51.5% of New Hampshire voters planned to vote for Obama, and 47.8% planned to vote for Romney. Thus it looked likely that New Hampshire's four Electoral College delegates would go to Obama.

But polling data is not entirely certain. The margin of error on the New Hampshire polls was estimated to be 3.4%. That is, the true intentions of New Hampshire voters could have ranged anywhere from 54.9% Obama and 44.4% Romney to 48.1% Obama and 51.2% Romney. Thus it was possible (if not likely) that Romney would win New Hampshire and take its four Electoral College votes.

Your job for problem 1 is to determine, using Monte Carlo methods, how likely it is that Obama or Romney would win New Hampshire in 2012 based on this polling data. Run 1000 simulated elections. For each election, start by assigning 51.5% of the vote to Obama and 47.8% to Romney. Then choose a random percentage between -3.4% and 3.4%, and add that percentage to Obama while subtracting it from Romney. Compare the resulting percentages and declare whoever has the larger percentage the winner of the simulated election.

Do this 1000 times, and count how many times Obama wins and how many times Romney wins. From your results, determine the likelihood that Obama or Romney would win the State of New Hampshire in the 2012 Presidential election.
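The procedure above can be sketched roughly as follows. This is only one possible layout, not the required solution: the variable names are made up for illustration, and it assumes the random shift is drawn uniformly between -3.4 and 3.4 using rand.

```matlab
% Sketch of the Monte Carlo loop for one state (names are illustrative).
N       = 1000;   % number of simulated elections
obama0  = 51.5;   % polled Obama percentage
romney0 = 47.8;   % polled Romney percentage
margin  = 3.4;    % margin of error, in percentage points

obamaWins  = 0;
romneyWins = 0;
for n = 1:N
    % rand is uniform on (0,1), so margin*(2*rand - 1) is uniform on
    % (-margin, margin). Assumption: a uniform shift is what is intended.
    shift  = margin*(2*rand - 1);
    obama  = obama0  + shift;
    romney = romney0 - shift;
    if obama > romney
        obamaWins = obamaWins + 1;
    else
        % an exact tie would also land here; it has probability zero
        romneyWins = romneyWins + 1;
    end
end
fprintf('Obama wins %.1f%%, Romney wins %.1f%% of %d simulations\n', ...
        100*obamaWins/N, 100*romneyWins/N, N);
```

The percentages change a little from run to run, since each run draws new random numbers.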

Turn in your code for simulating the New Hampshire election (probably a Matlab script) and the percentage likelihoods that Obama or Romney would win New Hampshire.

Problem 2: Simulating the 2012 Electoral College votes of a few swing states

Problem 1 demonstrated that New Hampshire was a swing state in the 2012 election: it could have gone either way. Colorado was another swing state. Polls showed it at 50.9% Obama and 48.2% Romney with a margin of error of 3.0%.

New Hampshire had 4 Electoral votes and Colorado 9. All of a state's Electoral votes go to the candidate who gets the most votes.

Your job for problem 2 is to estimate the likelihoods of the four possible outcomes of the New Hampshire and Colorado elections.

  1. Romney wins both and gets 13 Electoral votes.
  2. Romney wins CO, loses NH, and gets 9 Electoral votes.
  3. Romney loses CO, wins NH, and gets 4 Electoral votes.
  4. Romney loses both and gets 0 Electoral votes.

Do this by simulating the New Hampshire and Colorado elections 1000 times. For each time, simulate a New Hampshire election and a Colorado election, as in problem 1. Award the Electoral votes of each state to the winner of that state. Record the Electoral results of each of the 1000 elections, and from this data determine the likelihood of each of the above four outcomes.
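One way to keep track of the four outcomes is a vector of four counters, one per outcome in the list above. The sketch below uses if-else statements and assumes, as in problem 1, independent uniform shifts for the two states. The names count, sNH, and sCO are illustrative.

```matlab
% Sketch: tally the four Romney outcomes for NH and CO (names illustrative).
N     = 1000;
count = zeros(1,4);   % [wins both, CO only, NH only, loses both]
for n = 1:N
    sNH = 3.4*(2*rand - 1);   % shift toward Obama in New Hampshire
    sCO = 3.0*(2*rand - 1);   % independent shift toward Obama in Colorado
    romneyNH = (47.8 - sNH) > (51.5 + sNH);
    romneyCO = (48.2 - sCO) > (50.9 + sCO);
    if romneyNH && romneyCO
        k = 1;
    elseif romneyCO
        k = 2;
    elseif romneyNH
        k = 3;
    else
        k = 4;
    end
    count(k) = count(k) + 1;
end
ev = [13 9 4 0];              % Romney's electoral votes for each outcome
for k = 1:4
    fprintf('Romney gets %2d electoral votes: %5.1f%%\n', ev(k), 100*count(k)/N);
end
```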

Turn in your code and the likelihood of each of these four outcomes as percentages.

Problem 3: Simulating the elections in all fifty states and the likelihood that Obama or Romney wins the general election.

Now we're ready to tackle the original problem: a Monte Carlo simulation of the 2012 Presidential election. Specifically, given a list of states, the number of their electoral votes, the composite polling percentages for each candidate, and the margins of error of those polling percentages, you are to run a large number of simulations of the election and determine the likelihood that either candidate will win based on the results of those simulations. For each state, start by assigning the specified composite polling percentages to the two candidates. Then transfer from one candidate to the other a random number between -margin and +margin, drawn independently for each state. Compare the two percentages and award that state's electoral votes to the candidate with the larger percentage of votes. Do this for all fifty states (plus DC), add up all the electoral votes for each candidate, and award the nth election to the candidate with the majority of electoral votes.

Run a large number (say, 10000) of such simulated elections, keeping track of the number of electoral votes for each candidate in each election. Make a histogram that shows the statistical distribution of total electoral votes for one of the candidates, using bins of width 10 between 0 and 540 (0-9.99 for bin 1, 10-19.99 for bin 2, etc). If you can figure out how, color the bins corresponding to Romney wins red and the bins corresponding to Obama wins blue, or else just draw a vertical line at the magic number of 270 electoral votes needed to win the election outright.
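The binning behind such a histogram can be sketched as below. This version uses floor and accumarray rather than hist, so that integer totals on a bin boundary (10, 20, ...) always fall in the upper bin. The vector obamaEV holds made-up placeholder totals, not simulation results, and all names are illustrative.

```matlab
% Placeholder totals for illustration only; replace with simulated totals.
obamaEV = [253; 266; 281; 290; 294; 303; 303; 310; 332; 347];

binWidth = 10;
nBins    = 54;                              % bins cover 0..539
bin      = floor(obamaEV/binWidth) + 1;     % 0-9 -> bin 1, 10-19 -> bin 2, ...
counts   = accumarray(bin, 1, [nBins 1]);   % number of elections in each bin
centers  = (0:nBins-1)'*binWidth + binWidth/2;

% Report the most populated bin (the first one, if several are tied).
[~, k] = max(counts);
fprintf('most populated bin: %d-%d electoral votes\n', ...
        (k-1)*binWidth, k*binWidth - 1);
```

The vectors centers and counts can then be passed to a plotting command such as bar.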

Questions

Then answer the following questions (again, pretending that it's still Tuesday, November 6th 2012 and the real outcome is unknown).

  1. Who is most likely to win the presidential election?
  2. What is the probability that the most likely winner will actually win?
  3. What is the most likely range of electoral votes for the winner? (among the bins of width 10 specified above)
  4. What is the likelihood of a 269-269 electoral vote tie?

Turn in print-outs of your code, your histogram, and your answers to the above questions.

Tips

  • Start with a small number of simulated elections (say 100) and then increase to a large number (say 10,000) when you're confident your code is working correctly.
  • Try to use as few for-loops as possible. If you are really on fire, you can do it with just one for-loop that loops over the number of trials.
  • Changing the colors of histogram bins in Matlab is not as easy as one might hope. You'll need to take data returned from the hist function and replot it with the bar command. See http://www.mathworks.com/matlabcentral/newsreader/view_thread/290534 for an example of how to do this.
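As a rough illustration of the last tip, one approach is to split the bins into two groups and draw each group with its own bar call. The sketch below assumes the histogram is of Obama's electoral votes, with width-10 bins starting at 0. The counts are made-up placeholders, and the variable names are illustrative.

```matlab
% Placeholder bin counts for illustration only (not real simulation output).
edgesLow = (0:10:530)';             % lower edge of each width-10 bin
centers  = edgesLow + 5;
counts   = zeros(size(centers));
counts(26:31) = [1 3 6 9 7 4];      % made-up counts for bins 250-259 ... 300-309

% Bins starting at 270 or above contain only Obama wins. The 260-269 bin
% is drawn red here even though it also contains the 269-269 tie.
obamaBins = edgesLow >= 270;

bar(centers(~obamaBins), counts(~obamaBins), 1, 'FaceColor', 'r');
hold on
bar(centers(obamaBins), counts(obamaBins), 1, 'FaceColor', 'b');
plot([270 270], [0 max(counts)], 'k--');   % 270 votes needed to win outright
hold off
xlabel('Obama electoral votes');
ylabel('number of simulated elections');
```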

Data

Here's the 2012 polling data, taken from http://fivethirtyeight.blogs.nytimes.com on 2012-11-01. You can load this into Matlab as a matrix P by cutting and pasting the data into a text file P.asc and running load P.asc within Matlab. If you don't believe this polling data, feel free to use something you trust more.

P.asc
% Composite 2012 Presidential election polling numbers
% from http://fivethirtyeight.blogs.nytimes.com
% 2012-11-01 1am
%
%  O == Obama percentage 
%  R == Romney percentage
%  M == margin of error
% EV == electoral votes
%
% O    R    M    EV      state
36.8  62.7  3.8   9   %  AL
38.8  59.7  6.0   3   %  AK
46.2  53.0  3.3  11   %  AZ
38.7  59.7  3.8   6   %  AR
58.2  40.5  2.9  55   %  CA
50.9  48.2  3.0   9   %  CO
56.7  42.4  3.3   7   %  CT
59.6  39.7  5.5   3   %  DE
93.1   6.3  3.2   3   %  DC  
49.9  49.7  2.7  29   %  FL
45.5  54.1  2.7  16   %  GA
66.5  32.6  3.9   4   %  HI
32.2  66.1  4.4   4   %  ID
59.9  39.5  3.0  20   %  IL
45.3  53.9  3.0  11   %  IN
51.2  47.8  3.2   6   %  IA
38.0  61.0  6.1   6   %  KS
40.4  58.7  4.5   8   %  KY
39.4  59.8  3.5   8   %  LA
56.1  42.7  3.7   4   %  ME
61.0  38.0  3.0  10   %  MD
59.1  39.8  3.7  11   %  MA
53.1  45.8  2.7  16   %  MI
53.8  45.0  2.9  10   %  MN
39.4  60.1  5.3   6   %  MS
45.6  53.6  2.8  10   %  MO
45.3  53.1  3.9   3   %  MT
40.5  58.8  3.3   5   %  NE
51.9  47.2  2.9   6   %  NV
51.5  47.8  3.4   4   %  NH
55.6  43.4  3.3  14   %  NJ
54.2  44.6  3.6   5   %  NM
62.5  36.9  2.8  29   %  NY
48.9  50.5  2.6  15   %  NC  
42.1  56.5  3.9   3   %  ND
51.4  47.6  2.7  18   %  OH
33.9  65.8  3.8   7   %  OK
53.7  44.0  3.6   7   %  OR
52.6  46.5  2.6  20   %  PA
61.9  36.3  4.3   4   %  RI
43.3  56.0  4.6   9   %  SC
42.6  56.1  4.2   3   %  SD
41.4  57.7  3.9  11   %  TN
41.3  58.1  3.1  38   %  TX
27.8  70.5  4.1   6   %  UT
50.8  48.6  2.5  13   %  VA
66.3  32.5  4.8   3   %  VT
56.2  42.5  3.5  12   %  WA
41.4  57.4  4.7   5   %  WV
52.5  46.8  2.9  10   %  WI
30.9  67.6  6.0   3   %  WY
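Once the table is loaded as a matrix P, each column can be pulled out as a vector and one simulated election computed for all states at once, without a loop over states. In the sketch below, three rows are copied from the table so that it runs on its own. In practice P would come from load P.asc. The names O, R, M, EV, and shift are illustrative, and a uniform shift is again assumed.

```matlab
% In practice:  load P.asc   (creates the matrix P with one row per state).
% Three rows are copied here so the sketch runs without the file.
P = [50.9 48.2 3.0  9;    % CO
     49.9 49.7 2.7 29;    % FL
     51.5 47.8 3.4  4];   % NH

O  = P(:,1);   % Obama percentages
R  = P(:,2);   % Romney percentages
M  = P(:,3);   % margins of error
EV = P(:,4);   % electoral votes

% One simulated election: an independent uniform shift for each state.
shift    = M.*(2*rand(size(M)) - 1);
obamaWon = (O + shift) > (R - shift);   % logical vector, one entry per state
obamaEV  = sum(EV(obamaWon));           % electoral votes awarded to Obama
romneyEV = sum(EV(~obamaWon));          % electoral votes awarded to Romney
fprintf('Obama %d, Romney %d electoral votes\n', obamaEV, romneyEV);
```

Wrapping the last four statements in a single for loop over trials, and storing obamaEV for each trial, gives the one-loop structure mentioned in the tips.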
gibson/teaching/spring-2016/math445/lab8.txt · Last modified: 2016/03/24 07:02 by gibson