Senate Battle Royale
The US midterm elections on November 4, 2014 may see control of the Senate change from the Democrats to the Republicans. This potential change raises the stakes of this year’s Senate race considerably. There are several “battleground” states, where the election could swing either way. Some key ones are Arkansas, Colorado, Georgia, Iowa, Kansas, and North Carolina. For up-to-the-minute results, I use Twitter to forecast the current standings.
From each of these 6 states, I collected Tweets mentioning either the party (Republican or Democrat) or the candidate from September 18, 2014 right up until the polls close on election day. I use aggregated poll data from RealClearPolitics.com to train a Vector Auto-Regression model with Exogenous Variables for the Twitter data. Figure 1 shows the current forecast for each of the 6 battleground states. Dashed lines are the aggregated polls and solid lines are the Twitter forecast. For the current forecast, error bars are added to show the uncertainty in the measurement. In this model, uncertainty represents how well the Twitter data matches the poll data: a large uncertainty means that there is disagreement between what the polls are saying and what Twitter is saying.
Figure 1: Twitter Forecast (solid line) and Aggregated Polls (dashed line) for upcoming Senate Race (All Polls Closed)
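For readers unfamiliar with the model family, a Vector Auto-Regression with Exogenous Variables is usually written in the generic form below. This is the textbook form, not the fitted model from this post: the lag order p, and the choice to treat the poll shares as the endogenous vector and the Twitter measurements as the exogenous input, are assumptions made here for illustration.

```latex
% Generic VARX(p) form; the roles assigned to y_t and x_t are assumptions.
% y_t : vector of party vote shares for a state at time t
% x_t : vector of exogenous inputs at time t (e.g. Twitter mention measures)
% A_i : lag coefficient matrices, B : exogenous coefficient matrix
% \varepsilon_t : zero-mean error term with covariance \Sigma
\[
  y_t = c + \sum_{i=1}^{p} A_i \, y_{t-i} + B \, x_t + \varepsilon_t ,
  \qquad \varepsilon_t \sim \mathcal{N}(0, \Sigma)
\]
```

Under this reading, the error bars in Figure 1 would come from the estimated error covariance: the worse the Twitter input explains the polls, the larger the residual variance and the wider the bars.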
Who’s Going to Win?
For each race, I can estimate the probability that the Republicans will win in each state. This estimate is based solely on the quality of the agreement between Twitter and the polls. When Twitter and the polls confirm each other with a high degree of certainty, I will get a more certain probability estimate than might be obtained using the poll data alone. This is easy to understand, as more data removes much of the uncertainty. Table 1 summarizes the current results.
|Republican||Democrat*||Prob(Rep > Dem)|
Table 1: Probability of a Republican winning in each state. *The opposition in Kansas is Independent (All Polls Closed)
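One simple way to turn a forecast and its uncertainty into a value like Prob(Rep > Dem) is sketched below. It is not necessarily how the numbers in Table 1 were produced: it assumes the two forecast errors are independent and normally distributed, and the function names and input values are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_rep_wins(rep: float, dem: float, rep_sd: float, dem_sd: float) -> float:
    """P(Rep > Dem) under the assumption of independent normal forecast errors.

    rep, dem       -- forecast vote shares in percentage points
    rep_sd, dem_sd -- forecast standard deviations (the error bars)
    """
    margin = rep - dem
    margin_sd = math.sqrt(rep_sd ** 2 + dem_sd ** 2)
    if margin_sd == 0.0:
        # No uncertainty at all: the sign of the margin decides the race.
        if margin > 0:
            return 1.0
        if margin < 0:
            return 0.0
        return 0.5
    return normal_cdf(margin / margin_sd)


# Hypothetical inputs, not taken from Table 1: the same 2-point lead with
# tight and with wide error bars.
tight = prob_rep_wins(46.0, 44.0, 1.0, 1.0)
loose = prob_rep_wins(46.0, 44.0, 4.0, 4.0)
print(f"tight error bars: {tight:.3f}, wide error bars: {loose:.3f}")
```

The same lead gives a probability closer to 50% when the error bars are wide, which is why strong agreement between Twitter and the polls yields a more decisive estimate.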
Election day Tweets are beginning to arrive. Traditionally this day sees the most election traffic. It can also lead to some last-minute surprises that the polls did not pick up. Colorado continues to show strong support for the Republicans. Georgia’s Republican support appears to have shored up overnight but may change as people start tweeting from the polls. North Carolina, which was previously a toss-up, is solidifying around the Democratic candidate.
The most interesting states today are Arkansas, Iowa and Kansas. At the start of the day there was a large dissonance in Arkansas between the polls and Twitter. In the past, this pattern has indicated situations where the election results differ from the polls. As the day progresses, the dissonance is dying down, and it is now looking likely that Arkansas will be Republican.
Kansas has been a tight race leading up to today. Taking the most recent Tweets into account, it looks like support for the Republicans may be eroding in Kansas. If this trend continues through the day, expect the Independent candidate to win.
Iowa continues to be a toss-up and is where most election coverage will probably be focused. Twitter isn’t giving much indication beyond the polls yet. This election might come down to the wire when the polls close at 9PM CST.
A Runoff in Georgia?
The Georgia election has an interesting wrinkle. This year there is a third-party candidate running for Senate. Under Georgia law, if no candidate receives 50% plus one vote, there will be a runoff in January. The current Twitter forecast shows a runoff as likely (probability of 99.9%).
This article will be updated through election day to provide the most up-to-the-minute race forecasts.
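A runoff probability of this kind can be approximated from per-candidate forecasts as sketched below. This is an illustration under stated assumptions, not the calculation behind the 99.9% figure: each candidate's share is treated as normally distributed around its forecast, and the shares and standard deviations shown are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_runoff(shares, sds, threshold=50.0):
    """Probability that no candidate exceeds the majority threshold.

    At most one candidate can hold a majority, so the events "candidate i
    exceeds the threshold" are mutually exclusive and their probabilities
    add. Each probability uses a normal approximation (an assumption).
    All standard deviations must be positive.
    """
    p_someone_wins = 0.0
    for share, sd in zip(shares, sds):
        p_someone_wins += 1.0 - normal_cdf((threshold - share) / sd)
    # Clamp at zero because the marginal normals are only an approximation.
    return max(0.0, 1.0 - p_someone_wins)


# Hypothetical three-way forecast (Republican, Democrat, third party).
p = prob_runoff([47.0, 45.0, 3.0], [1.0, 1.0, 0.5])
print(f"P(runoff) = {p:.4f}")
```

With the leader a few standard deviations short of 50%, a runoff becomes close to certain, which is the pattern the forecast reports for Georgia.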
Post Election Wrap-up
Qualitatively, the Twitter forecast did OK, calling 4 of the 6 elections accurately. It also detected the increase in Republican support in Georgia and the increase in Democratic support in Colorado. However, it failed to capture the increase in Republican support in Kansas and North Carolina. Worse, the Twitter model suggested that the agreement between polls and Twitter conversation was strong enough to indicate a decision within those states.
Quantitatively, the model underestimated the actual vote considerably. Table 2 compares the forecast numbers with the actual numbers. Overall, the model was more accurate in predicting the Democratic vote than the Republican vote. On average, the Democratic vote forecast was off by 1.9 percentage points, while the Republican vote forecast was off by 6.2 percentage points.
Table 2: Forecast compared with election results. *The opposition in Kansas is Independent.
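The "off by" figures are averages of the absolute gap between forecast and actual vote share across the states. A minimal sketch of that arithmetic follows; the input values are placeholders chosen for illustration and are not the numbers from Table 2.

```python
def mean_abs_error(forecast, actual):
    """Average absolute difference, in percentage points."""
    if len(forecast) != len(actual) or not forecast:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(forecast)


# Placeholder vote shares for two hypothetical states (NOT Table 2 data).
example_forecast = [40.0, 50.0]
example_actual = [42.0, 47.0]
print(mean_abs_error(example_forecast, example_actual))
```

Using the absolute value keeps overestimates and underestimates from cancelling out, so the result reflects the typical size of the miss rather than its direction.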
How might the model be improved? As always, there is the question of the poll numbers used in the model. If the poll numbers are systematically low due to undecided voters, then it is unlikely that the quantitative values from the Twitter model will be accurate. But a systematic error in the poll data does not explain why the model is three times more accurate for Democrats than for Republicans. Perhaps Democrats are more active on Twitter than Republicans, resulting in more data (and more accuracy) for the Democrats than for the Republicans. If this hypothesis is true, then the forecast error could be corrected by raking the data using a political affiliation model.
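Raking (iterative proportional fitting) rescales a table of counts until its row and column totals match known targets. The sketch below shows the mechanics on a two-way table. Everything specific in it is hypothetical: the rows stand for the affiliation inferred by an assumed political affiliation model, the columns for an assumed second margin (two age bands), and the counts and targets are invented for illustration.

```python
def rake(table, row_targets, col_targets, iterations=100, tol=1e-9):
    """Iterative proportional fitting on a two-way table of counts.

    Alternately rescales rows and columns until both sets of marginal
    totals match their targets. The targets must share the same grand total.
    """
    weights = [row[:] for row in table]  # work on a copy
    for _ in range(iterations):
        # Step 1: scale each row to its target total.
        for i, target in enumerate(row_targets):
            total = sum(weights[i])
            if total > 0:
                weights[i] = [w * target / total for w in weights[i]]
        # Step 2: scale each column to its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in weights)
            if total > 0:
                for row in weights:
                    row[j] *= target / total
        # Stop once the rows still match after the column step.
        gap = max(abs(sum(weights[i]) - t) for i, t in enumerate(row_targets))
        if gap < tol:
            break
    return weights


# Hypothetical Tweet counts: rows = (Democrat, Republican),
# columns = (younger, older). Democrats are over-represented.
counts = [[300.0, 200.0], [100.0, 100.0]]
raked = rake(counts, row_targets=[350.0, 350.0], col_targets=[400.0, 300.0])
for label, row in zip(("Democrat", "Republican"), raked):
    print(label, [round(w, 1) for w in row])
```

Dividing each raked cell by its original count gives a per-Tweet weight, so Tweets from the under-represented group count for more when the forecast is rebuilt.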
Look for further exploration of these questions in a future post!