Forget the Phone: Social Media is a More Accurate Way to Predict Vote Results

Introduction

Previously I developed a model to predict who would get voted off American Idol based on what people were saying on Twitter[1].

To do this, I created a random “focus group” of American Idol fans who are active on Twitter. Each week I looked at who this group was talking about. Since raw counts can sometimes be misleading, I used a simple model of the conversation as an exogenous variable in the autoregression. Essentially, it says that each week new people should be talking about successful contestants. So even if a contestant has a large number of mentions, if those mentions come from the same group of people, the contestant is in danger of being voted off.
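To make the conversation model concrete, here is a minimal sketch of the new-speaker feature, assuming mention data grouped by week. This is my own reconstruction for illustration, not the original code; the function name and data layout are hypothetical.

```python
# Sketch of the "new speakers" feature: for each contestant, what fraction
# of this week's mentioners had never mentioned them before? A low fraction
# means the same fans are doing all the talking -- a warning sign.
def new_speaker_fractions(weekly_mentions):
    """weekly_mentions: list of dicts (one per week) mapping
    contestant -> set of user IDs who mentioned them that week."""
    seen = {}       # contestant -> all users seen in prior weeks
    results = []
    for week in weekly_mentions:
        fractions = {}
        for contestant, users in week.items():
            prior = seen.setdefault(contestant, set())
            if users:
                fractions[contestant] = len(users - prior) / len(users)
            prior |= users
        results.append(fractions)
    return results
```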

I ran this model against the historical data from season 12 of American Idol. The model was 80% accurate in predicting who would be voted off. But predictions against historical data can have pitfalls. Since I already knew the results, I would naturally pick the model that achieved the highest accuracy. This can lead to overfitting: the model is very accurate on one specific dataset, but loses accuracy on other data.

The only way to truly judge the accuracy of a predictive model is to run it against data in real time. How well does it perform when you don’t know the result ahead of time? Enter American Idol season 13!

American Idol Season 13

Season 13 of American Idol was broadcast from January 15, 2014 to May 21, 2014. The first few episodes covered the auditions, semifinals, and a wild-card round. The finals began February 26, 2014, featuring 13 finalists. Each week during the finals, contestants performed on Wednesday. After the performance, viewers would phone in their votes for their favourite contestants during a voting window, and the contestant who was voted off would be revealed the following night (the result dates in Table 1 are Thursdays).

While I would like to start predicting the losers as soon as the finalists are announced, it takes a few weeks to collect enough data to train a classifier that determines which contestant a person is talking about. For season 13, this training period lasted until March 20, 2014, so the first week the model could predict the loser was the week of March 26, 2014.
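The post does not say which classifier was used. As an illustration only, a simple bag-of-words model along these lines could assign tweets to contestants; the training tweets below are made up.

```python
# A plausible stand-in for the tweet classifier (not necessarily the model
# actually used): bag-of-words features with multinomial naive Bayes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical labelled examples: (tweet text, contestant it is about).
training = [
    ("jena killed it tonight #TeamJena", "Jena Irene"),
    ("caleb brought the house down again", "Caleb Johnson"),
    ("so sad for alex this week", "Alex Preston"),
]
texts, labels = zip(*training)

classifier = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
classifier.fit(texts, labels)
print(classifier.predict(["that last jena performance was unreal"]))
```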

Table 1 summarizes the results. Column 1 shows the date and time (all times Eastern) that the loser was announced, column 2 shows the actual loser, column 3 shows my prediction, and column 4 shows the date and time I made the prediction.

| Result Date | Actual | Predicted | Prediction Date |
|---|---|---|---|
| Mar 27, 2014 10 PM | Majesty Rose | Majesty Rose | Mar 27, 2014 10 AM |
| Apr 10, 2014 10 PM | Malaya Watson | Malaya Watson | Apr 10, 2014 10 AM |
| Apr 17, 2014 10 PM | Dexter Roberts | Dexter Roberts | Apr 17, 2014 9 AM |
| Apr 24, 2014 10 PM | CJ Harris | Jena Irene | Apr 24, 2014 11 AM |
| May 1, 2014 10 PM | Sam Woolf | Jena Irene | May 1, 2014 11 AM |
| May 8, 2014 10 PM | Jess Meuse | Jess Meuse | May 8, 2014 10 AM |
| May 15, 2014 10 PM | Alex Preston | Alex Preston | May 15, 2014 10 AM |
| May 21, 2014 10 PM | Jena Irene | Caleb Johnson | May 21, 2014 6 PM |

Table 1: Actual and prediction results

The model correctly predicted five of the eight losers shown in Table 1 – an accuracy of 62.5%. The losers were predicted in real time, before the results were announced. The one exception to the usual morning prediction was the final vote: the results were very close, so I continued collecting data throughout the day to see if that improved the accuracy of the prediction (it didn’t).

So 62.5% is not bad: good enough to win some money on the betting markets, but quite a bit less than the 80% accuracy I had for season 12.

This raises the question: how good is 62.5% accuracy? To investigate, I compared my prediction accuracy to three other methods for predicting American Idol results: prediction markets, autodialers, and Vegas odds (Table 2).

| Actual | Twitter | DialIdol.com | Vegas Odds |
|---|---|---|---|
| Majesty Rose | Majesty Rose | Sam Woolf | Majesty Rose |
| Malaya Watson | Malaya Watson | Sam Woolf | CJ Harris / Sam Woolf |
| Dexter Roberts | Dexter Roberts | Sam Woolf | CJ Harris |
| CJ Harris | Jena Irene | Sam Woolf | CJ Harris |
| Sam Woolf | Jena Irene | Sam Woolf | Jess Meuse |
| Jess Meuse | Jess Meuse | Jess Meuse | Jess Meuse |
| Alex Preston | Alex Preston | Caleb Johnson | Alex Preston |
| Jena Irene | Caleb Johnson | Caleb Johnson | Caleb Johnson |

Table 2: Comparison of predictions from Twitter, DialIdol.com, and Vegas odds

Prediction Markets

Prediction markets became popular around 2008. The idea is to harness the “wisdom of the crowd” to predict events. A market is created around an event, such as which contestant will be voted off American Idol, and people can buy or sell shares in outcomes, mostly using play money (although some sites have experimented with real money). The share price for an outcome should reflect the likelihood of it actually happening: an outcome that many people believe will happen will have a higher share price than one many feel is unlikely.
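Under the standard interpretation, a contract that pays $1 if the event happens and trades at price p implies a probability of roughly p. A toy example, with invented prices:

```python
# Reading share prices as probabilities (prices are invented for
# illustration). A $1-payout contract trading at price p implies the
# event has probability ~p; normalising handles any overround.
prices = {"CJ Harris": 0.41, "Sam Woolf": 0.33, "Jena Irene": 0.12}

total = sum(prices.values())
implied = {name: p / total for name, p in prices.items()}

predicted_loser = max(implied, key=implied.get)
print(predicted_loser, round(implied[predicted_loser], 2))
```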

For some events, such as movie box-office takes, prediction markets are remarkably accurate, with accuracies well above 90%. However, for events like who will be voted off American Idol, the accuracy seems quite low. I was unable to find a prediction market for season 13 of American Idol, though I did find one for season 9 on inkling.com. For that season, the market was 45% accurate in predicting who would be voted off.

Autodialers

Another method for predicting American Idol results is to directly measure which contestants people are voting for. This is done with autodialers: a service that repeatedly dials the number for a contestant until the call gets through. One such service, DialIdol.com, publishes predictions based on how many of its autodialer customers are voting for each contestant, weighted by the percentage of busy signals (busy signals being an indirect measure of how popular a contestant is).
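DialIdol's exact formula is not public, but a busy-signal weighting could look something like this sketch (all numbers are made up):

```python
# One way a busy-signal weighting could work (DialIdol's actual formula is
# not public; these figures are invented). A high busy-signal rate suggests
# many other people are dialling the same line, so completed calls from the
# service's users are scaled up accordingly.
votes = {          # contestant -> (calls placed by users, busy-signal rate)
    "CJ Harris":  (1200, 0.65),
    "Sam Woolf":  (1500, 0.20),
    "Jena Irene": (1100, 0.70),
}

scores = {name: calls * (1 + busy) for name, (calls, busy) in votes.items()}
predicted_loser = min(scores, key=scores.get)   # fewest estimated votes
print(predicted_loser)
```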

For the same period as my predictions, DialIdol predicted the loser only 25% of the time. In DialIdol’s defence, for many weeks they reported that the contest was too close to call. It is interesting that this method’s accuracy is the lowest, since it is the most direct method I have found for measuring American Idol votes.

Vegas Odds: Follow The Money

Las Vegas accepts bets on American Idol and publishes odds each week for who will win. The odds are based on how many people are betting on each contestant. Since real live people are putting cold hard cash on their selections, this should weed out a lot of the noise: someone putting $1,000 down on a contestant to win has hopefully analyzed the situation in some depth.
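For illustration, here is how posted fractional odds translate into implied win probabilities; the weekly "loser" call is simply the contestant with the longest odds, i.e. the lowest implied chance of winning (the odds values are made up):

```python
# Converting fractional odds a-to-b into implied win probabilities
# (odds values are illustrative, not real Vegas lines).
fractional_odds = {          # contestant -> (a, b) meaning "a to b"
    "Caleb Johnson": (4, 5),
    "Jena Irene": (2, 1),
    "Alex Preston": (9, 1),
}

implied = {name: b / (a + b) for name, (a, b) in fractional_odds.items()}
predicted_loser = min(implied, key=implied.get)   # longest odds
print(predicted_loser, round(implied[predicted_loser], 3))
```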

The Vegas odds were accurate about 50% of the time. That is, the contestant with the longest odds each week was voted off 50% of the time. That is pretty good, especially considering that Las Vegas closes betting before the Wednesday performance episode!

Conclusion

Comparing the accuracy of my Twitter model to the other predictions leads me to conclude that, for season 13 of American Idol, 62.5% is pretty good! Of all the methods, it was the most accurate this season.

Now that the season is over, I have the opportunity to explore in detail what went wrong in the weeks I predicted the wrong loser. While hindsight is 20/20, it is important not to overfit the model to the season 13 data. Having a separate dataset from season 12 will help prevent this from happening. In an upcoming blog post, I will explore how to make the model more accurate without overfitting to a single dataset. I will also explore applying this model to other talent contests. Stay tuned!

Improved Prediction Model

After publishing this post, I reviewed my code to understand why the predictions for season 13 were worse than for season 12. In doing so, I found a small error in the code that swapped a few of the contestant names. This only affected a few of the outcomes. The revised predictions are in Table 3.

| Result Date | Actual | Predicted |
|---|---|---|
| Mar 27, 2014 10 PM | Majesty Rose | Sam Woolf |
| Apr 10, 2014 10 PM | Malaya Watson | Sam Woolf |
| Apr 17, 2014 10 PM | Dexter Roberts | Dexter Roberts |
| Apr 24, 2014 10 PM | CJ Harris | CJ Harris |
| May 1, 2014 10 PM | Sam Woolf | Sam Woolf |
| May 8, 2014 10 PM | Jess Meuse | Caleb Johnson |
| May 15, 2014 10 PM | Alex Preston | Alex Preston |
| May 21, 2014 10 PM | Jena Irene | Jena Irene |

Table 3: Revised prediction results

With the error corrected, the accuracy is still 62.5%, but the predictions are more accurate later in the contest and the model correctly picks the overall winner. Small coding errors are part of the challenge of running live predictions, though I still consider the predictions in Table 1 the actual predictions for season 13.


