Ebola Comes to the United States
On Sept. 30, 2014, the Centers for Disease Control and Prevention (CDC) confirmed the first case of Ebola in the United States: Thomas Eric Duncan, of Dallas, Texas. He had been exposed to the virus while in Liberia, after taking a sick neighbor to the hospital. Mr. Duncan died on Oct. 8. Since the initial news of his infection, interest in Ebola has exploded throughout the United States. From Google search trends to Twitter conversations, Americans have become eager for any information about the spread of the disease and what to do to protect themselves from this deadly virus.
The confirmed domestic Ebola case and the subsequent spike in interest provide a great testing ground for studying how health information spreads in the modern information age. Ebola was not new on Sept. 30; it simply was not yet a major point of interest for Americans. While Ebola ran rampant in foreign countries, there was real but minimal interest. Once the virus reached home, interest exploded. Comparing the conversation before and after Sept. 30, and specifically the sources of information about Ebola, reveals an interesting story about how Americans share health information.
The Role of the CDC
In the United States, the CDC is the primary source of information on infectious diseases like Ebola. Following the confirmed case of Ebola in Dallas on Sept. 30, we should see a spike on Twitter not only in mentions of Ebola but also in mentions of the CDC. To measure this effect, I collected 1,150,465 tweets from 7,525 individuals between Sept. 1 and Oct. 28, 2014. Figure 1 shows the percentage of mentions for Ebola and CDC around Sept. 30.
Figure 1: Percent of Twitter mentions for Ebola and CDC.
There is a sharp increase in mentions of both Ebola and the CDC on Sept. 30, when the first U.S. case of Ebola was confirmed. There are, of course, many more mentions of Ebola than of the CDC, both before and after this date. The question is whether mentions of the CDC, relative to mentions of Ebola, become more or less frequent after Sept. 30. As more people begin to talk about Ebola, more people also mention the CDC, but are these newcomers to the conversation more or less likely to mention the CDC?
I measured this as the probability that a person who mentions Ebola also mentions the CDC: P(CDC | Ebola). I then extrapolated what this probability would look like from before Sept. 30, as if the U.S. Ebola case had never happened. I then compared the measured probability with the extrapolated probability. Figure 2 compares the actual and extrapolated probabilities.
Figure 2: Impact of the Sept. 30 Ebola confirmation on the probability that a person mentioning Ebola also mentions the CDC, P(CDC | Ebola). The “Original” panel compares the actual P(CDC | Ebola) (solid line) with the projected P(CDC | Ebola) without the U.S. case (dashed line). The “Pointwise” panel shows the difference between actual and projected.
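The conditional probability P(CDC | Ebola) described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the analysis: the function name, the (day, user, text) record layout, the case-insensitive substring matching, and the choice to count a person once per day are all hypothetical.

```python
from collections import defaultdict


def daily_conditional_probability(tweets, given="ebola", target="cdc"):
    """Estimate P(target | given) per day at the user level.

    `tweets` is an iterable of (day, user, text) tuples (assumed layout).
    A user "mentions" a term on a day if any of their tweets that day
    contains it as a case-insensitive substring (a simplifying assumption).
    """
    given_users = defaultdict(set)   # day -> users who mentioned `given`
    target_users = defaultdict(set)  # day -> users who mentioned `target`

    for day, user, text in tweets:
        lowered = text.lower()
        if given in lowered:
            given_users[day].add(user)
        if target in lowered:
            target_users[day].add(user)

    probabilities = {}
    for day, users in given_users.items():
        # Users who mentioned both terms on this day.
        both = users & target_users[day]
        probabilities[day] = len(both) / len(users)
    return probabilities


sample = [
    ("2014-09-30", "a", "Ebola confirmed in Dallas"),
    ("2014-09-30", "a", "Waiting for the CDC statement"),
    ("2014-09-30", "b", "ebola news everywhere"),
    ("2014-10-01", "c", "CDC update"),
]
print(daily_conditional_probability(sample))
```

Days on which nobody mentions Ebola are left out rather than reported as zero, since the conditional probability is undefined there. The resulting daily series is the kind of input a before-and-after comparison would need.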
From the analysis it is clear that the probability that someone mentioning Ebola will also mention the CDC drops after Sept. 30. This means that as more people on Twitter begin discussing Ebola, a shrinking share of them mention the CDC, which should be the primary source of information in this scenario!
So if Twitter users are not getting their information on Ebola from the CDC, then where are they getting it from? Looking once again at Ebola-related tweets posted after Sept. 30, I counted the Twitter accounts most often mentioned in them. For each account I measured its percentage of tweet volume (the probability that an Ebola tweet also mentioned the account) and its Share of Voice (SOV, the probability that a person mentioning Ebola also mentioned the account). Combining both of these measures gives an estimate of the top-cited sources on Ebola on Twitter (Figure 3).
Figure 3: Top 25 cited Twitter sources regarding Ebola.
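The two measures behind this ranking, tweet-volume share and Share of Voice, can be sketched as follows. This is a rough illustration under assumptions: the function name, the (user, text) record layout, and the @-handle regular expression are hypothetical, and since the article does not say how the two measures were combined into one ranking, the sketch reports them separately.

```python
import re
from collections import defaultdict

# Simplified pattern for @-mentions (an assumption, not Twitter's full rules).
MENTION = re.compile(r"@(\w+)")


def source_shares(tweets):
    """Return {account: (volume_share, share_of_voice)} over Ebola tweets.

    `tweets` is an iterable of (user, text) tuples (assumed layout).
    volume_share   = fraction of Ebola tweets mentioning the account.
    share_of_voice = fraction of Ebola-mentioning users who mentioned it.
    """
    tweet_counts = defaultdict(int)  # account -> number of Ebola tweets citing it
    citing_users = defaultdict(set)  # account -> users citing it
    total_tweets = 0
    all_users = set()

    for user, text in tweets:
        if "ebola" not in text.lower():
            continue  # only Ebola-related tweets count
        total_tweets += 1
        all_users.add(user)
        # Count each account at most once per tweet.
        for account in {name.lower() for name in MENTION.findall(text)}:
            tweet_counts[account] += 1
            citing_users[account].add(user)

    return {
        account: (
            tweet_counts[account] / total_tweets,
            len(citing_users[account]) / len(all_users),
        )
        for account in tweet_counts
    }
```

An account cited repeatedly by a single prolific user scores high on volume share but low on Share of Voice, which is presumably why both measures are considered together.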
I am relieved and reassured that the official CDC account (@cdcgov) is the top-cited source, followed by the Texas Health Resources’ official account. These two Twitter sources, accounting for 5% of Twitter citations of Ebola news, are primary sources for news on Ebola. There are also credible news outlets in the list: CNN, NBC News, ABC News, Associated Press, New York Times, Fox News, and CBS News. These are secondary sources that (most likely) link back to primary sources from the CDC and associated health agencies.
But what is @itsyavirusebola? This was a parody account (since removed by Twitter) set up solely to make light of Ebola news. It featured photoshopped pictures with captions like “When you don’t feel like walking to the next village so you pretend to have Ebola.” Six of the top 25 cited Twitter sources for Ebola are parody or “humor” accounts.
How prevalent are these humor sources in Twitter’s Ebola coverage? I tagged the top 100 cited sources based on the account content. Figure 4 summarizes the percentage of citations for each of these categories.
Figure 4: Overview of source type for Ebola citations on Twitter
“Humor” accounts are the second most-cited source type, accounting for 22% of total source citations, ahead of health organizations (13% of citations). Grouped by tier, primary sources (health organizations, government officials, and health officials) together account for 29% of the citations. Secondary sources (national news, local news, and journalists) are the most popular, accounting for 40% of citations. Tertiary sources (blogs, humor, public figures, and individuals) account for 31% of the citations. This finding is quite shocking. On Twitter, primary sources for the Ebola virus are the least shared of all!
No Laughing Matter
It is no surprise that news accounts are the most cited sources for Ebola. @cnnbrk, with 20.7 million followers, has close to 50 times as many followers as @cdcgov (a mere 434,000). It is also likely that the secondary sources refer directly to the CDC, and I suspect that even the tertiary sources refer to the CDC as well. However, the probability that a person mentioning Ebola also mentions the CDC drops by about 20%. Notably, the humor sources account for about 20% of the citations. These percentages are eerily similar. It is very likely that this drop in concurrent mentions of Ebola and CDC is entirely due to the humor accounts entering the online conversation.
It is disconcerting that tertiary sources are so prevalent, and that humor sources are the second most popular source type overall. These outlets are highly distributed, difficult to monitor, and increase the opportunity for rumors and misinformation. While I’m not arguing that the average Twitter user will mistake a tweet from @itsyavirusebola for fact, the sheer volume of this type of content risks information overload, drowning out the credible sources that people need to hear. At the very least, the emergence and popularity of these sources further increase the complexity that health organizations face in keeping the American people safe and informed.
- This pattern of an explosion of interest only after the virus has been confirmed within the country is not unique to the United States. Spain and Chile showed a similar pattern. See http://cyrusinnovation.github.io/ebola_search_analysis/.
- CausalImpact 1.0.3, Brodersen et al., Annals of Applied Statistics (in press). http://google.github.io/CausalImpact/
- This analysis requires a comparison time series. For this I take the probability that a person mentioning the CDC also mentions Ebola. The analysis rests on two assumptions. First, the comparison time series must be unaffected by the event, unlike the time series under study. Figure 1 confirms that this is true. Second, the relationship between the two time series must remain stable through the event. Since these two time series are related through Bayes' theorem, this too is true.
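For reference, the Bayes' theorem relationship invoked in the last note can be written out explicitly. This is the standard identity applied to the two probabilities named in this article, not a formula quoted from the original analysis:

```latex
P(\mathrm{CDC} \mid \mathrm{Ebola})
  = \frac{P(\mathrm{Ebola} \mid \mathrm{CDC}) \, P(\mathrm{CDC})}{P(\mathrm{Ebola})}
```

Here P(CDC) and P(Ebola) are the overall probabilities that a person mentions the CDC or Ebola, respectively.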