Before I dive into the details, here is some information about the dataset used in our study. From Twitter's Spritzer feed, we collected a random sample of public tweets from January 18, 2010 to March 8, 2010, yielding about 74 million tweets. That is, we collected about 1.5 million tweets per day, approximately 2-3% of the roughly 50 million tweets posted on Twitter daily.
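As a quick sanity check, the per-day arithmetic can be reproduced in a few lines of Python. This is only a sketch: it assumes the collection window includes both end dates, and it uses the rounded figures quoted above (74 million sampled tweets, about 50 million tweets per day on Twitter overall).

```python
from datetime import date

# Rounded figures quoted in the post.
TOTAL_SAMPLED = 74_000_000      # tweets collected from the Spritzer feed
PLATFORM_DAILY = 50_000_000     # estimated tweets posted on Twitter per day

# Assumption: the window January 18 to March 8, 2010 includes both end dates.
days = (date(2010, 3, 8) - date(2010, 1, 18)).days + 1

per_day = TOTAL_SAMPLED / days        # tweets collected per day
share = per_day / PLATFORM_DAILY      # fraction of Twitter's daily volume

print(f"{days} days, {per_day:,.0f} tweets/day, {share:.1%} of daily volume")
```

This prints `50 days, 1,480,000 tweets/day, 3.0% of daily volume`, which sits at the upper end of the 2-3% range.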
For each of these 74 million tweets, we scanned for a variety of retweet markers, such as "RT @", "RT:@", "retweeting @", "retweet @", "via @", "thx @", "HT @", and "r @" [2]. This identified about 8.24 million retweets, accounting for 11.1% of all tweets. Next, we looked for tweets and retweets containing at least one hashtag. We found that 10.1% of tweets and 20.8% of retweets include hashtags, suggesting that tweets with hashtags are more likely to be retweeted.
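A minimal sketch of this scan is shown below, using the marker list from the paragraph above. The matching rules are assumptions rather than a description of our exact pipeline: markers are matched case-insensitively and must start at a word boundary (so that "r @" does not fire inside "for @bob"), and a hashtag is taken to be "#" followed by word characters. The helper names `is_retweet`, `hashtags`, and `hashtag_shares` are hypothetical.

```python
import re

# Retweet markers listed in the post (following boyd et al. [2]).
RETWEET_MARKERS = ["RT @", "RT:@", "retweeting @", "retweet @",
                   "via @", "thx @", "HT @", "r @"]

# Assumption: case-insensitive match that must begin at a word boundary,
# so that "r @" matches "r @bob" but not "for @bob".
MARKER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(m) for m in RETWEET_MARKERS) + ")",
    re.IGNORECASE,
)
# Assumption: a hashtag is "#" followed by one or more word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def is_retweet(text):
    """True if the tweet contains any of the retweet markers."""
    return MARKER_RE.search(text) is not None


def hashtags(text):
    """Set of lower-cased hashtags in the tweet (without the '#')."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def hashtag_shares(texts):
    """Return (share of all tweets with a hashtag, share of retweets with one)."""
    retweets = [t for t in texts if is_retweet(t)]
    share_all = sum(1 for t in texts if hashtags(t)) / len(texts)
    share_rt = (sum(1 for t in retweets if hashtags(t)) / len(retweets)
                if retweets else 0.0)
    return share_all, share_rt
```

For example, `hashtag_shares(["RT @alice: great read #news", "thanks for @bob's help", "listening now #nowplaying", "via @carol #ff #FF"])` returns `(0.75, 1.0)`: three of the four tweets carry a hashtag, and both detected retweets do.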
We further investigated whether the retweetability of a tweet depends on the type of hashtag it contains. Across the 74 million tweets, we identified the 20 most popular hashtags and the number of tweets containing each one:
Rank | Hashtag | Number of Tweets |
1 | #nowplaying | 355,147 |
2 | #ff | 224,760 |
3 | #jobs | 124,728 |
4 | #fb | 87,959 |
5 | #tinychat | 67,225 |
6 | #vouconfessarque | 51,578 |
7 | #fail | 49,248 |
8 | #tcot | 47,394 |
9 | #1 | 47,373 |
10 | #followfriday | 39,986 |
11 | #news | 38,573 |
12 | #shoutout | 30,633 |
13 | #tweetmyjobs | 30,594 |
14 | #bbb | 28,590 |
15 | #haiti | 28,563 |
16 | #letsbehonest | 27,926 |
17 | #iranelection | 27,611 |
18 | #quote | 27,541 |
19 | #followmejp | 25,940 |
20 | #follow | 24,166 |
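A ranking like the one in the table above can be produced with a simple counter. In this sketch each tweet counts at most once per hashtag, matching the "Number of Tweets" column rather than raw occurrence counts. Hashtags are matched with the regex `#\w+` and compared case-insensitively; both are assumptions, since we do not describe our normalization here. The function name `top_hashtags` is hypothetical.

```python
import re
from collections import Counter

# Assumption: a hashtag is "#" followed by word characters, compared in lower case.
HASHTAG_RE = re.compile(r"#(\w+)")


def top_hashtags(texts, k=20):
    """Rank hashtags by the number of tweets that contain them."""
    counts = Counter()
    for text in texts:
        # Updating with a set counts each hashtag at most once per tweet.
        counts.update({tag.lower() for tag in HASHTAG_RE.findall(text)})
    return counts.most_common(k)
```

Running the same function over only the tweets flagged as retweets gives the corresponding ranking for retweets.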
By comparison, the following table shows the 20 most popular hashtags in our 8.24 million retweets and the number of retweets containing each one:
Rank | Hashtag | Number of Retweets |
1 | #ff | 62,331 |
2 | #vouconfessarque | 43,628 |
3 | #nowplaying | 29,846 |
4 | #tcot | 18,527 |
5 | #idothat2 | 16,583 |
6 | #ohjustlikeme | 16,531 |
7 | #jafizisso | 15,564 |
8 | #haiti | 13,829 |
9 | #retweetthisif | 12,602 |
10 | #iranelection | 12,334 |
11 | #quote | 11,475 |
12 | #followfriday | 11,170 |
13 | #fb | 10,994 |
14 | #ihatequotes | 9,982 |
15 | #fail | 9,759 |
16 | #omgthatssotrue | 9,286 |
17 | #1 | 9,124 |
18 | #terremotochile | 8,892 |
19 | #p2 | 8,719 |
20 | #follow | 8,084 |
As can be seen, the two lists do not match exactly. For example, #jobs appears only in the first list, while #idothat2 appears only in the second. In other words, a hashtag that is frequently used in tweets is not necessarily frequently used in retweets, and vice versa.
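The overlap between the two lists can be checked with plain set operations. The lists below are transcribed from the two tables (without the "#"); nothing else is assumed.

```python
# Top-20 hashtags transcribed from the two tables in this post.
TOP_IN_TWEETS = ["nowplaying", "ff", "jobs", "fb", "tinychat", "vouconfessarque",
                 "fail", "tcot", "1", "followfriday", "news", "shoutout",
                 "tweetmyjobs", "bbb", "haiti", "letsbehonest", "iranelection",
                 "quote", "followmejp", "follow"]
TOP_IN_RETWEETS = ["ff", "vouconfessarque", "nowplaying", "tcot", "idothat2",
                   "ohjustlikeme", "jafizisso", "haiti", "retweetthisif",
                   "iranelection", "quote", "followfriday", "fb", "ihatequotes",
                   "fail", "omgthatssotrue", "1", "terremotochile", "p2", "follow"]

tweets_set, retweets_set = set(TOP_IN_TWEETS), set(TOP_IN_RETWEETS)
shared = tweets_set & retweets_set          # in both top-20 lists
tweets_only = tweets_set - retweets_set     # popular in tweets only
retweets_only = retweets_set - tweets_set   # popular in retweets only

print(len(shared), sorted(tweets_only), sorted(retweets_only))
```

With the tables as transcribed, 12 hashtags appear in both top-20 lists and 8 appear in only one of them.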
For each hashtag, we computed a retweet rate by dividing the number of retweets containing the hashtag by the number of tweets containing it. We then normalized this rate by the overall retweet rate (8.24 million / 74 million, or 11.1%), so that a value of 1.0 represents the average. For example, the normalized retweet rate of 0.75 for #nowplaying was calculated as (29,846/355,147)*(74/8.24). A retweet rate higher than 1.0 indicates that tweets containing the hashtag are more likely than average to be retweeted. The following table shows the retweet rates for the 10 most popular hashtags in our tweets:
Rank | Hashtag | Retweet Rate |
1 | #nowplaying | 0.75 |
2 | #ff | 2.49 |
3 | #jobs | 0.16 |
4 | #fb | 1.12 |
5 | #tinychat | 0.04 |
6 | #vouconfessarque | 7.59 |
7 | #fail | 1.78 |
8 | #tcot | 3.51 |
9 | #1 | 1.73 |
10 | #followfriday | 2.51 |
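The normalized rate is rate(h) = (R_h / T_h) / (R / T), where R_h and T_h are the retweets and tweets containing hashtag h, and R and T are the total retweets and tweets. The sketch below applies it to the four hashtags whose counts appear in both top-20 tables above. It uses the rounded totals (74 million and 8.24 million), so results may differ from the table in the second decimal place. The function name `retweet_rate` is hypothetical.

```python
TOTAL_TWEETS = 74_000_000     # rounded total from the post
TOTAL_RETWEETS = 8_240_000    # rounded total from the post


def retweet_rate(retweets_with_tag, tweets_with_tag,
                 total_retweets=TOTAL_RETWEETS, total_tweets=TOTAL_TWEETS):
    """Hashtag retweet rate, normalized so that 1.0 equals the overall rate."""
    raw_rate = retweets_with_tag / tweets_with_tag
    baseline = total_retweets / total_tweets   # about 0.111
    return raw_rate / baseline


# (retweets, tweets) containing each hashtag, from the two top-20 tables.
COUNTS = {
    "nowplaying": (29_846, 355_147),
    "ff": (62_331, 224_760),
    "fb": (10_994, 87_959),
    "tcot": (18_527, 47_394),
}

for tag, (rt, tw) in COUNTS.items():
    print(f"#{tag}: {retweet_rate(rt, tw):.2f}")
```

This prints 0.75, 2.49, 1.12, and 3.51, matching the table. Rates for #jobs or #tinychat cannot be reproduced this way, because their retweet counts fall outside the top-20 retweet table.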
In the following plot, each point represents an individual hashtag. The X-axis is the popularity rank of hashtags based on how many tweets contain each hashtag. The Y-axis represents the retweet rates of hashtags as computed above.
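From the figure, we see that retweet rates vary greatly: even among the 10 most popular hashtags in the table above, they range from 0.04 (#tinychat) to 7.59 (#vouconfessarque). A hashtag's popularity in tweets does not imply popularity in retweets; the type of hashtag does matter.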
References
[1] Suh, B., Hong, L., Pirolli, P., and Chi, E. H. Want to be Retweeted? Large Scale Analytics on Factors Impacting Retweet in Twitter Network. To appear in SocialCom'10.
[2] boyd, d., Golder, S., and Lotan, G. Tweet, Tweet, Retweet: Conversational Aspects of Retweeting on Twitter. Proc. HICSS'10, 1-10.
3 comments:
Nice stuff. We're treading on difficult correlation vs. causation ground here, though, aren't we? Even the title of your blog post suggests that if I add a hashtag, my post is more likely to get retweeted -- but what we're really seeing here is more a correlation.
The nice thing about retweets is that they give us a nice proxy for how much someone values a tweet. I wonder how much overlap there is with a feeling of "I'm glad I saw this!", and what it might not capture? But, obviously, you get the benefit of a huge dataset by doing it this way.
Michael, you are absolutely right that there is a big distinction between correlation and causation (http://en.wikipedia.org/wiki/Correlation_does_not_imply_causation). What we have observed in this study is simply a correlation.
We view retweets as a social voting proxy. This is mostly due to the difficulties of capturing the feeling of "I'm glad I saw this!" Our ultimate objective is to understand why some tweets got retweeted, while others didn't.
Since users must see the tweet in order to retweet it, we know that they read the hashtag before they retweeted.
So the question is whether they retweet due to the inclusion of hashtags, or whether it is something more indirect. For example, maybe users who understand hashtags (more experienced) are more likely to retweet?