Posted on Dec. 15, 2013
Today we'd like to share some fun charts that have come out of our internal linguistics research efforts: specifically, studying weather events by analyzing social media traffic from Twitter.
We do not specialize in social media and most of our data analytics work focuses on the internal operations of leading organizations. Why then would we bother playing around with Twitter data? In short, because it's good practice. Twitter data mimics a lot of the challenges we face when analyzing the free text streams generated by complex processes. Specifically:
In the exercise below, tweets from Twitter's streaming API (delivered as a JSON stream) were scanned in near real time against two conditions: 1) the tweet can be pinpointed to a specific location and 2) the tweet provides potential details on local weather conditions. The vast majority of tweets passing through our code did not meet both conditions. The tweets that remained were analyzed to determine the type of precipitation being discussed.
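As a rough illustration of the filtering described above, here is a minimal Python sketch. It is not our production code: the keyword lists, the function names (`extract_location`, `classify_precipitation`, `scan`), and the simple regex matching are all assumptions made for this example, and it assumes the v1.1-era tweet JSON layout in which an exact location appears as a GeoJSON point (longitude first) under `coordinates`.

```python
import json
import re

# Hypothetical keyword lists, chosen for illustration only.
SNOW_TERMS = re.compile(r"\b(snow|snowing|snowy|flurries|blizzard)\b", re.IGNORECASE)
RAIN_TERMS = re.compile(r"\b(rain|raining|rainy|drizzle|downpour)\b", re.IGNORECASE)


def extract_location(tweet):
    """Return (lon, lat) if the tweet carries an exact point, else None."""
    coords = tweet.get("coordinates")
    if coords and coords.get("type") == "Point":
        lon, lat = coords["coordinates"]  # GeoJSON order: longitude, latitude
        return lon, lat
    return None


def classify_precipitation(text):
    """Return 'snow' or 'rain', or None if the text is silent or ambiguous."""
    snow = bool(SNOW_TERMS.search(text))
    rain = bool(RAIN_TERMS.search(text))
    if snow and not rain:
        return "snow"
    if rain and not snow:
        return "rain"
    return None


def scan(lines):
    """Yield one record per tweet that is both locatable and about precipitation."""
    for line in lines:
        line = line.strip()
        if not line:  # the stream may contain blank keep-alive lines
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(tweet, dict):
            continue
        location = extract_location(tweet)
        if location is None:  # condition 1 failed: no specific location
            continue
        label = classify_precipitation(tweet.get("text") or "")
        if label is None:  # condition 2 failed: no usable weather detail
            continue
        yield {
            "lon": location[0],
            "lat": location[1],
            "label": label,
            "created_at": tweet.get("created_at"),
        }
```

Treating a tweet that mentions both rain and snow as ambiguous is a deliberately conservative choice for the sketch; a real classifier would need to handle mixed reports, negation ("no snow here"), and figurative uses.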
Figure 1 below shows a summary of the analysis for the afternoon of 14 December 2013. Around this time a major storm system was moving up the Eastern Seaboard, dumping heavy rain and snow along I-95. Twitter commentary indicating locally snowy conditions is displayed in blue, while commentary indicating rainy conditions is displayed in green. The 'rain/snow' line that extended from New York City down towards Philadelphia and Washington DC is clearly visible. There are some anomalies (like the blue in southern CA and FL), but the snow noise is small relative to the signal coming out of the Northeast.
Figure 2 below animates this data for that same evening. You'll notice that the area around New York City shifts from blue to green in the late evening as the rain/snow line pushed in off the Atlantic. Indeed, as the 11pm news came on, local networks' roving reporters were out in the cold sending back their live reports about the storm's changeover to rain.
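A changeover like this can also be checked numerically by counting snow and rain tweets per hour inside a bounding box. The sketch below is one hypothetical way to do that and is not necessarily how the animation was produced; the function names, the record layout `(timestamp, lon, lat, label)`, and the bounding-box format are assumptions for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime


def hourly_precip_counts(records, bbox):
    """Count labels per hour for records that fall inside bbox.

    records: iterable of (timestamp, lon, lat, label) tuples
    bbox:    (min_lon, min_lat, max_lon, max_lat)
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    buckets = defaultdict(Counter)
    for ts, lon, lat, label in records:
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
            # Truncate the timestamp to the start of its hour.
            hour = ts.replace(minute=0, second=0, microsecond=0)
            buckets[hour][label] += 1
    return dict(sorted(buckets.items()))


def dominant_label(counts):
    """Return the most frequent label, or None if empty or tied."""
    ranked = counts.most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]
```

Calling `dominant_label` on each hourly counter in order gives a simple time series for the chosen area; an hour in which the dominant label flips from "snow" to "rain" would correspond to the blue-to-green shift in the animation.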
Earlier last week, we used the same language analysis techniques to track the system as it moved across the Midwest. Figure 3 shows a snapshot of National Weather Service radar from the afternoon of 13 December 2013. The red arrows indicate the direction of the leading edge of the system, which at this point was dumping mostly snow.
Figure 4 below shows an animation of our Twitter analysis on this same afternoon. You can clearly see a volume of tweets discussing snow migrating in a northeasterly direction along with the advancing weather front. Further south, 'green' tweets can be seen discussing the heavy rains falling in the area.
We'll admit it: it's fun to play around with data like this. It's of course not lost on us that weather radar is probably a far more accurate means of monitoring precipitation. However, analytics research like this is very valuable in further honing the skills of our data scientists for studying complex operational issues. For example, instead of weather conditions, the analysis might scan incoming IT incident tickets to identify common themes and root causes. Out of respect for client confidentiality we usually can't discuss such projects in great detail on this site, but we'll continue to share interesting analytics snapshots from publicly available datasets.