The world is trending in real time! Learn from Twitter how to process tweets, or any big data stream, scalably and in real time to drive d3 visualizations using Apache Storm, the “Hadoop of Real Time.” Storm is free, open source, and fun to use! Learn from Karthik Ramasamy, Technical Lead of Storm@Twitter, about the distributed, fault-tolerant, and flexible technology used to power Twitter’s real-time data flow pipeline. Twitter open sourced Storm in 2011, and it graduated to a top-level Apache project in September 2014.
Starting from the basic distributed-computing concepts presented during our first Udacity-Twitter Storm Hackathon, link Storm concepts to Storm syntax to drive scalable word cloud visualizations with Vagrant, Ubuntu, Maven, Flask, Redis, and d3. Link to the public Twitter gardenhose stream to process live tweets, parse embedded URLs, and calculate top worldwide hashtags. Go beyond Storm basics by exploring multi-language capabilities in Python, integrating open source components, and implementing real-time streaming joins.
In your final project, follow real-time trending topics by implementing a data pipeline that visualizes only tweets containing top worldwide hashtags. Extend your project by exploring the Twitter API, or any data source, alongside Hackathon participants as they design their own ideas, receive feedback from Karthik, and open source a final project that calculates real-time tweet sentiment and geolocation to drive a U.S. map.
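
To give a feel for the final project's core logic, here is a minimal plain-Python sketch of keeping only the tweets that contain a currently top-ranked hashtag. It is not Storm code and not the course's implementation: the function names, the regular expression, and the in-memory list of tweets are all assumptions made for illustration. In a real topology, this counting and filtering would be split across distributed components and joined on a live stream.

```python
import re
from collections import Counter

# Assumption: a hashtag is '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the lowercased hashtags found in a tweet's text."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def top_hashtags(counts, n):
    """Return the set of the n most frequent hashtags seen so far."""
    return {tag for tag, _ in counts.most_common(n)}


def filter_by_top_hashtags(tweets, n):
    """Yield each tweet that contains a hashtag currently in the top n.

    Counts are updated before the check, so a tweet contributes to the
    ranking it is tested against.
    """
    counts = Counter()
    for text in tweets:
        tags = extract_hashtags(text)
        counts.update(tags)
        if top_hashtags(counts, n).intersection(tags):
            yield text


if __name__ == "__main__":
    # Hypothetical sample data standing in for a live tweet stream.
    sample = [
        "I love #Storm",
        "#storm and #redis",
        "just #python",
        "no tags here",
        "#storm again",
    ]
    for tweet in filter_by_top_hashtags(sample, n=1):
        print(tweet)
```

The running `Counter` here ranks hashtags over the whole stream seen so far; a trending view would more likely count over a sliding time window, which this sketch leaves out.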