Effective Analysis of Tweets Using Hadoop Ecosystem

2021 
Twitter has become widely popular and collects people's emotions, opinions, suggestions, feelings, knowledge and current market trends in the form of posts every day, from different countries and in multiple formats and languages. The result is unstructured, rapidly growing and commercially valuable data that is difficult to manage and process. Data of this kind is commonly referred to as big data. The Hadoop ecosystem evolved around this problem space and offers effective management of such data, from capture through processing to workflow management. This research aims to provide an effective, scalable framework to collect, process and analyze tweets using the Hadoop ecosystem. Apache Flume is used to capture the data and store it in HDFS, Apache Pig and Apache Hive are used for data processing and analysis, and Apache Oozie is used for workflow management and task scheduling. The research also benchmarks the performance of Hive and Pig on these data when finding recent trends, top influencers and top posts in various data categories. The experiments concluded that Apache Pig outperformed Apache Hive in terms of processing time, while the analytics results were the same.
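
To make the three analyses named above concrete (recent trends, top influencers, top posts), the following is a minimal single-machine sketch in plain Python. It is not the Pig or Hive scripts used in the research, which are not given here. The record layout (`entities.hashtags`, `user.screen_name`, `user.followers_count`, `retweet_count`), the sample records and the function names are all assumptions made for illustration, as is the choice to rank influencers by follower count and posts by retweet count.

```python
from collections import Counter

# Hypothetical sample records; the field names are assumed, not taken from the paper.
SAMPLE_TWEETS = [
    {"id": 1, "user": {"screen_name": "alice", "followers_count": 1200},
     "retweet_count": 5,
     "entities": {"hashtags": [{"text": "Hadoop"}, {"text": "BigData"}]}},
    {"id": 2, "user": {"screen_name": "bob", "followers_count": 300},
     "retweet_count": 12,
     "entities": {"hashtags": [{"text": "hadoop"}]}},
    {"id": 3, "user": {"screen_name": "alice", "followers_count": 1250},
     "retweet_count": 1,
     "entities": {"hashtags": [{"text": "Hive"}, {"text": "hadoop"}]}},
    {"id": 4, "user": {"screen_name": "carol", "followers_count": 5000},
     "retweet_count": 7,
     "entities": {"hashtags": []}},
]


def top_hashtags(tweets, n=3):
    """Count hashtags case-insensitively and return the n most frequent."""
    counts = Counter(
        tag["text"].lower()
        for tweet in tweets
        for tag in tweet.get("entities", {}).get("hashtags", [])
    )
    return counts.most_common(n)


def top_influencers(tweets, n=3):
    """Rank users by the largest follower count seen across their tweets."""
    followers = {}
    for tweet in tweets:
        user = tweet["user"]
        name = user["screen_name"]
        followers[name] = max(followers.get(name, 0), user["followers_count"])
    return sorted(followers.items(), key=lambda kv: kv[1], reverse=True)[:n]


def top_posts(tweets, n=3):
    """Rank tweets by retweet count, highest first."""
    return sorted(tweets, key=lambda t: t.get("retweet_count", 0), reverse=True)[:n]


if __name__ == "__main__":
    print(top_hashtags(SAMPLE_TWEETS))
    print(top_influencers(SAMPLE_TWEETS))
    print([t["id"] for t in top_posts(SAMPLE_TWEETS)])
```

Each function is a group-and-aggregate or sort-and-limit operation, which is the kind of query that Pig and Hive distribute across a cluster. This is why the two tools can be expected to return the same analytics results while differing in processing time.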