language-icon Old Web
English
Sign In

A course on big data analytics

2018 
Abstract This report details a course on big data analytics designed for undergraduate junior and senior computer science students. The course is heavily focused on projects and writing code for big data processing. It is designed to help students learn parallel and distributed computing frameworks and techniques commonly used in industry. The curriculum includes a progression of projects requiring increasingly sophisticated big data processing ranging from data preprocessing with Linux tools, distributed processing with Hadoop MapReduce and Spark, and database queries with Hive and Google’s BigQuery. We discuss hardware infrastructure and experimentally evaluate the cost/benefit of an on-premise server versus Amazon’s Elastic MapReduce. Finally, we showcase outcomes of our course in terms of student engagement and anonymous student feedback.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    20
    References
    10
    Citations
    NaN
    KQI
    []