Zhongpu Chen

PhD

Spring 2021 (Big Data Processing)

Lecture Location and Time

Room 307, Tongbo Building (通博楼). Periods 10 to 11, every Tuesday evening, from Week 1 to Week 17.

Syllabus

  • W1: Introduction to Big Data Processing
  • W2: Data Replication
  • W3: Data Partitioning
  • W4: Hadoop + HDFS
  • W5: Hadoop + HDFS
  • W6: Resource Manager
  • W7: Column-oriented Database
  • W8: Hive + HBase
  • W9: Students’ Presentations
  • W10: In-memory KV Database
  • W11: Spark + Spark SQL
  • W12: Spark + Spark SQL
  • W13: Spark + Spark SQL
  • W14: Algorithms for Big Data
  • W15: Algorithms for Big Data
  • W16: Algorithms for Big Data
  • W17: Students’ Presentations for Final Projects

Final Project

Choose one of the following:

1. Paper Reviews

Review at least 5 papers from top-tier conferences or journals published in the last 5 years.

2. Compute Cumulative Sum

The cumulative sum (or prefix-sum) operator takes an array \(a_1, a_2, \dots, a_n\) and returns an array \(s_1, s_2, \dots, s_n\) where \(s_i = \sum_{j \leq i}a_j\). For example, given the array 17 0 5 32, it returns 17 17 22 54.
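As a point of reference, the definition can be checked with a short sequential sketch in Python. The function name `cumulative_sum` is chosen here for illustration only; it is a single-machine baseline, not the MapReduce solution the project asks for.

```python
from itertools import accumulate

def cumulative_sum(values):
    """Return [s_1, ..., s_n] where s_i = a_1 + ... + a_i."""
    return list(accumulate(values))

# The example array from the problem statement.
print(cumulative_sum([17, 0, 5, 32]))  # [17, 17, 22, 54]
```

A distributed implementation should produce the same output as this baseline on any input, which makes it a convenient correctness check.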

Describe how to implement cumulative sum in MapReduce, and implement your idea with either Spark or Hadoop.
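One common way to approach this (offered as a sketch, not as the expected answer) is a two-pass scheme over contiguous, ordered partitions: first compute each partition's total, then give each partition an offset equal to the sum of all earlier totals, and finally scan each partition locally starting from its offset. The pure-Python simulation below only imitates the data flow on one machine. The helper names `split_into_partitions` and `partitioned_cumulative_sum` and the default of 3 partitions are assumptions made for illustration.

```python
from itertools import accumulate

def split_into_partitions(values, num_partitions):
    """Split values into contiguous chunks, preserving order (hypothetical helper)."""
    size = max(1, -(-len(values) // num_partitions))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]

def partitioned_cumulative_sum(values, num_partitions=3):
    partitions = split_into_partitions(values, num_partitions)

    # Pass 1: each partition independently computes its local total.
    totals = [sum(part) for part in partitions]

    # Small sequential step: the offset of partition k is the sum of the
    # totals of partitions 0..k-1 (only one number per partition is needed).
    offsets = [0] + list(accumulate(totals))[:-1]

    # Pass 2: each partition scans locally and adds its offset.
    result = []
    for offset, part in zip(offsets, partitions):
        result.extend(offset + s for s in accumulate(part))
    return result

print(partitioned_cumulative_sum([17, 0, 5, 32]))  # [17, 17, 22, 54]
```

The scheme relies on partitions being contiguous and processed in their original order, so a real implementation has to carry the partition index along with the data. In Spark, for instance, the two passes could plausibly be expressed with `mapPartitionsWithIndex`, with the per-partition totals collected to the driver in between.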