Lecture Location and Time
Room 307, Tongbo Building (通博楼). Periods 10 to 11, every Tuesday evening, from Week 1 to Week 17.
Syllabus
- W1: Introduction to Big Data Processing
- W2: Data Replication
- W3: Data Partitioning
- W4: Hadoop + HDFS
- W5: Hadoop + HDFS
- W6: Resource Manager
- W7: Column-oriented Database
- W8: Hive + HBase
- W9: Students’ Presentations
- W10: In-memory KV Database
- W11: Spark + Spark SQL
- W12: Spark + Spark SQL
- W13: Spark + Spark SQL
- W14: Algorithms for Big Data
- W15: Algorithms for Big Data
- W16: Algorithms for Big Data
- W17: Students’ Presentations for Final Projects
Final Project
Choose one of the following:
1. Paper Reviews
Review at least 5 papers published in top-tier conferences or journals within the last 5 years.
2. Compute Cumulative Sum
The cumulative sum (or prefix-sum) operator takes an array \(a_1, a_2, \dots, a_n\) and returns an array \(s_1, s_2, \dots, s_n\) where \(s_i = \sum_{j \leq i} a_j\). For example, starting with the array 17 0 5 32, it returns 17 17 22 54.
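To make the definition concrete, here is a minimal single-machine sketch in Python. The function name `cumulative_sum` is chosen for illustration only; the sketch relies on the standard-library `itertools.accumulate` and is a reference for checking results, not a distributed solution.

```python
from itertools import accumulate


def cumulative_sum(values):
    """Return the prefix sums s_i = a_1 + ... + a_i of a sequence of numbers."""
    # accumulate yields the running total after each element.
    return list(accumulate(values))


print(cumulative_sum([17, 0, 5, 32]))  # [17, 17, 22, 54]
```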
Describe how to implement cumulative sum in MapReduce, and implement your idea with either Spark or Hadoop.
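As a starting point for thinking about the problem, one common way to parallelize a prefix sum is a two-phase, block-based scheme: compute per-block totals, turn them into per-block offsets, then compute local prefix sums shifted by each offset. The plain-Python sketch below only simulates that idea on one machine. The names `split_into_blocks` and `partitioned_cumulative_sum` and the `num_blocks` parameter are hypothetical, and this is not necessarily the intended solution nor a Spark or Hadoop program.

```python
from itertools import accumulate


def split_into_blocks(values, num_blocks):
    """Split values into at most num_blocks contiguous blocks (hypothetical helper)."""
    size = max(1, -(-len(values) // num_blocks))  # ceiling division, at least 1
    return [values[i:i + size] for i in range(0, len(values), size)]


def partitioned_cumulative_sum(values, num_blocks=2):
    """Simulate a two-phase, block-based prefix sum on a single machine."""
    blocks = split_into_blocks(values, num_blocks)

    # Phase 1 (independent per block): each block computes its own total.
    block_totals = [sum(block) for block in blocks]

    # Small sequential step: the offset of block k is the sum of all earlier totals.
    offsets = [0] + list(accumulate(block_totals))[:-1]

    # Phase 2 (independent per block): local prefix sums shifted by the block offset.
    result = []
    for block, offset in zip(blocks, offsets):
        result.extend(offset + s for s in accumulate(block))
    return result


print(partitioned_cumulative_sum([17, 0, 5, 32], num_blocks=2))  # [17, 17, 22, 54]
```

In this sketch the only step that needs data from every block is the offset computation, which touches one number per block. In a real distributed setting the blocks must also keep their original order, which the simulation gets for free from list slicing.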