This repository explores data processing solutions for scalability testing with a large dataset. Each contributor's code lives in its own branch:
- Master for Shane Siyuan Lai's code
- Usama-mongo for Usama's code
- swarm_spark_hdfs for Tatiana Piskunova's code
We used the dataset of 100 million anonymous movie ratings that Netflix released in 2006. The dataset also provides the date of each rating, along with the title and year of release for each movie ID.
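As a rough illustration of what one rating record and one movie record contain, here is a minimal Python sketch of a loader. It assumes the layout commonly used for the 2006 Netflix files: each ratings block starts with a `MovieID:` header line followed by `CustomerID,Rating,Date` rows, and the movie list has `MovieID,YearOfRelease,Title` rows where the year may be `NULL`. The `Rating`/`Movie` classes and the `parse_ratings`/`parse_movie` helpers are hypothetical names, not taken from any of the branches above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Rating:
    movie_id: int
    customer_id: int
    rating: int
    rated_on: date


@dataclass(frozen=True)
class Movie:
    movie_id: int
    year: Optional[int]  # None when the year is recorded as "NULL"
    title: str


def parse_ratings(lines: Iterable[str]) -> Iterator[Rating]:
    """Yield ratings from 'MovieID:' header lines followed by
    'CustomerID,Rating,Date' rows (assumed layout)."""
    movie_id: Optional[int] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            # A header line such as "1:" starts a new movie block.
            movie_id = int(line[:-1])
            continue
        if movie_id is None:
            raise ValueError(f"rating row before any movie header: {line!r}")
        customer, score, day = line.split(",")
        yield Rating(movie_id, int(customer), int(score), date.fromisoformat(day))


def parse_movie(line: str) -> Movie:
    """Parse one 'MovieID,YearOfRelease,Title' row (assumed layout)."""
    # maxsplit=2 keeps any commas inside the title intact.
    movie_id, year, title = line.rstrip("\n").split(",", 2)
    return Movie(int(movie_id), None if year == "NULL" else int(year), title)
```

Because `parse_ratings` is a generator, it streams records one at a time instead of loading all 100 million ratings into memory, which matters at this scale. For example, `list(parse_ratings(["1:", "101,3,2005-09-06"]))` yields a single `Rating` for movie 1 (the customer ID here is made up).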