Pull the data from GHArchive using download_and_filter.sh.
Filter the data using filter_spark_events_v3.py.
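Here is a minimal Python sketch of these two steps. It assumes the filter keeps events whose repo.name is apache/spark. GHArchive publishes one gzipped JSON-lines file per hour at https://data.gharchive.org/YYYY-MM-DD-H.json.gz, where the hour is not zero-padded. The helper names gharchive_urls and filter_spark_events are hypothetical. They are not taken from download_and_filter.sh or filter_spark_events_v3.py.

```python
import gzip
import io
import json
from datetime import datetime, timedelta
from typing import Iterator

# Assumption: the target repository appears as "apache/spark" in each event's repo.name field.
TARGET_REPO = "apache/spark"


def gharchive_urls(start: datetime, end: datetime) -> Iterator[str]:
    """Yield one GHArchive hourly-file URL per hour in the half-open range [start, end)."""
    current = start.replace(minute=0, second=0, microsecond=0)  # floor to the hour
    while current < end:
        # GHArchive file names use YYYY-MM-DD-H with a non-padded hour (0..23).
        yield f"https://data.gharchive.org/{current:%Y-%m-%d}-{current.hour}.json.gz"
        current += timedelta(hours=1)


def filter_spark_events(gz_bytes: bytes, repo: str = TARGET_REPO) -> list[dict]:
    """Decompress one hourly archive (JSON lines) and keep only events for `repo`."""
    kept = []
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            event = json.loads(line)
            if event.get("repo", {}).get("name") == repo:
                kept.append(event)
    return kept
```

The URLs can be fetched however the shell script does it. Passing each file's bytes to filter_spark_events keeps only Spark events per hour, so memory use stays bounded by one hourly file.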
Big Data Analytics, 7th-semester 10-mark mini-project: a 6-month analysis of Apache Spark GitHub activity using GHArchive data.
Fixed: 7th-month bleed-over (events from a seventh month leaking into the 6-month analysis window).
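One way to prevent 7th-month bleed-over is to filter each event on its created_at timestamp using a half-open window [start, start + 6 months). With that window, an event stamped exactly at midnight on the first day of the 7th month is excluded. This sketch assumes UTC timestamps in GHArchive's YYYY-MM-DDTHH:MM:SSZ format. The helpers add_months, parse_created_at, and in_window are hypothetical and are not the project's actual fix.

```python
from datetime import datetime, timezone


def add_months(dt: datetime, months: int) -> datetime:
    """Return midnight on the first day of the month `months` after dt's month."""
    total = dt.month - 1 + months
    return dt.replace(year=dt.year + total // 12, month=total % 12 + 1,
                      day=1, hour=0, minute=0, second=0, microsecond=0)


def parse_created_at(ts: str) -> datetime:
    # Assumed GHArchive format, e.g. "2024-01-01T15:00:00Z" (UTC).
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def in_window(event: dict, start: datetime, months: int = 6) -> bool:
    """True if the event falls in [start, start + months); the end bound is exclusive."""
    end = add_months(start, months)
    return start <= parse_created_at(event["created_at"]) < end
```

The exclusive upper bound matters. A check like `<= end` would admit events from the first instant of the following month.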
To run: activate the virtual environment, install the requirements, then run python3 spark_6month_analysis.py.
Data Loading
Data Combination and final data processing
Contributor Analysis
Event Type Analysis
Contributor Growth Analysis
Temporal Patterns
Executive Analysis
Final Visualizations
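The stages listed above can be sketched using only the Python standard library. Everything here is an assumption for illustration:

- The function names are hypothetical.
- Contributors are identified by actor.login.
- Contributor growth is measured as the number of logins first seen in each month of the dataset.

spark_6month_analysis.py may define these stages differently. Visualization is omitted here; the returned counters can be passed to a plotting library.

```python
from collections import Counter
from datetime import datetime
from itertools import chain


def combine(*batches: list[dict]) -> list[dict]:
    """Concatenate per-file event lists, dropping events whose id was already seen."""
    seen, combined = set(), []
    for event in chain.from_iterable(batches):
        if event["id"] not in seen:
            seen.add(event["id"])
            combined.append(event)
    return combined


def _ts(event: dict) -> datetime:
    # Assumed GHArchive timestamp format, e.g. "2024-01-01T15:00:00Z".
    return datetime.strptime(event["created_at"], "%Y-%m-%dT%H:%M:%SZ")


def top_contributors(events: list[dict], n: int = 10) -> list[tuple[str, int]]:
    """The n logins with the most events, most active first."""
    return Counter(e["actor"]["login"] for e in events).most_common(n)


def event_type_counts(events: list[dict]) -> Counter:
    """Number of events per GitHub event type (PushEvent, IssuesEvent, ...)."""
    return Counter(e["type"] for e in events)


def new_contributors_per_month(events: list[dict]) -> dict[str, int]:
    """Count logins by the month (YYYY-MM) in which they first appear."""
    first_seen: dict[str, str] = {}
    for e in sorted(events, key=_ts):
        first_seen.setdefault(e["actor"]["login"], _ts(e).strftime("%Y-%m"))
    return dict(sorted(Counter(first_seen.values()).items()))


def temporal_patterns(events: list[dict]) -> tuple[Counter, Counter]:
    """Event counts by UTC hour (0-23) and by weekday (0 = Monday)."""
    by_hour = Counter(_ts(e).hour for e in events)
    by_weekday = Counter(_ts(e).weekday() for e in events)
    return by_hour, by_weekday


def summary(events: list[dict]) -> dict:
    """A few headline numbers of the kind an executive summary might report."""
    types = event_type_counts(events)
    return {
        "total_events": len(events),
        "unique_contributors": len({e["actor"]["login"] for e in events}),
        "most_common_type": types.most_common(1)[0][0] if types else None,
    }
```

Deduplicating on the event id in combine protects the counts if overlapping hourly files are filtered more than once.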