This project analyzes YouTube watch history and search history exported from Google Takeout. The goal is to transform non-tabular JSON activity data into a structured dataset and explore behavioral patterns over time using Python and data visualization.
The analysis focuses on understanding viewing habits, search behavior, and content preferences while maintaining user privacy through aggregated outputs only.
- Convert raw JSON activity logs into tabular datasets
- Analyze viewing patterns by time and date
- Identify most watched channels
- Compare activity across years
- Categorize watched content using keyword-based NLP
- Produce privacy-safe analytical outputs
Source: Google Takeout (YouTube Watch & Search History)
Raw files are excluded from this repository for privacy reasons.
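As a rough illustration of turning the export into rows, the sketch below flattens activity records into flat dictionaries with derived time fields. It assumes each record is a JSON object with `title`, `subtitles` (a list whose first entry names the channel), and an ISO 8601 `time` string. That layout is an assumption, so check it against your own export. The function names are hypothetical and not taken from the notebook.

```python
import json
from datetime import datetime, timezone

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def parse_time(value):
    """Parse an ISO 8601 timestamp into an aware UTC datetime, or None."""
    try:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except (AttributeError, ValueError):
        return None  # missing or malformed timestamp
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are already UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def flatten_record(record):
    """Turn one nested activity record into a flat row, or None if undated."""
    when = parse_time(record.get("time"))
    if when is None:
        return None
    title = record.get("title", "")
    if title.startswith("Watched "):  # assumed prefix on watch entries
        title = title[len("Watched "):]
    subtitles = record.get("subtitles") or [{}]  # absent for some records
    return {
        "title": title,
        "channel": subtitles[0].get("name"),
        "datetime": when,
        "year": when.year,
        "weekday": WEEKDAYS[when.weekday()],
        "hour": when.hour,
    }


def flatten_records(records):
    """Flatten a list of records, skipping those without a usable timestamp."""
    rows = (flatten_record(record) for record in records)
    return [row for row in rows if row is not None]


def load_takeout(path):
    """Read a Takeout JSON file (path supplied by the caller) into flat rows."""
    with open(path, encoding="utf-8") as handle:
        return flatten_records(json.load(handle))
```

Passing the returned list to `pandas.DataFrame` gives the tabular form. Note that the hour and weekday here are in UTC; converting to a local time zone before deriving them may better reflect actual viewing habits.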
- Load Google Takeout JSON exports
- Normalize nested structures into DataFrames
- Engineer time features (datetime, weekday, hour, year)
- Visualize and analyze behavioral patterns
- Categorize content (keyword matching with a channel-based fallback)
- Export aggregated, privacy-safe results
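The categorization step could be sketched roughly as follows: match keywords against the video title first, and fall back to a per-channel lookup when no keyword hits. The category names, keyword lists, channel mapping, and the `"Other"` default are all hypothetical placeholders, not the lists used in the notebook.

```python
# Hypothetical keyword lists; the real analysis reports 14+ themes.
CATEGORY_KEYWORDS = {
    "Programming": ("python", "coding", "tutorial"),
    "Music": ("official video", "lyrics", "remix"),
    "Gaming": ("gameplay", "walkthrough"),
}

# Hypothetical channel-to-category fallback table.
CHANNEL_CATEGORIES = {
    "Example Music Channel": "Music",
}


def categorize(title, channel=None):
    """Return a category from title keywords, else from the channel, else 'Other'."""
    text = (title or "").lower()
    # Categories are checked in dictionary order; the first match wins.
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    # No keyword matched: fall back to the channel lookup.
    return CHANNEL_CATEGORIES.get(channel, "Other")
```

Plain substring matching is simple but coarse: a short keyword can match inside an unrelated word, so word-boundary matching may be worth considering if misclassifications show up.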
(Images generated from anonymized aggregated data.)
(Generated from the analyzed dataset)
Watch records: 12,720
Search records: 3,669
Categories identified: 14+ content themes
- Viewing patterns by hour and weekday
- Top watched channels
- Year-over-year activity comparisons
- Search behavior analysis
- Content theme analysis
- Aggregated CSV outputs
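One way an aggregated CSV output might be produced is sketched below: count views per weekday and hour, keep only the counts, and write them out. It assumes each input row is a dictionary with `weekday` and `hour` keys. The function names and the `min_count` threshold (which drops sparse cells) are hypothetical additions for illustration.

```python
import csv
from collections import Counter

WEEKDAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def aggregate_by_weekday_hour(rows, min_count=1):
    """Count rows per (weekday, hour); titles and channels are never copied."""
    counts = Counter((row["weekday"], row["hour"]) for row in rows)
    return [
        {"weekday": day, "hour": hour, "views": counts[(day, hour)]}
        for day in WEEKDAY_ORDER
        for hour in range(24)
        # Hypothetical safeguard: skip cells below the threshold.
        if counts[(day, hour)] >= min_count
    ]


def write_csv(table, path):
    """Write the aggregated table to a CSV file at the given path."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["weekday", "hour", "views"])
        writer.writeheader()
        writer.writerows(table)
```

Because only counts leave the function, individual titles, channels, and timestamps stay out of the published file.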
- Place Takeout JSON files inside data/
- Open /notebooks/youtube_behavior_analysis.ipynb
- Run all cells from top to bottom
Raw activity data is excluded using .gitignore. Only aggregated,
non-identifiable outputs are published.

