This project explores the key factors influencing startup success using statistical analysis in R. It applies multivariate techniques to understand patterns in funding, growth, and operational behavior across different sectors.
- Analyze startup data to identify key growth drivers
- Examine whether startup sectors differ significantly
- Reduce multiple financial variables into meaningful components
- Apply statistical testing to validate insights
The dataset contains information about startups including:
- Company Name
- Sector
- Funding Amount
- Investor
- Year
- City
Additional features were engineered to enhance analysis:
- Revenue
- Employee Count
- Marketing Spend
- Operational Years
- Renamed columns for consistency
- Converted categorical variables (Sector → factor)
- Removed missing values
- Simulated additional business metrics from explicit statistical assumptions:
  - Revenue, Employees, and Marketing Spend derived from Funding
  - Operational Years calculated from the founding year
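The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the raw column names, the multipliers (0.6, 0.15, one employee per 50,000 of funding), the log-normal noise, and the reference year 2024 are all assumptions made for the example.

```r
library(dplyr)

set.seed(42)  # make the simulated metrics reproducible

# Hypothetical raw data; the real dataset's column names may differ
raw <- data.frame(
  `Startup Name` = paste0("S", 1:6),
  Industry       = c("Fintech", "Health", "Fintech", NA, "EdTech", "Health"),
  Amount         = c(1.2e6, 5.0e5, NA, 3.0e6, 7.5e5, 2.0e6),
  Year           = c(2015, 2018, 2016, 2019, 2017, 2014),
  check.names    = FALSE
)

startups <- raw %>%
  # Rename columns for consistency
  rename(Company = `Startup Name`, Sector = Industry, Funding = Amount) %>%
  # Drop rows with any missing value
  na.omit() %>%
  mutate(
    Sector = factor(Sector),
    # Assumed relationships to Funding with multiplicative noise
    Revenue          = Funding * 0.60 * rlnorm(n(), 0, 0.3),
    Employees        = pmax(1, round(Funding / 50000 * rlnorm(n(), 0, 0.3))),
    MarketingSpend   = Funding * 0.15 * rlnorm(n(), 0, 0.3),
    # Assumed reference year for the age calculation
    OperationalYears = 2024 - Year
  )

str(startups)
```

Because the derived metrics are simulated from Funding, any correlation between them and Funding is built in by construction.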
- Boxplots to compare funding across sectors
- Histograms to study distribution of revenue
- Correlation matrix to analyze relationships between variables
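A minimal sketch of these three exploratory views is shown below. It runs on simulated stand-in data; the column names and distributions are assumptions. The log scale on the boxplot is a choice made here because funding amounts tend to be right-skewed.

```r
library(ggplot2)

set.seed(1)
n <- 120

# Simulated stand-in data; column names are assumptions
startups <- data.frame(
  Sector  = factor(sample(c("Fintech", "Health", "EdTech"), n, replace = TRUE)),
  Funding = rlnorm(n, meanlog = 14, sdlog = 1)
)
startups$Revenue   <- startups$Funding * 0.6 * rlnorm(n, 0, 0.3)
startups$Employees <- round(startups$Funding / 50000 * rlnorm(n, 0, 0.3)) + 1

# Boxplot: funding by sector on a log10 axis
p_box <- ggplot(startups, aes(x = Sector, y = Funding)) +
  geom_boxplot() +
  scale_y_log10()

# Histogram: distribution of revenue
p_hist <- ggplot(startups, aes(x = Revenue)) +
  geom_histogram(bins = 30)

# Correlation matrix of the numeric variables
cor_mat <- cor(startups[, c("Funding", "Revenue", "Employees")])
print(round(cor_mat, 2))
```

Printing `p_box` or `p_hist` draws the plots. `GGally::ggpairs()` on the numeric columns would give a combined scatterplot and correlation view.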
- Used Mardia’s test for multivariate normality
- Checked skewness and kurtosis of financial variables
📌 Result:
- The data deviate from multivariate normality (common in real-world datasets)
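To show what Mardia's test computes, here is a base-R sketch of its skewness and kurtosis statistics. It is a hand-written illustration rather than the MVN package call, and the helper name `mardia_test` and the simulated data are assumptions.

```r
# Hypothetical helper: Mardia's multivariate skewness and kurtosis tests
mardia_test <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X)
  p <- ncol(X)

  # Standardize first; the statistics are affine invariant, and this
  # keeps the covariance matrix well conditioned
  Z <- scale(X)
  S <- crossprod(Z) / n            # maximum-likelihood covariance
  D <- Z %*% solve(S) %*% t(Z)     # pairwise Mahalanobis cross-products

  b1 <- sum(D^3) / n^2             # multivariate skewness
  b2 <- sum(diag(D)^2) / n         # multivariate kurtosis

  skew_stat <- n * b1 / 6
  skew_df   <- p * (p + 1) * (p + 2) / 6
  kurt_z    <- (b2 - p * (p + 2)) / sqrt(8 * p * (p + 2) / n)

  list(
    skewness   = b1,
    kurtosis   = b2,
    skew_p     = pchisq(skew_stat, df = skew_df, lower.tail = FALSE),
    kurtosis_p = 2 * pnorm(abs(kurt_z), lower.tail = FALSE)
  )
}

set.seed(3)
n <- 200
funding <- rlnorm(n, 14, 1)        # simulated, right-skewed by construction
X <- cbind(
  Funding   = funding,
  Revenue   = funding * 0.6 * rlnorm(n, 0, 0.3),
  Employees = funding / 50000 * rlnorm(n, 0, 0.3)
)

res <- mardia_test(X)
str(res)
```

Small p-values for either statistic indicate departure from multivariate normality, which is expected for skewed financial variables like these.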
- Reduced multiple correlated variables into principal components
- Standardized data before applying PCA
📌 Key Components:
- PC1 → Company Scale (Funding, Revenue, Employees)
- PC2 → Operational Intensity
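The PCA step can be sketched with base R's `prcomp()`, which standardizes the variables when `scale. = TRUE`. The data are simulated and the column names are assumptions, so the loadings printed here will not match the project's results.

```r
set.seed(7)
n <- 150
funding <- rlnorm(n, 14, 1)

# Simulated stand-in for the numeric variables; names are assumptions
fin <- data.frame(
  Funding          = funding,
  Revenue          = funding * 0.60 * rlnorm(n, 0, 0.3),
  Employees        = funding / 50000 * rlnorm(n, 0, 0.3),
  MarketingSpend   = funding * 0.15 * rlnorm(n, 0, 0.3),
  OperationalYears = sample(1:10, n, replace = TRUE)
)

# Center and scale each variable, then extract principal components
pca <- prcomp(fin, center = TRUE, scale. = TRUE)

# Proportion of variance explained by the first two components
print(summary(pca)$importance[, 1:2])

# Loadings: which variables drive PC1 and PC2
print(round(pca$rotation[, 1:2], 2))

# Component scores per startup, usable in later tests
scores <- as.data.frame(pca$x[, 1:2])
head(scores)
```

Standardizing matters because Funding and Employees are on very different scales; without it, the largest-variance variable would dominate PC1. `factoextra::fviz_eig(pca)` would draw the corresponding scree plot.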
- Tested whether Company Scale (PC1 scores) differs across sectors
📌 Result:
- No statistically significant difference between sectors (p > 0.05)
- Suggests that company scale is not strongly sector-dependent in this dataset
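The text does not name the test used. One common way to compare a score across groups is a one-way ANOVA, sketched here alongside a Kruskal-Wallis test as a rank-based alternative when normality is doubtful. Both choices, the column names, and the simulated data are assumptions.

```r
set.seed(11)
n <- 150

# Simulated stand-in: PC1 scores drawn independently of sector
d <- data.frame(
  Sector = factor(sample(c("Fintech", "Health", "EdTech"), n, replace = TRUE)),
  PC1    = rnorm(n)
)

# One-way ANOVA: does mean PC1 differ across sectors?
fit <- aov(PC1 ~ Sector, data = d)
print(summary(fit))
p_anova <- summary(fit)[[1]][["Pr(>F)"]][1]

# Rank-based alternative that does not assume normal residuals
kw <- kruskal.test(PC1 ~ Sector, data = d)
print(kw)
```

A p-value above 0.05 means the data do not provide evidence of a sector difference; it does not by itself show that sectors are identical.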
- Startup success is multi-dimensional, not driven by a single factor
- No significant effect of sector on company scale was detected
- Financial and operational variables are strongly correlated
- PCA effectively reduces complexity into meaningful indicators
- R
- ggplot2
- dplyr
- GGally
- MVN
- factoextra
- Open project in RStudio
- Place dataset in the working directory
- Run analysis.R or knit startup_success_analyzer.Rmd
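Before running, it may help to check that the listed packages are available. The snippet below is a small sketch of that check; the commented commands at the end assume the file names given above, and that the rmarkdown package is installed for knitting.

```r
# Packages named in the tools list (R itself excluded)
required <- c("ggplot2", "dplyr", "GGally", "MVN", "factoextra")

# TRUE for each package that can be loaded on this machine
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
}

# With the dataset in the working directory:
# source("analysis.R")
# rmarkdown::render("startup_success_analyzer.Rmd")  # assumes rmarkdown
```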
- Some variables were simulated due to dataset constraints
- Real-world startup data may show more complexity
- Multivariate normality assumption is not fully satisfied
- Use real valuation and revenue data
- Apply regression models for prediction
- Build interactive dashboards (Shiny / Power BI)
- Perform clustering for startup segmentation


