This repository contains the work completed for the Applied Data Science Capstone Project, part of the IBM Data Science Professional Certificate series. The project is structured into multiple modules, each building on the previous one, culminating in a strong predictive model.
- Data Collection: The project begins with a GET request to the SpaceX API to retrieve historical launch data. This data includes various parameters essential for analysis.
- Data Cleaning: After collecting the data, cleaning and formatting were performed to remove anomalies, handle missing values, and prepare it for analysis.
- Web Scraping: Falcon 9 launch records were scraped from Wikipedia using the BeautifulSoup library, a powerful tool for web scraping in Python.
- Data Parsing: The HTML table extracted from Wikipedia was parsed and converted into a Pandas DataFrame for easier manipulation and analysis.
- Exploratory Data Analysis (EDA): In-depth EDA was conducted using visualization tools such as Matplotlib and Seaborn to uncover patterns and correlations in the data.
- Training Labels: The training data was labeled to prepare for machine learning models, which is crucial for the modeling phase.
- Loading Data into Db2: The cleaned dataset was uploaded into a Db2 database, enabling structured querying and analysis with SQL.
- Executing SQL Queries: Various SQL queries were executed to derive insights and answer specific questions related to the launch data.
- Feature Engineering: New features were created from the existing data to improve the predictive power of the models.
- Geographical Visualization with Folium: Interactive maps were generated using the Folium library to visualize launch sites and success/failure rates, helping to identify geographical patterns.
- Dash Application Development: An interactive Plotly Dash application was built to allow real-time analysis of the SpaceX launch data.
- User Interaction Features: Features like dropdown menus and range sliders were incorporated to enhance user interaction with pie charts and scatter plots, making the analysis more intuitive and engaging.
- Data Standardization and Splitting: The dataset was standardized, then split into training and testing sets to ensure robust evaluation of the models.
- Hyperparameter Tuning: Hyperparameter tuning was performed using GridSearchCV on various models, including SVM, Classification Trees, and Logistic Regression.
- Model Evaluation: Each model was evaluated using test data to determine the best-performing model for predicting SpaceX launch success.
This project allowed for the application of a wide range of data science techniques, from data collection and cleaning to advanced machine learning and interactive visualization. The final model is capable of predicting the success of SpaceX launches based on historical data, providing valuable insights for future launches.