Skip to content

PedroCarneiroMarques/Abalone

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 

Repository files navigation

Abalone

Project Overview/Summary

Dataset:

The dataset used for analysis contains information about abalones, a type of marine mollusk. The dataset includes various physical attributes such as length, diameter, height, and weights of different parts, along with the number of rings, which can be used to estimate the age of the abalones. Additionally, a 'Sex' column was present, which was one-hot encoded to 'Sex_I' and 'Sex_M' during the analysis.

Exploratory Data Analysis (EDA):

Basic Statistics:

Checked basic statistics of the dataset, including mean, median, and quartiles, to understand the distribution of numerical features.

Correlation Analysis:

Investigated correlations between different features to identify potential relationships.

Distribution Visualization:

Created histograms and box plots to visualize the distribution of numerical features and identify potential outliers.

Categorical Variable Exploration:

Explored the distribution of the categorical variable 'Sex' and its impact on the target variable.

Data Preprocessing:

One-Hot Encoding:

Encoded the categorical variable 'Sex' into binary columns ('Sex_I' and 'Sex_M') using one-hot encoding.

Data Visualization:

Pair Plots:

Generated pair plots to visualize relationships between numerical features.

Box Plots:

Created box plots to identify the distribution of numerical features and potential outliers.

Correlation Heatmap:

Visualized the correlation matrix using a heatmap to identify strong correlations between features.

Machine Learning Model:

Random Forest Regressor:

Trained a Random Forest Regressor using the dataset to predict the number of rings.

Model Evaluation:

Evaluated the model performance using Mean Squared Error (MSE) and visualized the predicted vs actual values.

Suggestions for Improvement:

Feature Engineering:

Consider creating new features based on domain knowledge or combining existing features.

Hyperparameter Tuning:

Conduct a more thorough hyperparameter search using techniques like RandomizedSearchCV or Bayesian optimization to find the best set of hyperparameters.

Cross-Validation:

Implement cross-validation to obtain a more robust estimate of the model's performance.

Explore Other Models:

Experiment with other ensemble methods, such as Gradient Boosting, to compare and potentially improve model performance.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors