A machine learning project to predict the cover type of a tree based on the geographic area in which it grows.
Problem Statement
- The tree observations come from 4 different areas in Roosevelt National Forest in Colorado
- Observations are cartographic variables (relating to maps) from 30x30 meter sections of forest
- We are given 10 continuous variables and 44 one-hot encoded columns, from which we predict the
Cover_Type (target variable) of a tree.
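As a rough illustration of the data layout described above, the sketch below collapses each one-hot group in a single observation into one categorical value while keeping the continuous variables as they are. The column names (`Elevation`, `Slope`, `Wilderness_Area1`, `Soil_Type1`, ...) and the helper `collapse_one_hot` are assumptions made for illustration and are not taken from the project code.

```python
def collapse_one_hot(row, prefix):
    """Return the numeric suffix of the single active column in a one-hot group.

    `row` maps column names to values. `prefix` (e.g. "Soil_Type") is an
    assumed naming convention, not confirmed by the project.
    """
    active = [
        int(name[len(prefix):])
        for name, value in row.items()
        if name.startswith(prefix) and value == 1
    ]
    if len(active) != 1:
        raise ValueError(
            f"expected exactly one active {prefix} column, got {len(active)}"
        )
    return active[0]


# Hypothetical observation, truncated to a few columns for brevity.
row = {
    "Elevation": 2596,
    "Slope": 3,
    "Wilderness_Area1": 0,
    "Wilderness_Area2": 1,
    "Soil_Type1": 0,
    "Soil_Type2": 0,
    "Soil_Type3": 1,
}

# Continuous variables are copied; each one-hot group becomes one value.
compact = {
    "Elevation": row["Elevation"],
    "Slope": row["Slope"],
    "Wilderness_Area": collapse_one_hot(row, "Wilderness_Area"),
    "Soil_Type": collapse_one_hot(row, "Soil_Type"),
}
print(compact)
```

A compact representation like this can make exploratory plots easier to read, while the original one-hot columns can still be fed to the models directly.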
Process Followed
- Cleaning and formatting the dataset
- Performing exploratory data analysis and drawing inferences from various plots
- Feature Engineering
- Comparing and Testing Various Models on the dataset to predict the
Cover_Type
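The model comparison step listed above could look roughly like the following sketch, which scores a few candidate classifiers with cross-validation. It assumes scikit-learn is available, uses a synthetic dataset from `make_classification` as a stand-in for the forest data, and assumes macro-averaged F1 as the metric. The chosen models, fold count, and hyperparameters are illustrative, not the project's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the forest data (assumption: the real dataset is not loaded here).
X, y = make_classification(
    n_samples=600,
    n_features=20,
    n_informative=10,
    n_classes=3,
    random_state=0,
)

# A small subset of candidate models, with illustrative settings.
models = {
    "Extra Trees": ExtraTreesClassifier(n_estimators=50, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# Mean macro-averaged F1 over 5 folds for each candidate model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="f1_macro").mean()
    for name, model in models.items()
}

# Print the models from best to worst score.
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Scoring every model with the same folds and the same metric keeps the comparison consistent across candidates.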
Conclusion
After testing several models with cross-validation and fine-tuning them with grid search, we conclude that, for our dataset, the best models ranked by F1 score are:
- Extra Trees Classifier
- Random Forest Classifier
- LGBM (Light Gradient Boosting Machine)
- XGBoost
- Gradient Boosting
- KNN (K-Nearest Neighbours Classifier)
- Decision Tree
- SVM (Support Vector Machines)
- Naive Bayes
- Logistic Regression
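The grid-search fine-tuning mentioned in the conclusion might be sketched as follows for the top-ranked model. This is a minimal example under stated assumptions: scikit-learn is available, the data is synthetic, the parameter grid is hypothetical, and macro-averaged F1 is assumed as the scoring metric, since the averaging mode is not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the forest data (assumption, not the real dataset).
X, y = make_classification(
    n_samples=400,
    n_features=20,
    n_informative=10,
    n_classes=3,
    random_state=0,
)

# Hypothetical search space; the project's actual grid is not known.
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [None, 10],
}

# Every parameter combination is evaluated with 3-fold cross-validation.
search = GridSearchCV(
    ExtraTreesClassifier(random_state=0),
    param_grid,
    scoring="f1_macro",
    cv=3,
)
search.fit(X, y)

print(search.best_params_)
print(f"best macro F1: {search.best_score_:.3f}")
```

After fitting, `search.best_estimator_` holds the model refit on the full data with the best parameters found.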