A high-performance and scalable C4.5 decision tree classifier implemented in Go. This tool enables users to efficiently train a decision tree model on structured data and make accurate predictions using the trained model.
🚀 Optimized for speed, parallel execution, and large datasets.
- 🚀 Features
- ⚙️ How It Works
- 📂 Project Structure
- 📥 Installation
- 🔧 Usage
- 📜 License
- 🙌 Contributors
- 🤝 Contributing
✔ CSV Data Processing – Reads CSV files, extracts features, and identifies the target labels.
✔ Parallel Processing – Uses Go goroutines to speed up data handling and decision tree building.
✔ C4.5 Algorithm – Implements the C4.5 decision tree with entropy-based splitting and pruning.
✔ Feature Selection – Selects the best feature at each node to maximize information gain.
✔ Handles Missing Values – Uses smart imputation techniques to handle missing data.
✔ Fast Predictions – Efficiently classifies new data points using the trained decision tree model.
✔ Serialization – Saves trained models as JSON files for later use in predictions.
✔ Command-Line Interface – Simple CLI for training and predicting with decision trees.
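As a rough illustration of the Feature Selection and Parallel Processing items above, the sketch below scores categorical feature columns with the C4.5 gain ratio (information gain divided by split information), using one goroutine per column. It is not taken from this repository: the names `Entropy`, `GainRatio`, and `BestFeature` are hypothetical, continuous attributes and missing values are ignored, and the project's own `split` package may score features differently.

```go
package main

import (
	"math"
	"sync"
)

// Entropy returns the Shannon entropy (in bits) of a list of class labels.
func Entropy(labels []string) float64 {
	if len(labels) == 0 {
		return 0
	}
	counts := make(map[string]int)
	for _, l := range labels {
		counts[l]++
	}
	total := float64(len(labels))
	h := 0.0
	for _, c := range counts {
		p := float64(c) / total
		h -= p * math.Log2(p)
	}
	return h
}

// GainRatio computes the gain ratio of splitting labels by the values of one
// categorical feature. values and labels are assumed to have the same length.
func GainRatio(values, labels []string) float64 {
	groups := make(map[string][]string)
	for i, v := range values {
		groups[v] = append(groups[v], labels[i])
	}
	total := float64(len(labels))
	remainder, splitInfo := 0.0, 0.0
	for _, g := range groups {
		w := float64(len(g)) / total
		remainder += w * Entropy(g)   // weighted entropy after the split
		splitInfo -= w * math.Log2(w) // entropy of the split itself
	}
	if splitInfo == 0 {
		return 0 // a single-valued feature cannot separate the data
	}
	return (Entropy(labels) - remainder) / splitInfo
}

// BestFeature scores every column in its own goroutine and returns the index
// of the column with the highest gain ratio, or -1 if there are no columns.
func BestFeature(columns [][]string, labels []string) int {
	scores := make([]float64, len(columns))
	var wg sync.WaitGroup
	for i, col := range columns {
		wg.Add(1)
		go func(i int, col []string) {
			defer wg.Done()
			scores[i] = GainRatio(col, labels) // each goroutine writes only its own slot
		}(i, col)
	}
	wg.Wait()
	best := -1
	for i, s := range scores {
		if best == -1 || s > scores[best] {
			best = i
		}
	}
	return best
}
```

Because every goroutine writes to a distinct index of `scores`, no mutex is needed. The `sync.WaitGroup` only ensures all scores are ready before the maximum is picked.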
1️⃣ Data Processing: Parses CSV files and detects headers.
2️⃣ Feature Selection: Uses entropy and information gain to find the best splits.
3️⃣ Tree Building: Recursively builds the decision tree, with pruning to limit tree size and overfitting.
4️⃣ Model Storage: Saves the trained decision tree in a serializable JSON format.
5️⃣ Predictions: Uses the trained tree to classify new input data.
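Steps 4 and 5 above can be pictured with a minimal sketch: a tree is encoded to JSON, decoded again, and then walked from root to leaf to classify a row. The `Node` layout, the JSON field names, and the `Encode`, `Decode`, and `Predict` helpers are assumptions made for illustration; the model files this tool writes may use a different schema.

```go
package main

import "encoding/json"

// Node is a hypothetical tree node. A leaf carries Label; an internal node
// carries the Feature it tests and one child per observed feature value.
type Node struct {
	Feature  string           `json:"feature,omitempty"`
	Label    string           `json:"label,omitempty"`
	Children map[string]*Node `json:"children,omitempty"`
}

// Encode serializes the tree rooted at root as indented JSON.
func Encode(root *Node) ([]byte, error) {
	return json.MarshalIndent(root, "", "  ")
}

// Decode rebuilds a tree from JSON produced by Encode.
func Decode(data []byte) (*Node, error) {
	var root Node
	if err := json.Unmarshal(data, &root); err != nil {
		return nil, err
	}
	return &root, nil
}

// Predict walks from the root to a leaf, following the branch that matches
// the row's value for each tested feature. It returns fallback when the tree
// is nil or the row holds a value that has no matching branch.
func Predict(n *Node, row map[string]string, fallback string) string {
	for n != nil && len(n.Children) > 0 {
		child, ok := n.Children[row[n.Feature]]
		if !ok {
			return fallback
		}
		n = child
	}
	if n == nil {
		return fallback
	}
	return n.Label
}
```

In this sketch, calling `Predict` on a decoded tree with a row such as `map[string]string{"outlook": "sunny"}` follows the `sunny` branch of an `outlook` node and returns that leaf's label. Writing the bytes from `Encode` to disk with `os.WriteFile` would give a model file comparable in spirit to the one produced by the train command.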
├── cmd/ # CLI commands and argument parsing
│ ├── root.go # CLI entry point for commands
│
├── internal/model/ # Core logic for decision tree training and predictions
│ ├── cache/ # Caches computed values for performance optimization
│ ├── counter/ # Computes class distributions (e.g., the most frequent class)
│ ├── entropy/ # Calculates entropy, the measure of label uncertainty
│ ├── model/ # Trains the decision tree based on input data
│ ├── node/ # Defines tree node structure and utility functions
│ ├── parser/ # Parses CSV files and converts data into structured format
│ ├── predict/ # Uses the trained model to make predictions
│ ├── split/ # Finds the best feature split for information gain
│ ├── types/ # Defines tree structure and related data types
│ ├── utils/ # Utility functions for data preprocessing
│
├── decision_model/ # Stores serialized trained decision tree models
├── go.mod # Go module dependencies
├── go.sum # Go dependency checksums
├── LICENSE # License information
├── main.go # Application entry point
Ensure you have Go installed. 🔗 Download Go
git clone https://learn.zone01kisumu.ke/git/tesiaka/c4.5-decision-tree.git
cd c4.5-decision-tree
go mod tidy
go build -o dt
This creates an executable dt for running commands.
| Flag | Description |
|---|---|
| -c | Train a decision tree (train) |
| -i | Input CSV file path containing the training dataset |
| -t | Name of the column in the dataset containing the target labels |
| -o | Output file to save the trained decision tree (JSON format) |
./dt -c train -i dataset.csv -t target_column -o model.dt
| Flag | Description |
|---|---|
| -c | Predict command (predict) |
| -i | Input CSV file containing test data |
| -m | Path to the trained decision tree model file |
| -o | Path to save predictions as a CSV file |
./dt -c predict -i test_data.csv -m model.dt -o predictions.csv
This project is licensed under the MIT License.
🔗 MIT License
🚀 This project is open for contributions!
🔹 How to contribute:
- Fork the repository.
- Create a new branch for your feature.
- Update the documentation and the changelog to cover your changes.
- Commit your changes.
- Push the branch to your fork.
- Open a pull request.
- Review and merge.
💡 Let's build a faster and more efficient Decision Tree model together!
Developed with ❤️ in Go. 🚀