brunomnsilva/EMNIST-Character-Recognition

EMNIST Character Recognition

Language: Java | License: MIT

This repository demonstrates the use of the UbiquitousNeuralNetworks Java library to train and deploy Multilayer Perceptron (MLP) models for recognizing handwritten digits and letters from the EMNIST dataset.

🧠 Overview

This project includes:

  • Programs to train MLP models using EMNIST (digits 🔢 or letters 🔠);
  • Interactive programs that let users draw a character ✏️ and have the trained MLP recognize it 🤔💭.

📦 Requirements and dependencies

  • Java 11+
  • UbiquitousNeuralNetworks library, as a Maven dependency
  • OpenCSV library, as a Maven dependency
  • EMNIST dataset (optional)
    • Download the dataset only if you wish to train your own models.
    • Extract the digits and letters datasets (CSV files) into the folder structure presented below.

🧩 Project Structure

├── dataset/                            # EMNIST dataset files folder
│   ├── digits/                         
│   │   ├─ emnist-digits-train.csv      # EMNIST digits train dataset (optional)
│   │   └─ emnist-digits-test.csv       # EMNIST digits test dataset (optional)
│   └── letters/                        
│       ├─ emnist-letters-train.csv     # EMNIST letters train dataset (optional)
│       └─ emnist-letters-test.csv      # EMNIST letters test dataset (optional)
├── src/                                # Source code (Java)
│   └── ...                             # Packages and programs
├── models/                             # Pre-trained models (JSON)
└── ...                                 # Other project files

✏️ Running the Recognition Programs

The repository already provides pre-trained models, so you can run the recognition programs straight away! ⚡

  1. Run the DigitsRecognizer or LettersRecognizer program;
  2. Draw a character and check the model response.
    • For each digit/letter you draw, the recognition result is displayed in the console.

📈 Performance of pre-trained models

The accuracy of the provided pre-trained models, measured against the EMNIST test datasets, is:

Dataset          Accuracy
EMNIST Digits    98.37%
EMNIST Letters   85.31%

These models were obtained with the provided training programs.
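For reference, accuracy figures like the ones above are usually computed as the share of test samples whose highest-scoring output matches the true label. The following is a minimal, library-independent sketch of that calculation; the class `AccuracySketch` and its methods are hypothetical names used for illustration and are not part of the UbiquitousNeuralNetworks API.

```java
/** Sketch: classification accuracy as the share of argmax predictions that match the true label. */
final class AccuracySketch {

    /** Returns the index of the largest value in a network output vector. */
    static int argmax(double[] outputs) {
        int best = 0;
        for (int i = 1; i < outputs.length; i++) {
            if (outputs[i] > outputs[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Fraction of samples whose predicted class index equals the expected class index. */
    static double accuracy(double[][] predictions, int[] labels) {
        if (predictions.length != labels.length || labels.length == 0) {
            throw new IllegalArgumentException("predictions and labels must be non-empty and of equal length");
        }
        int correct = 0;
        for (int i = 0; i < labels.length; i++) {
            if (argmax(predictions[i]) == labels[i]) {
                correct++;
            }
        }
        return (double) correct / labels.length;
    }

    public static void main(String[] args) {
        // Made-up output vectors for four samples of a three-class problem.
        double[][] predictions = {
            {0.1, 0.8, 0.1},
            {0.7, 0.2, 0.1},
            {0.2, 0.3, 0.5},
            {0.6, 0.3, 0.1}
        };
        int[] labels = {1, 0, 2, 1};
        System.out.println("Accuracy: " + accuracy(predictions, labels));
    }
}
```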

🏋️‍♂️ Training a Model

📃 Obtaining and converting datasets

  1. Download the EMNIST dataset from Kaggle.
    • ⚠️ This is a ~1.2 GB zip archive.
  2. Extract the relevant CSV files into the dataset folder, as shown in the Project Structure section.
  3. Convert the CSV dataset files into the format used by the UbiquitousNeuralNetworks library; more information about this format can be found in the wiki.
    • Run the EMNISTConverter program in the dataset package.
    • This creates the corresponding .data files; you can delete the .csv files afterwards, if you wish.
  4. You can inspect the datasets with the DatasetInspector program in the dataset package.
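To give a feel for what the conversion step has to read, here is a rough sketch of parsing a single CSV row. It assumes the Kaggle CSV layout of one image per row, with the class label in the first column followed by 784 pixel intensities (28×28). The class `EmnistRowSketch` is hypothetical; it is not the EMNISTConverter program, it uses plain `String.split` rather than OpenCSV, and it does not produce the library's .data format.

```java
/** Sketch: reading one EMNIST CSV row, assuming "label,pixel0,...,pixel783" per line. */
final class EmnistRowSketch {

    static final int PIXELS = 28 * 28;

    /** One parsed sample: the class label plus the raw pixel intensities. */
    static final class Sample {
        final int label;
        final int[] pixels;

        Sample(int label, int[] pixels) {
            this.label = label;
            this.pixels = pixels;
        }
    }

    /** Splits a CSV line into a label (first column) and 784 pixel values. */
    static Sample parseRow(String line) {
        String[] fields = line.trim().split(",");
        if (fields.length != PIXELS + 1) {
            throw new IllegalArgumentException(
                "Expected " + (PIXELS + 1) + " columns, got " + fields.length);
        }
        int label = Integer.parseInt(fields[0].trim());
        int[] pixels = new int[PIXELS];
        for (int i = 0; i < PIXELS; i++) {
            pixels[i] = Integer.parseInt(fields[i + 1].trim());
        }
        return new Sample(label, pixels);
    }

    /** Encodes a class index as a one-hot target vector, a common target encoding for a softmax output. */
    static double[] oneHot(int classIndex, int numClasses) {
        double[] target = new double[numClasses];
        target[classIndex] = 1.0;
        return target;
    }

    public static void main(String[] args) {
        // Synthetic row: label 7, first pixel fully on, all other pixels off.
        StringBuilder row = new StringBuilder("7");
        for (int i = 0; i < PIXELS; i++) {
            row.append(i == 0 ? ",255" : ",0");
        }
        Sample sample = parseRow(row.toString());
        System.out.println("label=" + sample.label + ", pixels=" + sample.pixels.length);
    }
}
```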

🚀 Define the network structure and train a model

Once you have the datasets, you can train and test your own models 😊.

See the DigitsModelCreate and LettersModelCreate example programs for complete examples (including testing).

A minimal working example:

// Load the converted training set and apply min-max normalization to it
Dataset trainSet = new Dataset("dataset/digits/emnist-digits-train.data");
DatasetNormalization normalization = new MinMaxNormalization(trainSet);
normalization.normalize(trainSet);

// Network structure: input layer, two ReLU hidden layers (48 and 16 neurons), softmax output layer
MLPNetwork network = new MLPNetwork.Builder()
  .addInputLayer(trainSet.inputDimensionality())
  .addHiddenLayer(48, ReLUActivation.class, 0.1)
  .addHiddenLayer(16, ReLUActivation.class, 0.1)
  .addOutputLayer(trainSet.outputDimensionality(), SoftmaxActivation.class, 0)
  .withWeightInitializer(new HeInitializer())
  .build();

// Training setup: backpropagation with cross-entropy loss for 20 epochs
Backpropagation backpropagation = new Backpropagation.Builder(trainSet, network)
  .withLearningRate(0.001)
  .withBiasUpdate(true)
  .forNumberEpochs(20)
  .withLossFunction(CrossEntropyLoss.class)
  .build();

// Run the training loop
backpropagation.trainNetwork();
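The example above relies on three standard numeric building blocks: min-max normalization of the inputs, a softmax output layer, and a cross-entropy loss. The sketch below shows the textbook form of each in plain Java. It is not the library's implementation; the class `TrainingMathSketch` and its method names are hypothetical, and per-feature scaling is an assumption about how min-max normalization is typically applied.

```java
/** Sketch: textbook min-max scaling, softmax, and cross-entropy. Not the library's code. */
final class TrainingMathSketch {

    /** Scales each feature (column) in place to [0, 1] using x' = (x - min) / (max - min). */
    static void minMaxNormalize(double[][] data) {
        int features = data[0].length;
        for (int j = 0; j < features; j++) {
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (double[] row : data) {
                min = Math.min(min, row[j]);
                max = Math.max(max, row[j]);
            }
            double range = max - min;
            for (double[] row : data) {
                // A constant feature has no spread; map it to 0 to avoid dividing by zero.
                row[j] = (range == 0.0) ? 0.0 : (row[j] - min) / range;
            }
        }
    }

    /** Converts raw scores into probabilities that sum to 1. */
    static double[] softmax(double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) {
            max = Math.max(max, s);
        }
        double sum = 0.0;
        double[] result = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            // Subtracting the maximum keeps exp() from overflowing without changing the result.
            result[i] = Math.exp(scores[i] - max);
            sum += result[i];
        }
        for (int i = 0; i < result.length; i++) {
            result[i] /= sum;
        }
        return result;
    }

    /** Cross-entropy between predicted probabilities and a (typically one-hot) target vector. */
    static double crossEntropy(double[] predicted, double[] target) {
        double epsilon = 1e-12; // guards against log(0)
        double loss = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            loss -= target[i] * Math.log(Math.max(predicted[i], epsilon));
        }
        return loss;
    }

    public static void main(String[] args) {
        double[][] pixels = { {0, 128}, {255, 64} };
        minMaxNormalize(pixels);

        double[] probabilities = softmax(new double[] {2.0, 1.0, 0.1});
        double loss = crossEntropy(probabilities, new double[] {1.0, 0.0, 0.0});
        System.out.println("first pixel after scaling: " + pixels[1][0]);
        System.out.println("cross-entropy loss: " + loss);
    }
}
```

With a one-hot target, the loss reduces to the negative log of the probability assigned to the correct class, so a confident correct prediction gives a loss near zero.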

💾 Model Persistence

Models can be saved to and loaded from JSON files:

MLPNetwork model = ...;

// After training
MLPNetwork.saveJSON(model, "models/my_model.json");

// Later
MLPNetwork loadedModel = MLPNetwork.loadJSON("models/my_model.json");

This enables quick reuse of previously trained models.

📜 License

This project is released under the MIT License. See the LICENSE file for details.

🤵 Authors

Original author: Bruno Silva - (GitHub page) | (Personal page) | (🇵🇹 CIÊNCIA VITAE)
