Hanacaraka-AI

This is our final project for Google Bangkit Academy.

In this project, we built a handwritten character recognition model that recognizes the ancient Javanese alphabet (Aksara Jawa), using a public dataset available on Kaggle. Thanks to Phiard, the author of the dataset. The dataset contains the twenty characters of the ancient Javanese alphabet: ha, na, ca, ra, ka, da, ta, sa, wa, la, pa, dha, ja, ya, nya, ma, ga, ba, tha, nga. The characters are shown in Figure 1 below.

Figure 1. Ancient Javanese Alphabet
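A classifier over these twenty characters outputs one probability per class, so predictions only make sense with a fixed label order. The sketch below is not taken from our notebooks; it assumes (hypothetically) one image folder per character named after its romanization. Keras directory loaders such as `image_dataset_from_directory` assign class indices in alphanumerical order of the folder names unless `class_names` is given, so the index order is the sorted list, not the traditional hanacaraka order. `decode_prediction` is an illustrative helper name.

```python
# The twenty characters in traditional hanacaraka order (from the dataset description).
HANACARAKA_ORDER = [
    "ha", "na", "ca", "ra", "ka",
    "da", "ta", "sa", "wa", "la",
    "pa", "dha", "ja", "ya", "nya",
    "ma", "ga", "ba", "tha", "nga",
]

# Assumption: one folder per character, named after its romanization.
# Keras directory loaders index classes in sorted (alphanumerical) order.
CLASS_NAMES = sorted(HANACARAKA_ORDER)


def decode_prediction(probabilities):
    """Return the character name with the highest predicted probability."""
    if len(probabilities) != len(CLASS_NAMES):
        raise ValueError(
            f"expected {len(CLASS_NAMES)} probabilities, got {len(probabilities)}"
        )
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return CLASS_NAMES[best_index]
```

For a single image, `decode_prediction(model.predict(batch)[0])` would return a name such as `"ha"`, where `model` and `batch` are placeholders for a trained model and a preprocessed input batch.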

We built our baseline model on a basic Convolutional Neural Network (CNN) architecture (see Figure 2) with an additional fully connected layer of 128 neurons. The baseline model achieves 98% training accuracy and 88% validation accuracy.

Figure 2. Baseline Model Architecture
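A minimal Keras sketch of such a baseline is shown here for orientation. Only the extra 128-neuron dense layer and the twenty output classes come from this README; the 64x64 grayscale input, the two convolution blocks with 32 and 64 filters, the Adam optimizer, and integer class labels are assumptions for illustration, not a copy of our notebook.

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 20            # twenty Aksara Jawa characters
INPUT_SHAPE = (64, 64, 1)   # assumption: 64x64 grayscale images scaled to [0, 1]


def build_baseline_model(input_shape=INPUT_SHAPE, num_classes=NUM_CLASSES):
    """Plain CNN: conv/pool blocks, then a 128-unit dense layer and a softmax output."""
    model = tf.keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),     # assumption: 32 filters
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),     # assumption: 64 filters
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),        # the additional 128-neuron layer
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",      # assumes integer class labels
        metrics=["accuracy"],
    )
    return model
```

The `Flatten` followed by a dense layer gives this sketch many trainable weights relative to a small dataset, which is one common reason a plain CNN like this overfits.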

To develop a good model, we reviewed several research papers and open-source projects on handwritten character recognition. Of these, we chose the Arabic Character Recognition project, which is publicly available as an open-source project on GitHub, as our model reference.

In our experiments, the baseline CNN model tended to overfit: accuracy on the training set kept increasing, while accuracy on the validation set stayed around 80%. Our goal was therefore an improved model that reduces overfitting. We added batch normalization, dropout, and global average pooling to reduce overfitting. However, the model still had trouble maintaining its validation accuracy, so we added an L2 kernel regularizer to each convolution layer and used additional callbacks to reduce the learning rate when validation accuracy plateaus.
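The techniques listed above can be combined in Keras roughly as follows. This is an illustrative sketch, not our exact notebook code: the 64x64 grayscale input, the three convolution blocks and their filter counts, the dropout rates, the L2 factor of 1e-4, and the `ReduceLROnPlateau` settings (halve the learning rate after 3 epochs without improvement in validation accuracy) are all assumptions.

```python
import tensorflow as tf
from tensorflow.keras import callbacks, layers, regularizers

NUM_CLASSES = 20
INPUT_SHAPE = (64, 64, 1)   # assumption: 64x64 grayscale images
L2_FACTOR = 1e-4            # assumption: illustrative regularization strength


def conv_block(x, filters):
    """Conv (L2 on the kernel) -> BatchNorm -> ReLU -> MaxPool -> Dropout."""
    x = layers.Conv2D(filters, 3, padding="same", use_bias=False,
                      kernel_regularizer=regularizers.l2(L2_FACTOR))(x)
    x = layers.BatchNormalization()(x)   # BatchNorm supplies the shift, so no conv bias
    x = layers.Activation("relu")(x)
    x = layers.MaxPooling2D()(x)
    return layers.Dropout(0.25)(x)       # assumption: dropout rate


def build_improved_model(input_shape=INPUT_SHAPE, num_classes=NUM_CLASSES):
    inputs = layers.Input(shape=input_shape)
    x = inputs
    for filters in (32, 64, 128):               # assumption: three blocks
        x = conv_block(x, filters)
    x = layers.GlobalAveragePooling2D()(x)      # one averaged value per feature map
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


# Lower the learning rate when validation accuracy stops improving.
reduce_lr = callbacks.ReduceLROnPlateau(
    monitor="val_accuracy", mode="max", factor=0.5, patience=3, min_lr=1e-6, verbose=1
)
```

The callback is passed to training, for example `model.fit(train_ds, validation_data=val_ds, epochs=50, callbacks=[reduce_lr])`, where `train_ds` and `val_ds` are hypothetical datasets. In this sketch, global average pooling replaces the baseline's `Flatten` and large dense layer, which removes most of the weights that could memorize the training set.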

As a result, validation accuracy became more stable. Our final improved model achieves 92% training accuracy and 89% validation accuracy. The model architecture is shown in Figure 3.

Figure 3. Improved Model Architecture

While we were developing the improved model, the original dataset was updated. We therefore replaced our version of the dataset with the new one and retrained the improved model architecture on it. The retrained model no longer shows the overfitting issue and achieves 97% training accuracy and 96% validation accuracy.
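The overfitting discussion comes down to the gap between training and validation accuracy. Using only the accuracies reported in this README, the small script below computes that gap for each model; `generalization_gap` is an illustrative helper name, not code from the repository.

```python
# Reported accuracies (percent) from this README: (training, validation).
RESULTS = {
    "baseline": (98, 88),
    "improved (original dataset)": (92, 89),
    "improved (updated dataset)": (97, 96),
}


def generalization_gap(train_acc, val_acc):
    """Percentage points by which training accuracy exceeds validation accuracy."""
    return train_acc - val_acc


for name, (train_acc, val_acc) in RESULTS.items():
    print(f"{name}: gap = {generalization_gap(train_acc, val_acc)} pp")
```

The gap shrinks from 10 percentage points for the baseline to 3 for the improved model and 1 after retraining on the updated dataset, which is what "reducing overfitting" means here.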

Prerequisites

- Jupyter Notebook or Google Colab
- Kaggle API token
- Python 3.6 or above
- TensorFlow 2 (latest version)
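Before opening the notebooks, the prerequisites can be checked with a short script. This is a sketch of our own, not part of the repository; it assumes the Kaggle token lives at `~/.kaggle/kaggle.json`, the default location the Kaggle command-line client reads, and `check_environment` is a hypothetical helper name.

```python
import importlib.util
import sys
from pathlib import Path


def check_environment(min_python=(3, 6), kaggle_token=None):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        required = ".".join(str(part) for part in min_python)
        problems.append(f"Python {required}+ required, found {sys.version.split()[0]}")
    if importlib.util.find_spec("tensorflow") is None:
        problems.append("TensorFlow is not installed (try: pip install tensorflow)")
    else:
        import tensorflow as tf  # imported only when it is installed
        if not tf.__version__.startswith("2."):
            problems.append(f"TensorFlow 2 required, found {tf.__version__}")
    # Assumption: the token sits at the Kaggle client's default location.
    token = Path(kaggle_token) if kaggle_token else Path.home() / ".kaggle" / "kaggle.json"
    if not token.is_file():
        problems.append(f"Kaggle API token not found at {token}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("-", problem)
```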

How to use

1. Go to your Kaggle account page and create your Kaggle API token (kaggle.json).
   - My Account --> API section --> Create New API Token
2. Get the dataset.
   - You can download it from the original source, then modify our code so that it points to the original dataset,
   - or, you can download it from our Google Drive.
   - In our notebooks, we downloaded the dataset and re-uploaded it to Google Drive.
   - The original source contains a separate folder for each version of the dataset. We have already combined all versions into single training, validation, and testing folders.
3. Run our baseline model notebook (Hanacaraka AI - notebook.ipynb) on Google Colab.
4. Next, run our improved model notebook (Hanacaraka AI - notebook - improved model.ipynb).
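The token setup and the dataset merge described above can be scripted. The sketch below is not taken from our notebooks. It assumes the Kaggle client reads credentials from `~/.kaggle/kaggle.json` (its default location) and, for the merge, a hypothetical source layout `<root>/<version>/<split>/<class>/<image>` with split folders named `train`, `val`, and `test`. The function names `install_kaggle_token` and `merge_versions` are illustrative.

```python
import os
import shutil
from pathlib import Path

SPLITS = ("train", "val", "test")   # hypothetical split folder names


def install_kaggle_token(token_path, home=None):
    """Copy kaggle.json to <home>/.kaggle/ and restrict it to owner read/write."""
    home = Path(home) if home is not None else Path.home()
    kaggle_dir = home / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    dest = kaggle_dir / "kaggle.json"
    shutil.copyfile(token_path, dest)
    os.chmod(dest, 0o600)            # the Kaggle client warns if others can read the token
    return dest


def merge_versions(source_root, dest_root, splits=SPLITS):
    """Merge <source_root>/<version>/<split>/<class>/* into <dest_root>/<split>/<class>/.

    Each copied file is prefixed with its version folder name so that images with
    the same file name in different versions do not overwrite each other.
    dest_root must not be inside source_root. Returns the number of files copied.
    """
    source_root, dest_root = Path(source_root), Path(dest_root)
    copied = 0
    for version_dir in sorted(p for p in source_root.iterdir() if p.is_dir()):
        for split in splits:
            split_dir = version_dir / split
            if not split_dir.is_dir():
                continue                 # this version has no such split
            for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
                target = dest_root / split / class_dir.name
                target.mkdir(parents=True, exist_ok=True)
                for image in sorted(class_dir.iterdir()):
                    if image.is_file():
                        shutil.copy2(image, target / f"{version_dir.name}_{image.name}")
                        copied += 1
    return copied
```

Prefixing file names with the version keeps every image from every dataset version, at the cost of possible near-duplicates if the same image appears in several versions.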

Repository: IqbalLx/Hanacaraka-AI (3 contributors)

Folders and files

Latest commit

History

Repository files navigation

Hanacaraka-AI

Prerequisites

How to use

About

Topics

Resources

Stars

Watchers

Forks

Languages