Skip to content

golecalicja/language-recognition-neural-network

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

39 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Language recognition neural network

Table of contents

Introduction

A single-layer neural network written from scratch that predicts the language of the text based on the proportions of letters.

Data description

Articles scraped from Wikipedia. The neural network was trained on 5 languages (English, German, Polish, Spanish and French), 10 articles each. However, the code is generic and can be applied to any number of languages.

Examples of performance

English sample:

image

German sample:

image

Polish sample:

image

Methods used

  • Dynamic reading of multiple data files from a various number of directories
  • Calculating vectors of letter proportions in each language (ASCII characters only)
  • Implementing a single-layer neural network of k perceptrons (where k = number of languages) from scratch
  • Libraries used for data loading and preprocessing: pandas, numpy
  • OOP and clean code
  • Unit testing with pytest