jiri-hostas/EDA-and-ML-for-Perovskites

Exploratory Data Analysis (EDA) and Machine Learning (ML) algorithms for Perovskite datasets

This work was done as part of the AI4D project (a collaboration of researchers from the University of Calgary, the University of Alberta, the National Research Council Canada, and the Federal University of Espírito Santo) and published as a conference paper at the ML4Materials: From Molecules to Materials workshop at the International Conference on Learning Representations (ICLR) 2023.

Code contributors: Jiri Hostas (Calgary, Canada) & Maicon Pierre Lourenco (Espírito Santo, Brazil)

Other collaborators: John Garcia, Hatef Shahmohamadi, Amanda Ndubuisi, Bhavadharini Selvakumar, Thilini Boteju, Lizandra Barrios Herrera, Alain Tchagang, Mosayeb Naseri, Dennis Salahub, Venkataraman Thangadurai, and Karthik Shankar.

The documentation explains the main parts of the code, and the graphics highlight the key project takeaways. Data reading and cleaning are handled in a single Jupyter notebook (01); the remaining steps of the project (02-04) are in a second notebook.

The repository contains several subprojects, organized into the following sections:

First notebook:

  1. Data reading and cleaning (including a custom-made script to read a human-made .xlsx database file)
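A minimal sketch of this kind of reading-and-cleaning step with pandas; the file path, sheet, and column names below are illustrative assumptions, not the repository's actual schema:

```python
import pandas as pd

def clean_table(df: pd.DataFrame) -> pd.DataFrame:
    """Typical cleanup for a hand-curated spreadsheet: normalize headers,
    drop fully empty rows, and remove accidental duplicate entries."""
    df = df.copy()
    df.columns = [str(c).strip().lower().replace(" ", "_") for c in df.columns]
    return df.dropna(how="all").drop_duplicates().reset_index(drop=True)

def load_perovskite_table(path: str = "perovskites.xlsx") -> pd.DataFrame:
    # pd.read_excel requires an engine such as openpyxl to be installed
    return clean_table(pd.read_excel(path))
```

Separating the cleaning logic from the file I/O makes it easy to unit-test the cleanup on an in-memory DataFrame.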

Second notebook:

  1. Feature Engineering and Exploratory Data Analysis (EDA)
  2. Regression and hyperparameter tuning
  3. Principal Component Analysis (PCA) and Wasserstein Autoencoders (WAE)
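The regression-and-tuning step (item 2) can be sketched with scikit-learn's grid search over cross-validated folds; the model choice, parameter grid, and synthetic data below are assumptions for illustration, not the paper's exact setup:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Stand-in composition features and a synthetic target property
rng = np.random.default_rng(0)
X = rng.random((60, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.standard_normal(60)

# Exhaustive search over a small hyperparameter grid with 3-fold CV
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=3,
    scoring="neg_mean_absolute_error",
)
grid.fit(X, y)
print(grid.best_params_)   # best combination found on this toy data
```

`GridSearchCV` refits the best model on the full training set, so `grid.predict` can be used directly afterwards.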

Abstract

Figure 1: Principal component analysis (PCA) is often used to visualize data and to reduce the number of features used in machine learning. Here, I started with 176 compounds (from the Materials Database) containing 28 different elements. This resulted in a weight-composition descriptor with 28 features (built from site and chemical-formula information). Using PCA, this very sparse descriptor can be reduced and visualized. Principal component 1 is highly correlated with the oxygen content, which at first appears to explain a sizeable portion of the variance in the data; on closer inspection, however, it accounts for only 10% of the variance (or information). This is important for the further development of descriptors.
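The dimensionality reduction described above can be sketched as follows; the matrix here is random and only mimics the 176-compound by 28-element shape of the weight-composition descriptor, so the variance ratios it prints are not the paper's:

```python
import numpy as np
from sklearn.decomposition import PCA

# Fake sparse weight-composition matrix: 176 compounds x 28 elements
rng = np.random.default_rng(42)
X = rng.random((176, 28))
X[rng.random(X.shape) < 0.8] = 0.0           # most entries are zero (sparse)
X = X / (X.sum(axis=1, keepdims=True) + 1e-12)  # rows ~ weight fractions

# Project onto the first two principal components for visualization
pca = PCA(n_components=2)
scores = pca.fit_transform(X)
print(pca.explained_variance_ratio_)  # fraction of variance per component
```

`explained_variance_ratio_` is what reveals that a visually dominant component may still carry only a small fraction (here, on the order of 10% in the real data) of the total information.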