Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optical and transport data as elemental pseudo-inverse contributions #892

Merged
merged 52 commits into from
Apr 10, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
52 commits
Select commit Hold shift + click to select a range
a9e0c5d
Added optical spectra from refractiveindex.info as composite features
gbrunin Jun 28, 2022
0a827f2
Add mode as statistical feature, fixed typo in element name
gbrunin Jun 28, 2022
3ffcf61
Added the possibility to change the frequency range.
gbrunin Jun 30, 2022
bc7ad2a
Added pure and pseudo-inversed transport properties from Ricci et al
gbrunin Jul 4, 2022
f957b10
Updated defaults and added tests for transport features
gbrunin Jul 4, 2022
49b63f6
Slight update to get rid of useless from_preset
gbrunin Aug 4, 2022
9ba7f47
Merge branch 'main' of github.com:hackingmaterials/matminer into optics
gbrunin Aug 18, 2022
cd57b82
Moved optical and transport featurizer inside ElementProperty, update…
gbrunin Sep 7, 2022
d6c36fd
Merge branch 'main' of github.com:hackingmaterials/matminer into optics
gbrunin Sep 7, 2022
0fdf9aa
Removed print, used pre-commit
gbrunin Sep 7, 2022
79fb8e7
Updated documentation in utils
gbrunin Oct 12, 2022
60afc58
Updated documentation in utils
gbrunin Oct 12, 2022
b6662e4
Removed calls to system executables
gbrunin Oct 12, 2022
14b8cbc
Use a folder to save csv files and untar optical database
gbrunin Oct 12, 2022
1d110f1
Alpha was not correctly used
gbrunin Oct 12, 2022
ac8b68c
f-strings
gbrunin Oct 12, 2022
60da2db
Added citations
gbrunin Oct 13, 2022
85ffaab
Merge pull request #1 from gbrunin/optics
gbrunin Nov 18, 2022
1cc01f1
Added script to create optical database from the refractiveinfo repo
gbrunin Nov 18, 2022
2ac092f
Rm requirements
gbrunin Nov 18, 2022
fc1ee35
Merge pull request #2 from Matgenix/optics
gbrunin Nov 18, 2022
e89d850
Added transport data for pure oxygen. Took the most stable phase of t…
gbrunin Nov 29, 2022
87afe93
Merge branch 'main' of github.com:Matgenix/matminer into optics
gbrunin Nov 29, 2022
1a7b312
Merge pull request #3 from gbrunin/optics
gbrunin Nov 29, 2022
cb56908
Make sure the index of the dataframe to be pseudo-inversed contains t…
gbrunin Dec 2, 2022
86303d5
Merge branch 'main' of github.com:Matgenix/matminer into optics
gbrunin Dec 2, 2022
947821e
Small bug fix
gbrunin Feb 24, 2023
9c2f096
Merge branch 'main' of github.com:hackingmaterials/matminer into main
gbrunin Apr 19, 2023
6277c0c
Updated isort version.
gbrunin May 30, 2023
a6650c1
Merge branch 'main' of github.com:hackingmaterials/matminer into main
gbrunin May 30, 2023
db64499
Merge branch 'main' of github.com:hackingmaterials/matminer into optics
gbrunin May 30, 2023
ba8812f
Merge branch 'hackingmaterials:main' into main
gbrunin May 30, 2023
d5af4b1
Merge branch 'main' of github.com:Matgenix/matminer into optics
gbrunin May 30, 2023
3ca5c2c
Updated some tests.
gbrunin May 30, 2023
de1b22b
Handling the NaNs output by Composition featurizers. Tests need to be…
gbrunin Jul 4, 2023
f42bb61
Merge branch 'main' of github.com:hackingmaterials/matminer into main
gbrunin Jul 4, 2023
dc2d7a9
Merge branch 'main' of github.com:hackingmaterials/matminer into elem…
gbrunin Jul 4, 2023
79c17c2
Merge branch 'main' of github.com:Matgenix/matminer into elem_nans
gbrunin Jul 18, 2023
7f4c896
Merge pull request #5 from Matgenix/elem_nans
gbrunin Jul 18, 2023
d90f83e
Merge branch 'main' of github.com:hackingmaterials/matminer into elem…
gbrunin Oct 4, 2023
66d905b
Merge branch 'main' of github.com:Matgenix/matminer into elem_nans
gbrunin Oct 4, 2023
57042c7
Merge branch 'main' of github.com:hackingmaterials/matminer into main
gbrunin Nov 14, 2023
ba0dcc7
Fix yaml safe load in optical data.
gbrunin Nov 14, 2023
414bd19
Merge branch 'main' of github.com:hackingmaterials/matminer into main
gbrunin Mar 29, 2024
f574ced
Merge branch 'main' of github.com:hackingmaterials/matminer into main
gbrunin Apr 2, 2024
ac3c28c
Reverted default of impute_nan to False. Advertise and recommend the …
gbrunin Apr 3, 2024
e8be676
Fix for Z=119 issue.
gbrunin Apr 4, 2024
507b741
Added tests for impute_nan. Some bug fixes.
gbrunin Apr 9, 2024
8b9b7ae
Compressed MP transport data.
gbrunin Apr 9, 2024
f4ea31d
Ignore the transport csv file with git.
gbrunin Apr 9, 2024
2eae594
Merge branch 'main' of github.com:hackingmaterials/matminer into main
gbrunin Apr 9, 2024
bd5d213
Missed impute_nan warnings.
gbrunin Apr 9, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -51,3 +51,4 @@ dataset_prep.log

# Datasets
matminer/datasets/*.json.gz
matminer/utils/data_files/mp_transport/transport_database.csv
204 changes: 204 additions & 0 deletions dev_scripts/dataset_management/refind2matminer.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,204 @@
import os
import glob

"""
Script used to transform the data from the original repository to
an easier-to-use database for matminer. Very ugly but should work,
at least with the commit 5a22ce0a3fa41fe6e9d26330686b32d83388e760 of
https://github.com/polyanskiy/refractiveindex.info-database

Place this script in your refractiveindex repo, and run it.
A folder database_matminer should appear containing the new database.

We use only files from main and other/semiconductors alloys, other/alloys and other/intermetallics.
Within these data, some files are not used because they are outliers when looking at
plots of properties vs composition.
Some files are renamed to represent the composition in an cleaner way.

Overall, this is very ugly and represents manual manipulations.
This is just added so that anyone can update the database used by matminer
with new data added on the refractiveindex repository.
Add only data that you are sure is representative of 3D crystals, there are no automatic checks!
"""

os.system("cp -r database database_copy")
os.mkdir("database_matminer")

# Get the list of files to remove
files_to_remove = [
"data/main/Al2O3/Querry.yml",
"data/main/Au/*-*nm*.yml",
"data/main/C/Arakawa.yml",
"data/main/C/Djurisic-e.yml",
"data/main/C/Djurisic-o.yml",
"data/main/C/El-Sayed.yml",
"data/main/C/Ermolaev.yml",
"data/main/C/Hagemann.yml",
"data/main/C/Querry*.yml",
"data/main/C/Song*.yml",
"data/main/C/Weber.yml",
"data/main/Ge/*nm*.yml",
"data/main/InSb/Adachi.yml",
"data/main/MoS2/Beal.yml",
"data/main/MoS2/Ermolaev*.yml",
"data/main/MoS2/Hsu*.yml",
"data/main/MoS2/Islam*.yml",
"data/main/MoS2/Jung.yml",
"data/main/MoS2/Song-*L.yml",
"data/main/MoS2/Yim*.yml",
"data/main/MoS2/Zhang.yml",
"data/main/Na/Inagaki-liquid.yml",
"data/main/Bi2Se3/Fang-2L.yml",
"data/main/Pb/Mathewson-140K.yml",
"data/main/Si/Pierce.yml",
"data/main/WSe2/*L*.yml",
"data/main/*/*Werner*.yml",
"data/other/alloys/Ni-Fe/*10nm*",
"data/other/alloys/Ni-Fe/*gold150nm*",
"data/other/intermetallics/AuAl2/Supansomboon.yml",
"data/main/*/Munkhbat-e.yml",
"data/main/*/Munkhbat-gamma.yml"
]

flat_list = []
for f in files_to_remove:
flat_list += glob.glob(os.path.join("database_copy", f))

flat_list += ["\'database_copy/data/other/semiconductor alloys/AlAs-GaAs/Aspnes-0.yml\'"]
flat_list += ["\'database_copy/data/other/semiconductor alloys/AlAs-GaAs/Papatryfonos-0.yml\'"]
flat_list.remove("database_copy/data/main/Ta/Werner.yml")
flat_list.remove("database_copy/data/main/Ta/Werner-DFT.yml")

files_to_remove = flat_list

for f in files_to_remove:
os.system("rm " + f)

# Also change a file
os.system("sed -i \'s/760/780/g\' database_copy/data/main/C/Peter.yml")

# Move some files
os.system("mv database_copy/data/other/alloys/Au-Ag/Rioux-Au0Ag100.yml database_copy/data/main/Ag/Rioux.yml")
os.system("mv database_copy/data/other/alloys/Au-Ag/Rioux-Au100Ag0.yml database_copy/data/main/Au/Rioux.yml")
os.system("mv \'database_copy/data/other/semiconductor alloys/AlSb-GaSb/Ferrini-0.yml\' database_copy/data/main/GaSb/Ferrini.yml")

# Remove entire folders
folders_to_remove = [
"data/other/alloys/Pd-H",
"data/other/alloys/V-H",
"data/other/alloys/Zr-H",
"data/other/alloys/Nb-Sn",
"data/other/alloys/V-Ga"
]

for f in folders_to_remove:
os.system("rm -rf database_copy/" + f)

# Now the database has been filtered.
# We move the files into database_matminer,
# with some renaming...
os.system("cp -r database_copy/data/main/* database_matminer/")
os.system("cp -r database_copy/data/other/intermetallics/* database_matminer/")
os.mkdir("database_matminer/Au2Ag3")
os.mkdir("database_matminer/Au3Ag2")
os.mkdir("database_matminer/Au3Ag7")
os.mkdir("database_matminer/Au4Ag")
os.mkdir("database_matminer/Au7Ag3")
os.mkdir("database_matminer/Au9Ag")
os.mkdir("database_matminer/AuAg")
os.mkdir("database_matminer/AuAg4")
os.mkdir("database_matminer/AuAg9")
os.system("cp database_copy/data/other/alloys/Au-Ag/Rioux-Au40Ag60.yml database_matminer/Au2Ag3/")
os.system("cp database_copy/data/other/alloys/Au-Ag/Rioux-Au60Ag40.yml database_matminer/Au3Ag2/")
os.system("cp database_copy/data/other/alloys/Au-Ag/Rioux-Au30Ag70.yml database_matminer/Au3Ag7/")
os.system("cp database_copy/data/other/alloys/Au-Ag/Rioux-Au80Ag20.yml database_matminer/Au4Ag/")
os.system("cp database_copy/data/other/alloys/Au-Ag/Rioux-Au70Ag30.yml database_matminer/Au7Ag3/")
os.system("cp database_copy/data/other/alloys/Au-Ag/Rioux-Au90Ag10.yml database_matminer/Au9Ag/")
os.system("cp database_copy/data/other/alloys/Au-Ag/Rioux-Au50Ag50.yml database_matminer/AuAg/")
os.system("cp database_copy/data/other/alloys/Au-Ag/Rioux-Au20Ag80.yml database_matminer/AuAg4/")
os.system("cp database_copy/data/other/alloys/Au-Ag/Rioux-Au10Ag90.yml database_matminer/AuAg9/")

os.mkdir("database_matminer/Cu17Zn3")
os.mkdir("database_matminer/Cu7Zn3")
os.mkdir("database_matminer/Cu9Zn")
os.system("cp database_copy/data/other/alloys/Cu-Zn/Querry-Cu85Zn15.yml database_matminer/Cu17Zn3/")
os.system("cp database_copy/data/other/alloys/Cu-Zn/Querry-Cu70Zn30.yml database_matminer/Cu7Zn3/")
os.system("cp database_copy/data/other/alloys/Cu-Zn/Querry-Cu90Zn10.yml database_matminer/Cu9Zn/")

os.mkdir("database_matminer/Ni4Fe")
os.system("cp database_copy/data/other/alloys/Ni-Fe/Tikuisis_bare150nm.yml database_matminer/Ni4Fe/")


os.mkdir("database_matminer/Al198Ga802As1000")
os.mkdir("database_matminer/Al219Ga781As1000")
os.mkdir("database_matminer/Al315Ga685As1000")
os.mkdir("database_matminer/Al342Ga658As1000")
os.mkdir("database_matminer/Al411Ga589As1000")
os.mkdir("database_matminer/Al419Ga581As1000")
os.mkdir("database_matminer/Al452Ga548As1000")
os.mkdir("database_matminer/Al491Ga509As1000")
os.mkdir("database_matminer/Al590Ga410As1000")
os.mkdir("database_matminer/Al700Ga300As1000")
os.mkdir("database_matminer/Al804Ga196As1000")
os.mkdir("database_matminer/Al97Ga903As1000")
os.mkdir("database_matminer/Al99Ga901As1000")
os.system("cp \'database_copy/data/other/semiconductor alloys/AlAs-GaAs/Aspnes-19.8.yml\' database_matminer/Al198Ga802As1000/")
os.system("cp \'database_copy/data/other/semiconductor alloys/AlAs-GaAs/Papatryfonos-21.9.yml\' database_matminer/Al219Ga781As1000/")
os.system("cp \'database_copy/data/other/semiconductor alloys/AlAs-GaAs/Adachi-0.315.yml\' database_matminer/Al315Ga685As1000/")
os.system("cp \'database_copy/data/other/semiconductor alloys/AlAs-GaAs/Aspnes-31.5.yml\' database_matminer/Al315Ga685As1000/")
os.system("cp \'database_copy/data/other/semiconductor alloys/AlAs-GaAs/Papatryfonos-34.2.yml\' database_matminer/Al342Ga658As1000/")
os.system("cp \'database_copy/data/other/semiconductor alloys/AlAs-GaAs/Papatryfonos-41.1.yml\' database_matminer/Al411Ga589As1000/")
os.system("cp \'database_copy/data/other/semiconductor alloys/AlAs-GaAs/Aspnes-41.9.yml\' database_matminer/Al419Ga581As1000/")
os.system("cp \'database_copy/data/other/semiconductor alloys/AlAs-GaAs/Papatryfonos-45.2.yml\' database_matminer/Al452Ga548As1000/")
os.system("cp \'database_copy/data/other/semiconductor alloys/AlAs-GaAs/Aspnes-49.1.yml\' database_matminer/Al491Ga509As1000/")
os.system("cp \'database_copy/data/other/semiconductor alloys/AlAs-GaAs/Aspnes-59.0.yml\' database_matminer/Al590Ga410As1000/")
os.system("cp \'database_copy/data/other/semiconductor alloys/AlAs-GaAs/Adachi-0.700.yml\' database_matminer/Al700Ga300As1000/")
os.system("cp \'database_copy/data/other/semiconductor alloys/AlAs-GaAs/Aspnes-70.0.yml\' database_matminer/Al700Ga300As1000/")
os.system("cp \'database_copy/data/other/semiconductor alloys/AlAs-GaAs/Aspnes-80.4.yml\' database_matminer/Al804Ga196As1000/")
os.system("cp \'database_copy/data/other/semiconductor alloys/AlAs-GaAs/Papatryfonos-9.7.yml\' database_matminer/Al97Ga903As1000/")
os.system("cp \'database_copy/data/other/semiconductor alloys/AlAs-GaAs/Aspnes-9.9.yml\' database_matminer/Al99Ga901As1000/")

os.mkdir("database_matminer/Al2329O2612N588")
os.mkdir("database_matminer/Al2351O2547N653")
os.mkdir("database_matminer/Al2356O2531N669")
os.mkdir("database_matminer/Al2372O2483N717")
os.system("cp \'database_copy/data/other/semiconductor alloys/AlN-Al2O3/Hartnett-5.88.yml\' database_matminer/Al2329O2612N588/")
os.system("cp \'database_copy/data/other/semiconductor alloys/AlN-Al2O3/Hartnett-6.53.yml\' database_matminer/Al2351O2547N653/")
os.system("cp \'database_copy/data/other/semiconductor alloys/AlN-Al2O3/Hartnett-6.69.yml\' database_matminer/Al2356O2531N669/")
os.system("cp \'database_copy/data/other/semiconductor alloys/AlN-Al2O3/Hartnett-7.17.yml\' database_matminer/Al2372O2483N717/")

os.mkdir("database_matminer/Al10Ga90Sb100")
os.mkdir("database_matminer/Al30Ga70Sb100")
os.mkdir("database_matminer/Al50Ga50Sb100")
os.system("cp \'database_copy/data/other/semiconductor alloys/AlSb-GaSb/Ferrini-10.yml\' database_matminer/Al10Ga90Sb100/")
os.system("cp \'database_copy/data/other/semiconductor alloys/AlSb-GaSb/Ferrini-30.yml\' database_matminer/Al30Ga70Sb100/")
os.system("cp \'database_copy/data/other/semiconductor alloys/AlSb-GaSb/Ferrini-50.yml\' database_matminer/Al50Ga50Sb100/")

os.mkdir("database_matminer/In52Ga48As100")
os.system("cp \'database_copy/data/other/semiconductor alloys/GaAs-InAs/Adachi.yml\' database_matminer/In52Ga48As100/")

os.mkdir("database_matminer/In52Ga48As24P76")
os.system("cp \'database_copy/data/other/semiconductor alloys/GaAs-InAs-GaP-InP/Adachi.yml\' database_matminer/In52Ga48As24P76/")

os.mkdir("database_matminer/Ga51In49P100")
os.system("cp \'database_copy/data/other/semiconductor alloys/GaP-InP/Schubert.yml\' database_matminer/Ga51In49P100/")

os.mkdir("database_matminer/Si11Ge89")
os.mkdir("database_matminer/Si20Ge80")
os.mkdir("database_matminer/Si28Ge72")
os.mkdir("database_matminer/Si47Ge53")
os.mkdir("database_matminer/Si48Ge52")
os.mkdir("database_matminer/Si65Ge35")
os.mkdir("database_matminer/Si85Ge15")
os.mkdir("database_matminer/Si98Ge2")
os.system("cp \'database_copy/data/other/semiconductor alloys/Si-Ge/Jellison-11.yml\' database_matminer/Si11Ge89/")
os.system("cp \'database_copy/data/other/semiconductor alloys/Si-Ge/Jellison-20.yml\' database_matminer/Si20Ge80/")
os.system("cp \'database_copy/data/other/semiconductor alloys/Si-Ge/Jellison-28.yml\' database_matminer/Si28Ge72/")
os.system("cp \'database_copy/data/other/semiconductor alloys/Si-Ge/Jellison-47.yml\' database_matminer/Si47Ge53/")
os.system("cp \'database_copy/data/other/semiconductor alloys/Si-Ge/Jellison-48.yml\' database_matminer/Si48Ge52/")
os.system("cp \'database_copy/data/other/semiconductor alloys/Si-Ge/Jellison-65.yml\' database_matminer/Si65Ge35/")
os.system("cp \'database_copy/data/other/semiconductor alloys/Si-Ge/Jellison-85.yml\' database_matminer/Si85Ge15/")
os.system("cp \'database_copy/data/other/semiconductor alloys/Si-Ge/Jellison-98.yml\' database_matminer/Si98Ge2/")

# Remove unncessary folders and files
os.system("rm -r database_copy")