Skip to content

HSE-LAMBDA/IDAO-2022

Repository files navigation

Problem statement

Two-dimensional transition metal dichalcogenides (TMDCs) are relatively new types of materials that have remarkable properties ranging from semiconducting, metallic, magnetic, superconducting to optical. The chemical composition of TMDCs is MX₂; where M is the group of transition elements most popular Molybdenum and Tungsten, and X is usually Sulfur or Selenium. Atomically thin TMDCs usually contain various defects, which enrich the lattice structure and give rise to many intriguing properties. Engineered point defects in two-dimensional (2D) materials offer an attractive platform for solid-state devices that exploit tailored optoelectronic, quantum emission, and resistive properties. Naturally occurring defects are also unavoidably important contributors to material properties and performance. The immense variety and complexity of possible defects make it challenging to experimentally control, probe, or understand atomic-scale defect-property relationships. In the figure above you can find vacancy and substitution defects in an 8x8 MoS₂ crystal lattice.

Band gap is one of the important physical attributes which describe certain characteristics of the material, that helps deriving material qualities including electric conductivity or catalytic power or photo-optical properties. Band gap is the energy difference between the valence band and conduction band and is closely related to the energy difference between highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO), materials with overlapping (between valence band and conduction band) or very small band gap are conductors and materials with small bandgap are semiconductors while materials with large bandgap are insulators.

Objective

The task is to predict band gap energy for each crystal structure.

Data

The training dataset is in the data directory in the baseline and structured into a directory called structures containing 2967 crystal structures as a json file named with a unique identifier and is containing a special pymatgen structure (check pymatgen documentation for reference), that contains information about crystal parameters, cartesian coordinates of each atom, atom types, and other information. The targets are stored in a csv file named targets.csv containing two columns; the first is the unique identifier of the structure and the other is the band gap value for each structure. The train and test sets are constructed by sampling the corresponding subset without replacement.

Quality Metric

Energy within Threshold (EwT) is designed to measure the practical usefulness of a model for replacing DFT by evaluating whether the predicted energy is close to the ground truth (DFT energy). EwT is defined as the fraction of structures in which the predicted energy is within ε = 0.02 eV (electronvolt) of the ground truth energy.

where is the number of samples in the dataset indexed by .

Baseline

The baseline is based on MEGNet, a message passing graph neural network designed for materials by incorporating the natural symmetries in crystals such as rotation invariance, and periodic boundary conditions (PBC). In the reference baseline MEGNet is trained on the dataset.

Constraints

  1. Resource constraints: The solution you submit will run with resource constraints of 1 CPU and 2 Gb RAM. The time limit is 15 minutes. Note that the running time on your machine and on Yandex.Contest servers could be different due to different hardware.
  2. DFT is not allowed; and, its use will disqualify the modeling solution from the competition

Notes

You may use additional information, data, and advice at your own risk. All solutions will be checked by the jury to guarantee fair play. Please ask us in any vague or unclear case. If you face any problems or have questions, please contact us via e-mail: hello@idao.world

References