SayamAlt/Symptoms-Disease-Text-Classification

About Dataset

The dataset consists of 1200 datapoints and has two columns: "label" and "text".

  • label: contains the disease labels.
  • text: contains the natural language symptom descriptions.

The dataset comprises 24 different diseases, and each disease has 50 symptom descriptions, resulting in a total of 1200 datapoints.

The following 24 diseases have been covered in the dataset:

Psoriasis, Varicose Veins, Typhoid, Chicken pox, Impetigo, Dengue, Fungal infection, Common Cold, Pneumonia, Dimorphic Hemorrhoids, Arthritis, Acne, Bronchial Asthma, Hypertension, Migraine, Cervical spondylosis, Jaundice, Malaria, urinary tract infection, allergy, gastroesophageal reflux disease, drug reaction, peptic ulcer disease, diabetes
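The layout described above (a "label" column, a "text" column, and an equal number of descriptions per disease) can be sanity-checked with a few lines of standard-library Python. This is only a sketch: the function names `summarize_labels` and `check_balance` are made up for illustration, and the in-memory sample rows are hypothetical stand-ins for the real file.

```python
import csv
import io
from collections import Counter

def summarize_labels(csv_file):
    """Count symptom descriptions per disease label.

    Expects a file-like object whose CSV header includes 'label' and 'text'.
    """
    reader = csv.DictReader(csv_file)
    counts = Counter()
    for row in reader:
        if not row["text"].strip():
            raise ValueError(f"empty description for label {row['label']!r}")
        counts[row["label"]] += 1
    return counts

def check_balance(counts, n_labels=24, per_label=50):
    """Return True if there are n_labels labels with per_label rows each."""
    return len(counts) == n_labels and all(c == per_label for c in counts.values())

# Hypothetical sample rows, used here instead of the real dataset file.
sample = io.StringIO(
    "label,text\n"
    "Migraine,I have a throbbing headache and light hurts my eyes.\n"
    "Acne,My face has broken out in painful red pimples.\n"
    "Migraine,One side of my head pounds and I feel nauseous.\n"
)
sample_counts = summarize_labels(sample)
print(sample_counts)
# The sample is unbalanced on purpose, so this check fails.
print(check_balance(sample_counts, n_labels=2, per_label=2))
```

To run the same check on the actual data, open the CSV file (its name and location are not stated here, so substitute your own path) with `open(path, newline="", encoding="utf-8")` and pass the file object to `summarize_labels`; with the default arguments, `check_balance` then tests for 24 labels with 50 descriptions each.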

Task

The task is to develop a language model that accurately predicts the disease from a short description of the symptoms the user is experiencing.

Such models can be used to identify potential diseases early on, allowing patients to seek medical attention and treatment promptly. Also, in situations where in-person consultations are not possible or desirable, such a model can be used to provide remote diagnosis and treatment recommendations based on the user's symptoms.
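To make the task concrete, here is a minimal baseline sketch of symptom-text classification. It is not the model used in this repository: it swaps in a small multinomial Naive Bayes classifier over bag-of-words counts, written with the standard library only. The class name `NaiveBayesClassifier`, the `tokenize` helper, and the four training sentences are all hypothetical and exist purely for illustration.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayesClassifier:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)       # documents per label
        self.word_counts = defaultdict(Counter)   # word counts per label
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        # Words never seen in training carry no information, so drop them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / self.total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[label][t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical training examples; the real dataset has 50 per disease.
train_texts = [
    "I have a throbbing headache and bright light hurts my eyes",
    "One side of my head is pounding and I feel nauseous",
    "My skin is itchy with red scaly patches on my elbows",
    "There are dry flaky patches of skin on my knees and scalp",
]
train_labels = ["Migraine", "Migraine", "Psoriasis", "Psoriasis"]

model = NaiveBayesClassifier().fit(train_texts, train_labels)
prediction = model.predict("a pounding headache with nausea")
print(prediction)
```

A baseline like this is mainly useful as a reference point: with 24 balanced classes, guessing at random would be right about 1 time in 24, so any trained model should be compared against both that floor and a simple word-count baseline on held-out descriptions.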