Literature_mining_of_contaminants_in_food_groups

Literature mining of 96 contaminants in 72 foods from PubMed. This work is part of the DIETxPOSOME project of the FOODinteract Research Team. The code in this repository was adapted from FoodMine.

From data to insight: Exploring contaminants in different food groups with literature mining and machine learning techniques

Abstract

Food remains a major source of human exposure to chemical contaminants that are unintentionally present in commodities globally, despite strict regulation. Scientific literature is a valuable source of quantification data on those contaminants in various foods, but manually summarizing the information is not practicable. In this review, literature mining and machine learning techniques were applied to 72 foods to obtain relevant information on 96 contaminants, including heavy metals, polychlorinated biphenyls, dioxins, furans, polycyclic aromatic hydrocarbons (PAHs), pesticides, mycotoxins, and heterocyclic aromatic amines (HAAs). The 11,723 data points collected from 254 papers published in the last two decades were then used to identify patterns in contaminant distribution. Among contaminant categories, metals were the most studied globally, followed by PAHs, mycotoxins, pesticides, and HAAs. The geographical distribution was uneven, with Europe and Asia having the highest number of studies, followed by North and South America, Africa, and Oceania. Regarding food groups, all contained metals, while PAHs were found in seven out of 12 groups. Mycotoxins were found in six groups, and pesticides in almost all groups except meat, eggs, and vegetable oils. HAAs appeared in only three food groups, with fish and seafood reporting the highest levels. The median concentrations of contaminants varied across food groups, with citrinin having the highest median value. The information gathered is highly relevant for exploring diverse datasets, establishing connections, and identifying patterns among them, with the aim of a comprehensive view of food contamination.

Setup

See the Setup section of the FoodMine repository.

Run the Python notebooks in the following order to replicate the process.

Paper_Screening_CRFS.ipynb
Paper_Ranking_CRFS.ipynb

Files

Paper_Screening_CRFS.ipynb

Notebook that searches the PubMed database and filters the search results.
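The search-and-filter step can be illustrated with a minimal sketch. It assumes the public NCBI E-utilities ESearch endpoint and builds the request URL without sending it. The function names (`build_query`, `build_esearch_url`, `keep_record`) and the keyword-based filter rule are hypothetical and are not taken from the notebook or its helper modules.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(food, contaminants):
    """Combine one food with any of several contaminants in a PubMed term."""
    contaminant_part = " OR ".join(
        f'"{name}"[Title/Abstract]' for name in contaminants
    )
    return f'"{food}"[Title/Abstract] AND ({contaminant_part})'


def build_esearch_url(term, retmax=100):
    """Return the ESearch URL for a term; no request is sent here."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def keep_record(abstract, required_words):
    """Hypothetical filter: keep an abstract that mentions every required word."""
    text = abstract.lower()
    return all(word.lower() in text for word in required_words)


term = build_query("apple", ["cadmium", "lead"])
url = build_esearch_url(term, retmax=50)
print(term)
print(url)
```

The URL can then be fetched with any HTTP client; the filter here is only a stand-in for whatever rules the Filter class applies.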

Paper_Ranking_CRFS.ipynb

Notebook that builds a machine learning model to classify potentially useful papers for later manual review.
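As an illustration of this kind of classification, the sketch below trains a small multinomial naive Bayes text classifier with Laplace smoothing on labeled abstracts. This is a stand-in technique chosen for brevity; the model actually used in the notebook is not described here. The class name, the tokenizer, and the four training sentences are invented for the example and are not data from the study.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def log_scores(self, text):
        """Return the unnormalized log posterior for each class."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented toy abstracts: 1 = potentially useful, 0 = not useful.
train_texts = [
    "cadmium and lead concentrations measured in apple samples by ICP-MS",
    "aflatoxin levels quantified in peanut products from local markets",
    "consumer attitudes toward organic apple purchases",
    "review of peanut allergy management in children",
]
train_labels = [1, 1, 0, 0]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("lead concentrations quantified in wheat samples"))
```

Sorting candidate papers by the difference between the two class scores would give a ranking rather than a hard 0/1 decision, which suits a workflow where the top papers are reviewed manually.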

Data

Folder that holds raw data from paper data collection.

Ranking

Folder that holds the raw paper data after machine learning classification.

Other files

pubmed_util_CRFS.py Holds functions for interacting with the PubMed API for the purposes of our research.
filter_CRFS.py Contains the class Filter, which filters PubMed search results.

Select data files:

(Apple/Cheese/Chicken/Corn_oil/Ginger/Mussel/Peanut/Potato/Soybean/Tomato/Wheat)_scoring.xlsx Files that list papers with a potentially-useful classification label (0/1), used to train the machine learning algorithm.
dict_CRFS.pkl File that contains the food names, food scientific names, and chemical compounds included in the study.
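A pickled dictionary of this kind can be read with the standard `pickle` module. The sketch below round-trips a small in-memory example so that it runs without the real file. The layout (food name mapped to a scientific name and a compound list), the key names, and the helper `compounds_for` are assumptions made for illustration; the actual structure of dict_CRFS.pkl may differ.

```python
import pickle

# Assumed layout, for illustration only: food name -> details.
example = {
    "apple": {
        "scientific_name": "Malus domestica",
        "compounds": ["cadmium", "lead"],  # placeholder compound list
    },
}

# Round-trip through pickle in memory instead of touching the disk.
payload = pickle.dumps(example)
loaded = pickle.loads(payload)


def compounds_for(food_dict, food):
    """Return the compound list for a food, or an empty list if absent."""
    return food_dict.get(food, {}).get("compounds", [])


print(compounds_for(loaded, "apple"))

# Reading the real file would look like this (structure not verified here):
# with open("dict_CRFS.pkl", "rb") as handle:
#     food_dict = pickle.load(handle)
```

Pickle files can execute arbitrary code when loaded, so only open ones that come from a trusted source.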

Authors

Zita Martins - feel free to contact me!

Acknowledgments

FoodMine
