GitHub - huymanhnguyen0811/Arson: This repo is a demonstration of the workflow of the paper: The Use of Computational Fingerprinting Techniques to Distinguish Sources of Accelerants Used in Wildfire Arson

Documentation

This repo is accompanying the publication: “The Use of Computational Fingerprinting Techniques to Distinguish Sources of Accelerants Used in Wildfire Arson”.

Users need to first install R with this link and Rstudio with this link.

This workflow ran on Windows 11 OS 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz, 16 GB RAM;

The RStudio version used in this demo is 2023.06.0+421 “Mountain Hydrangea” Release for Windows;

The R version used in this demo is 4.3.1

Data processing

First, the following R packages are installed and loaded in the global the environment along with in-house built functions to minimize repetitiveness in the code.

Details about these functions can be found in Data processing & Normalization.R file in this repo.

PCA

Further details on the package and PCA functions used can be found in folder demo analysis, under the following files: demo analysis_stats_RQ1 & RQ2.R and demo_analysis_PCA_RQ1 & RQ2.R

Multiple Wilcoxon tests with p-value correction for multiple testing

Research Question 1: Gas versus Diesel

Before running multiple Wilcoxon tests, it is recommended to examine whether criteria for univariate parametric tests, such as t-tests, are violated. If yes, then it is safe to proceed using non-parametric univariate test, such as Wilcoxon test.

First, equal variance and normally distributed between two populations, here are Gas and Diesel using histogram, and Q-Q plots. Here, both Gas and Diesel populations are NOT normally distributed.

GasData <- as.vector(t(cat_5[,c(1:21)])) # 100149 data points
DieselData <- as.vector(t(cat_5[,c(22:25)])) # 19076 data point

# Histogram
hist(GasData, col='steelblue', main='Gas')

hist(DieselData, col='steelblue', main='Diesel')

# Q-Q plots aka. Normal Probability plots
stats::qqnorm(GasData, main='Gas')
stats::qqline(GasData)

stats::qqnorm(DieselData, main='Diesel')
stats::qqline(DieselData)

Then, equality of variance between Gas and Diesel populations can be examined using Levene’s test and Fligner-Killeen test for non-normally distributed data. Here, for both tests, p values are < 0.05, and thus, there is significant difference in variances between Gas and Diesel populations.

## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value    Pr(>F)    
## group    1  34.244 5.016e-09 ***
##       9948                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
##  Fligner-Killeen test of homogeneity of variances
## 
## data:  data by group
## Fligner-Killeen:med chi-squared = 531.7, df = 1, p-value < 2.2e-16

Here, multiple Wilcoxon tests followed by p-value correction for multiple testing were done. Different methods for p-value correction from function p.adjust from package stats were used. After p-value correction, the threshold for p-value can be set (for example,p.adjust < 0.05 or < 0.1), to see how it affects the number of significant compounds that can be found. Importantly, the threshold should be set so that it can include as many ASTM reference compounds in the list of significant compounds as possible. For example here, no significant compounds can be found with adjusted p-value < 0.05. But when the adjusted p-value is < 0.1, 127 significant compounds are found.

## [1] "Number of significant compounds with adjusted p-value < 0.05 = 0"

## [1] "Number of significant compounds adjusted p-value < 0.1 = 127"

Research Question 2: Distinguishing samples from groups of gas stations

Here again, before running multiple Wilcoxon tests, it is recommended to examine whether criteria for univariate parametric tests, such as t-tests, are violated. If yes, then it is safe to proceed using non-parametric univariate test, such as Wilcoxon test.

First, equal variance and normally distributed between two populations, here are Gas and Diesel using histogram, and Q-Q plots. Here, both Gas station group 1 (1, 3, 8) and Gas station group 2 (5, 7, 9) populations are NOT normally distributed.

Equality of variance between Gas station group 1 (1, 3, 8) and Gas station group 2 (5, 7, 9) populations can be examined using Levene’s test and Fligner-Killeen test for non-normally distributed data. Here, for both tests, p values are < 0.05, and thus, there is a significant difference in variances between Gas station group 1 (1, 3, 8) and Gas station group 2 (5, 7, 9) populations.

## Levene's Test for Homogeneity of Variance (center = median)
##          Df F value Pr(>F)
## group     1  0.6128 0.4338
##       11149

## 
##  Fligner-Killeen test of homogeneity of variances
## 
## data:  data by group
## Fligner-Killeen:med chi-squared = 713.21, df = 1, p-value < 2.2e-16

Here, multiple Wilcoxon tests followed by p-value correction for multiple testing were done. Different methods for p-value correction from function p.adjust from package stats were used. After p-value correction, the threshold for p-value can be set (for example,p.adjust < 0.05 or < 0.1), to see how it affects the number of significant compounds that can be found. Importantly, the threshold should be set so that it can include as many ASTM reference compounds in the list of significant compounds as possible. For example here, no significant compounds can be found with adjusted p-value < 0.05. But when the adjusted p-value is < 0.1, 127 significant compounds are found.

## [1] "Number of significant compounds with adjusted p-value < 0.05 = 75"

## [1] "Number of significant compounds adjusted p-value < 0.1 = 94"

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
Embedded visualizations/figure-markdown_github		Embedded visualizations/figure-markdown_github
Github-workflow-usage-instruction_files/figure-markdown_github		Github-workflow-usage-instruction_files/figure-markdown_github
Supplementary plots		Supplementary plots
demo analysis		demo analysis
python		python
.gitignore		.gitignore
ASTM Compound List.xlsx		ASTM Compound List.xlsx
ASTM_Presence_Confirmation.R		ASTM_Presence_Confirmation.R
ASTM_RTcorrection_withF001A.R		ASTM_RTcorrection_withF001A.R
Arson.Rproj		Arson.Rproj
Data processing & Normalization.R		Data processing & Normalization.R
Gasolines_BOP_110424.xlsx		Gasolines_BOP_110424.xlsx
Github workflow usage instruction.Rmd		Github workflow usage instruction.Rmd
Github-workflow-usage-instruction.md		Github-workflow-usage-instruction.md
README.md		README.md

huymanhnguyen0811/Arson

Folders and files

Latest commit

History

Repository files navigation

Documentation

Data processing

PCA

Multiple Wilcoxon tests with p-value correction for multiple testing

Research Question 1: Gas versus Diesel

Research Question 2: Distinguishing samples from groups of gas stations

About

Topics

Resources

Stars

Watchers

Forks

Languages