Geographic Patterns in U.S. Lung Cancer Mortality and Cigarette Smoking

Date repository last updated: June 10, 2023

Authors

  • Alaina H. Shreves (1, 2)
  • Ian D. Buller (3, 4)
  • Elizabeth Chase (5, 6)
  • Hannah Creutzfeldt (3, 7)
  • Jared A. Fisher (3)
  • Barry I. Graubard (6)
  • Robert N. Hoover (8)
  • Debra T. Silverman (3)
  • Susan S. Devesa (5), Co-Senior Author
  • Rena R. Jones (3), Co-Senior Author & Corresponding Author
  1. Department of Epidemiology, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, 02115, USA
  2. Trans-Divisional Research Program, Division of Cancer Epidemiology and Genetics (DCEG), National Cancer Institute (NCI), National Institutes of Health (NIH), Rockville, MD, 20850, USA
  3. Occupational and Environmental Epidemiology Branch, DCEG, NCI, Rockville, MD, 20850, USA
  4. Cancer Prevention Fellowship Program, Division of Cancer Prevention, NCI, Rockville, MD, 20850, USA
  5. Infections and Immunology Branch, DCEG, NCI, NIH, Rockville, MD, 20850, USA
  6. Department of Biostatistics, University of Michigan School of Public Health, University of Michigan, Ann Arbor, MI, 48109, USA
  7. Fielding School of Public Health, University of California Los Angeles, Los Angeles, CA, 90095, USA
  8. Office of the Director, DCEG, NCI, NIH, Rockville, MD, 20850, USA

Project Details

Lung cancer is the leading cause of cancer death in the United States (US), and lung cancer mortality and smoking behavior both vary by sex and region. We apply geospatial statistical methods to describe patterns in lung cancer mortality rates (2005-2018) in relation to patterns in cigarette smoking prevalences (1997-2003) by sex at the US county level. Our findings identify counties where lung carcinogens other than smoking may be driving lung cancer mortality and where further study is needed.

Project Timeframe

| Time | Event |
| ---- | ----- |
| 1997-2003 | NCI Model-based Small Area Estimates of Cancer-Related Measures smoking prevalences for persons aged 18+ years (see data availability section below) |
| 2005-2018 | Lung and bronchus cancer mortality rates among persons aged 20+ years from the National Vital Statistics System data from the National Center for Health Statistics (see data availability section below) |
| July 2020 | Project Initiation |
| March 2022 | Initial manuscript submission to Cancer Epidemiology, Biomarkers & Prevention for peer-review |
| November 2022 | Manuscript accepted by Cancer Epidemiology, Biomarkers & Prevention |
| February 2023 | Manuscript published in Cancer Epidemiology, Biomarkers & Prevention |
| June 2023 | Update to the False Discovery Rate (Benjamini & Hochberg, 1995) calculation for multiple testing correction that now orders the p-values in ascending order instead of in descending order. |
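
To illustrate why the ordering in the June 2023 update matters, here is a minimal base R sketch of the Benjamini & Hochberg adjustment. It is not the code in functions.R: the helper name `bh_adjust` and the p-values are made up for illustration. Each p-value is scaled by the number of tests divided by its rank, and the ranks only have their intended meaning when the p-values are sorted in ascending order.

```r
# Hypothetical helper (not taken from functions.R): Benjamini-Hochberg
# adjusted p-values, computed on p-values ranked in ascending order.
bh_adjust <- function(p) {
  m <- length(p)
  ord <- order(p)                       # ascending order: rank 1 = smallest p
  scaled <- p[ord] * m / seq_len(m)     # p_(i) * m / i for rank i
  # Enforce monotonicity from the largest rank downward and cap at 1
  adj_sorted <- pmin(1, rev(cummin(rev(scaled))))
  adj <- numeric(m)
  adj[ord] <- adj_sorted                # return values in the input order
  adj
}

p_values <- c(0.01, 0.04, 0.03, 0.005, 0.20)  # made-up illustrative values
adjusted <- bh_adjust(p_values)

# Cross-check against base R's built-in implementation
all.equal(adjusted, p.adjust(p_values, method = "BH"))
```

In practice, `p.adjust(p, method = "BH")` from the stats package gives the same adjustment without a custom helper.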

R Scripts Included In This Repository

This repository includes R scripts used to calculate the Lee's L statistic and render the geographic visualizations found in the following peer-reviewed manuscript:

Shreves AH, Buller ID, Chase E, Creutzfeldt H, Fisher JA, Graubard BI, Hoover RN, Silverman DT, Devesa SS, Jones RR. (2023) Geographic Patterns in U.S. Lung Cancer Mortality and Cigarette Smoking. Cancer Epidemiology, Biomarkers & Prevention, 32(2):193-201. DOI:10.1158/1055-9965.EPI-22-0253 PMID:36413442.

| R Script | Description |
| -------- | ----------- |
| functions.R | Custom functions to calculate the local Lee's L statistic with correction for multiple testing |
| preparation.R | Calculate the local Lee's L statistics for the four comparisons. Requires a data set to run (not included; see notes within). |
| figure1.R | Generate Figure 1 |
| figure2.R | Generate Figure 2 |
| supplemental1.R | Generate Supplemental Figure 1 |
| supplemental2.R | Generate Supplemental Figure 2 |

The repository also includes the code to create the project hexagon sticker.
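
For readers unfamiliar with Lee's L, the following base R sketch shows one way the global and local statistics can be computed from a spatial weights matrix. It is an illustration under stated assumptions, not the implementation in functions.R: the helper name `lee_l`, the four-area toy layout, the choice to count each area as its own neighbor, and all data values are hypothetical.

```r
# Hypothetical helper (not taken from functions.R): global and local Lee's L
# for two numeric vectors x and y and an n x n spatial weights matrix W.
lee_l <- function(x, y, W) {
  n <- length(x)
  zx <- x - mean(x)                  # centered x
  zy <- y - mean(y)                  # centered y
  lag_x <- as.vector(W %*% zx)       # spatial lag of centered x
  lag_y <- as.vector(W %*% zy)       # spatial lag of centered y
  denom <- sqrt(sum(zx^2)) * sqrt(sum(zy^2))
  local_l <- n * lag_x * lag_y / denom
  global_l <- (n / sum(rowSums(W)^2)) * sum(lag_x * lag_y) / denom
  list(global = global_l, local = local_l)
}

# Toy example: four areas in a row; each area is its own neighbor (assumption)
adjacency <- matrix(c(1, 1, 0, 0,
                      1, 1, 1, 0,
                      0, 1, 1, 1,
                      0, 0, 1, 1), nrow = 4, byrow = TRUE)
W <- adjacency / rowSums(adjacency)  # row-standardized weights

smoking   <- c(20, 22, 30, 31)       # made-up prevalences
mortality <- c(50, 55, 70, 72)       # made-up rates
result <- lee_l(smoking, mortality, W)
result$global
result$local
```

With row-standardized weights, the global statistic equals the mean of the local values, which is a quick sanity check on any implementation. The local values are the ones that would then be tested and corrected for multiple testing.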

Getting Started

  • Step 1: You must download the data (see Data Availability section)
  • Step 2: Save the data set to the data directory in this repository. The script currently expects a CSV file; modify the path on Line 58 of the preparation.R file to match your data location and file name
  • Step 3: Run R scripts for figures. The preparation.R file will source the functions.R file.
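
The steps above could be wrapped in a short driver script along these lines. This is only a sketch: the data file name is a placeholder, the script locations are assumed to be relative to the working directory, and it assumes each figure script loads what it needs (for example, by sourcing preparation.R). Adjust these to match your copy of the repository.

```r
# Hypothetical paths; adjust to your local copy of the repository.
data_file <- file.path("data", "your_data.csv")   # placeholder file name
figure_scripts <- c("figure1.R", "figure2.R",
                    "supplemental1.R", "supplemental2.R")

# Return the subset of paths that do not exist on disk
missing_inputs <- function(paths) paths[!file.exists(paths)]

not_found <- missing_inputs(c(data_file, "preparation.R", figure_scripts))

if (length(not_found) == 0) {
  # Assumption: each figure script prepares its own inputs; if yours do not,
  # run source("preparation.R") before this loop.
  for (script in figure_scripts) source(script)
} else {
  message("Missing before running: ", paste(not_found, collapse = ", "))
}
```

Checking for the data file first avoids a less informative error from inside preparation.R when the data set has not yet been downloaded.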

Data Availability

County-level U.S. smoking prevalences are downloadable from the Model-based Small Area Estimates of Cancer-Related Measures, produced by the Surveillance Research Program within the Division of Cancer Control and Population Sciences of the National Cancer Institute. County-level U.S. lung cancer mortality rates are downloadable from the National Vital Statistics System of the National Center for Health Statistics, Centers for Disease Control and Prevention.

Questions?

For questions about the manuscript, please e-mail the corresponding author, Dr. Rena R. Jones.