Statistical Feature Learning Methods and Software for the Boston Lung Cancer Survival Cohort (BLCSC)
The Boston Lung Cancer Survival Cohort (BLCSC) study is a cancer epidemiology cohort of 11,164 lung cancer cases enrolled since 1992 at Massachusetts General Hospital (MGH), the Dana-Farber Cancer Institute (DFCI), and Brigham and Women's Hospital. Dr. David C. Christiani (Harvard T.H. Chan School of Public Health) is the project director of the BLCSC study, which has collected detailed demographic, smoking, occupational, and dietary information, in addition to pathology, radiomics, treatment history, oncogenic mutation status, serum, white blood cells, DNA, and tumor tissues.
Our primary goal is to inform the cancer community of predictive or prognostic markers that are critical to precision medicine.
University of Michigan: Yi Li, Jian Kang, Yanming Li, Kevin He, Zhe Fei
Michigan State University: Hyokyoung G. Hong
Harvard University: David C. Christiani and his lab
- By incorporating the inter-feature dependence, a covariance-insured screening approach is proposed to identify predictors that are jointly informative but marginally weakly associated with outcomes.
- paper R code
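To illustrate why inter-feature dependence matters, the following minimal sketch simulates a predictor that is jointly informative but marginally almost uncorrelated with the outcome. It is not the covariance-insured screening algorithm itself; the simulated data, the value of `rho`, and the helper names `corr` and `two_predictor_ols` are assumptions made for illustration only.

```python
import random

def mean(v):
    return sum(v) / len(v)

def corr(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

def two_predictor_ols(x1, x2, y):
    """Slopes of y on (x1, x2), from the 2x2 normal equations on centered data."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * t for a, t in zip(c1, cy))
    s2y = sum(b * t for b, t in zip(c2, cy))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Hypothetical simulation: x1 and x2 have correlation rho, and the true
# coefficient of x2 is -rho, so cov(x2, y) = rho - rho = 0 in the population.
rng = random.Random(0)
n, rho = 5000, 0.8
x1, x2, y = [], [], []
for _ in range(n):
    z1, z2, e = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    a = z1
    b = rho * z1 + (1 - rho ** 2) ** 0.5 * z2
    x1.append(a)
    x2.append(b)
    y.append(a - rho * b + 0.5 * e)

marginal_x1 = corr(x1, y)   # clearly nonzero
marginal_x2 = corr(x2, y)   # close to zero: marginal screening would drop x2
b1, b2 = two_predictor_ols(x1, x2, y)  # b2 is close to -rho in the joint model
print(round(marginal_x1, 3), round(marginal_x2, 3), round(b1, 3), round(b2, 3))
```

A screening rule that ranks predictors only by `corr(x, y)` would discard `x2` here, while a rule that accounts for the dependence between `x1` and `x2` would retain it.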
- Leveraging prior grouping information on covariates, partition-based screening methods for ultrahigh-dimensional variables are proposed in the framework of generalized linear models.
Figure. Combined partition-based screening statistics are shown on seven axial slices that cut through eight important brain regions, which have more than 60 selected voxels.
- paper R code
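One simple way to use a prior partition of the covariates is to pool per-variable statistics within each group and rank the groups. The sketch below does this with a sum of squared marginal statistics; this pooling rule, the function name `group_screen`, and the toy numbers are assumptions for illustration, and the statistic used in the paper may differ.

```python
from collections import defaultdict

def group_screen(marginal_stats, groups, top_k):
    """Rank groups by the sum of squared marginal statistics of their members.

    marginal_stats: one screening statistic per covariate (hypothetical values)
    groups:         the prior group label of each covariate
    top_k:          number of groups to keep
    """
    totals = defaultdict(float)
    for stat, g in zip(marginal_stats, groups):
        totals[g] += stat ** 2
    ranked = sorted(totals, key=lambda g: totals[g], reverse=True)
    return ranked[:top_k], dict(totals)

# Toy example: region_A holds three moderate signals, region_B holds one
# strong signal and two near-null ones.
stats = [0.30, 0.28, 0.31, 0.05, 0.45, 0.02]
labels = ["region_A"] * 3 + ["region_B"] * 3
selected, totals = group_screen(stats, labels, top_k=1)
print(selected, totals)
```

In this toy example the single largest statistic sits in `region_B`, yet `region_A` is selected because its moderate signals accumulate. Note that a plain sum favors larger groups, so a size adjustment may be appropriate in practice.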
- This method aims to incorporate weak signals in variable selection, estimation, and prediction.
Figure. The role of weak but jointly important variables in distinguishing normal kidney (C, circle) and acute rejection (AR, triangle) in a kidney transplant study.
- paper [R code]
- Lq-norm learning is proposed to detect predictors with various types of impact, such as short- or long-term impact, on censored outcomes.
Figure. Comparisons of the Cramer–von Mises and Kolmogorov screening statistics in two hypothetical scenarios.
- With a flexible weighting scheme that includes the Kolmogorov statistic as a special case, the IPOD method can detect early or late impact on censored outcomes.
- paper R code
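The contrast between sup-norm (Kolmogorov-type) and integrated (Lq-type) statistics can be sketched by comparing two Kaplan-Meier curves, for example from samples split by a covariate. This is a minimal illustration, not the IPOD implementation: the helper names `km_curve`, `surv_at`, and `lq_distance`, the unweighted integration over time, and the toy data are all assumptions.

```python
def km_curve(times, events):
    """Kaplan-Meier estimate as a list of (event_time, survival) steps.

    events[i] is 1 for an observed event and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        ties = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            ties += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= ties
    return curve

def surv_at(curve, t):
    """Evaluate the right-continuous step function at time t."""
    s = 1.0
    for time, value in curve:
        if time <= t:
            s = value
        else:
            break
    return s

def lq_distance(curve_a, curve_b, q):
    """Lq distance between two step curves; q = float('inf') gives the sup norm."""
    grid = sorted({t for t, _ in curve_a} | {t for t, _ in curve_b})
    diffs = [abs(surv_at(curve_a, t) - surv_at(curve_b, t)) for t in grid]
    if q == float("inf"):
        return max(diffs)
    # Both curves are constant on [grid[k], grid[k+1]), so the integral is a sum.
    total = sum(diffs[k] ** q * (grid[k + 1] - grid[k]) for k in range(len(grid) - 1))
    return total ** (1.0 / q)

# Toy data: two groups with all events observed.
curve_a = km_curve([1, 2, 3, 4], [1, 1, 1, 1])
curve_b = km_curve([2, 4, 6, 8], [1, 1, 1, 1])
sup_stat = lq_distance(curve_a, curve_b, float("inf"))
l2_stat = lq_distance(curve_a, curve_b, 2)
print(sup_stat, l2_stat)
```

The sup norm reacts only to the largest gap between the curves, while a finite `q` accumulates the gap over time, which is one way to see why different norms are sensitive to different impact patterns.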
- Recently developed variable screening methods, though powerful in many practical settings, are less effective at detecting signals that are marginally weak but jointly important. A new conditional screening method for survival outcome data computes the marginal contribution of each biomarker given previously known biological information.
- paper R code
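The idea of measuring a biomarker's contribution given known information can be sketched with partial correlation in a linear model. This is a swapped-in linear analogue, not the survival-data method described above: the simulated variables (`z` as the known covariate, `A` as a proxy for `z`, `B` as a biomarker with its own modest effect) and the helper names are assumptions for illustration.

```python
import random

def mean(v):
    return sum(v) / len(v)

def corr(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

def residualize(v, z):
    """Residuals of v after simple linear regression on z."""
    mv, mz = mean(v), mean(z)
    slope = sum((a - mz) * (b - mv) for a, b in zip(z, v)) / sum((a - mz) ** 2 for a in z)
    return [b - mv - slope * (a - mz) for a, b in zip(z, v)]

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    return corr(residualize(x, z), residualize(y, z))

# Hypothetical simulation: z is known in advance, A only proxies z,
# and B has a modest effect of its own.
rng = random.Random(1)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
A = [zi + 0.3 * rng.gauss(0, 1) for zi in z]
B = [rng.gauss(0, 1) for _ in range(n)]
y = [2 * zi + 0.5 * bi + 0.5 * rng.gauss(0, 1) for zi, bi in zip(z, B)]

marginal = {"A": corr(A, y), "B": corr(B, y)}
conditional = {"A": partial_corr(A, y, z), "B": partial_corr(B, y, z)}
print(marginal, conditional)
```

Marginally, `A` looks far stronger than `B`; after conditioning on `z`, the ranking reverses because `A` carries no information beyond `z`.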
Contact us at hhong@msu.edu with any comments or issues.