Our team's progress has been delayed by the earlier problem with accessing confidential data on the Langone server. Nevertheless, we completed some preliminary work. On the EHR datasets, we finished elementary data cleaning, including dropping invalid candidates and filling in missing values (NAs). The team also put considerable effort into basic feature engineering: we normalized features to the range [0, 1] and built handcrafted features from the original ones based on logical relationships, using operations such as aggregation and binning. Finally, following the previous work, we tested several simple models such as Lasso regression; the current results are comparable to those reported in our collaborators' research paper.
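
As a rough illustration of the cleaning and feature-engineering steps described above, the sketch below fills missing values, rescales a column to [0, 1], and bins it. It is not the team's actual pipeline: the column values, the mean-imputation rule, the bin edges, and the helper names (`fill_missing`, `min_max_scale`, `bin_index`) are all assumptions chosen for the example.

```python
from statistics import mean


def fill_missing(values):
    """Replace None entries with the mean of the observed values.

    Mean imputation is an assumption; the memo does not say how NAs were filled.
    """
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def bin_index(value, edges):
    """Return the bin index of value, given sorted interior bin edges."""
    return sum(value >= e for e in edges)


# Hypothetical column with one missing entry.
ages = [30, None, 50, 70]

filled = fill_missing(ages)                       # missing entry becomes the mean
scaled = min_max_scale(filled)                    # values now lie in [0, 1]
bins = [bin_index(a, [40, 60]) for a in filled]   # three bins: <40, 40-59, >=60
```

In practice the scaling parameters (minimum and maximum) would be computed on the training split only and reused on held-out data, so that no information leaks from the evaluation set.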

Over the next few weeks, the team will focus on feature engineering, with the aim of outperforming the best model in the research paper. Specifically, we will generate meaningful indicators with medical knowledge in mind. For example, the product weight*height may help capture the interaction between weight and height. In addition, the team will try other models, such as random forests, to allow a thorough comparison between models.
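
The weight*height indicator mentioned above could be built along the following lines. This is only a minimal sketch: the record layout, the field names `weight` and `height`, the sample values, and the helper `add_interaction` are hypothetical and not taken from the EHR data.

```python
def add_interaction(rows, a, b, name=None):
    """Return copies of rows with an extra field holding the product of fields a and b."""
    name = name or f"{a}_x_{b}"
    # Build new dicts so the input records are left unchanged.
    return [{**row, name: row[a] * row[b]} for row in rows]


# Hypothetical patient records (weight in kg, height in cm).
patients = [
    {"weight": 70, "height": 175},
    {"weight": 82, "height": 180},
]

augmented = add_interaction(patients, "weight", "height")
```

A linear model such as Lasso cannot represent a product of two inputs on its own, which is why an explicit interaction column can help it; tree-based models such as random forests can approximate such interactions without the extra feature.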