In machine learning, feature selection is essential for improving model performance. This project investigates several feature importance strategies: Spearman's rank correlation coefficient, Principal Component Analysis (PCA), and model-based strategies (drop-column importance and permutation importance). The performance and effectiveness of these strategies are compared across different models to determine their suitability. Additionally, an automatic feature selection algorithm that combines multiple strategies and uses variance and empirical p-values is discussed; it helps identify the optimal subset of features for a specific model, ultimately enhancing model performance.
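As a quick illustration of one of these strategies, below is a minimal from-scratch sketch of permutation importance: a feature's importance is measured as the drop in validation score when that feature's column is randomly shuffled, breaking its relationship to the target. This is a generic illustration assuming a scikit-learn-style fitted model, not the implementation in featimp.py.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

def permutation_importance(model, X_val, y_val, metric=r2_score, n_repeats=5, seed=0):
    """Importance of feature j = baseline score minus score after
    shuffling column j of the validation set (averaged over repeats)."""
    rng = np.random.default_rng(seed)
    baseline = metric(y_val, model.predict(X_val))
    importances = np.zeros(X_val.shape[1])
    for j in range(X_val.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X_val.copy()
            rng.shuffle(X_perm[:, j])  # shuffle one column in place
            drops.append(baseline - metric(y_val, model.predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances

# Toy demo: 5 features, only 2 of which are informative.
X, y = make_regression(n_samples=500, n_features=5, n_informative=2, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_tr, y_tr)
imp = permutation_importance(model, X_val, y_val)
```

Unlike drop-column importance, this requires no retraining, which makes it much cheaper for large models at the cost of ignoring how the model would adapt without the feature.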
- Main notebook: featimp.ipynb
- PDF version: featimp.pdf
- Supporting functions: featimp.py