Prognostic and Predictive Maintenance (PdM) of NASA Turbofan Jet Engine

Predictive maintenance techniques determine the condition of equipment so that maintenance can be planned ahead of failure. This can significantly reduce overall equipment and downtime costs.

The objective of this project is to implement various predictive maintenance methods and assess the performance of each. The methods fall broadly into five categories.

Classification: Predicting the failure of a machine in the upcoming n days
Regression: Predicting the remaining useful life (RUL) of a machine
Feature Selection: Explaining the causes of abnormal behavior
Anomaly Detection: Unsupervised learning
Survival Analysis: Stochastic modeling of hazard and survival probabilities
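
As a minimal sketch of how the classification and regression targets relate, the snippet below derives both from run-to-failure trajectories. The function name `add_rul_labels`, the toy data, and the use of cycles as the time unit (rather than days) are assumptions made for illustration; they are not taken from this repository.

```python
import numpy as np

def add_rul_labels(unit_ids, cycles, horizon):
    """Return (rul, fails_within_horizon) for run-to-failure trajectories."""
    unit_ids = np.asarray(unit_ids)
    cycles = np.asarray(cycles)
    rul = np.empty(len(cycles), dtype=np.int64)
    for unit in np.unique(unit_ids):
        mask = unit_ids == unit
        # Assumption: the last observed cycle of a unit is its failure time.
        rul[mask] = cycles[mask].max() - cycles[mask]
    # Binary target: 1 if the unit fails within `horizon` cycles, else 0.
    return rul, (rul <= horizon).astype(np.int64)

# Toy data: unit 1 runs for 4 cycles, unit 2 for 3 cycles.
unit_ids = [1, 1, 1, 1, 2, 2, 2]
cycles = [1, 2, 3, 4, 1, 2, 3]
rul, label = add_rul_labels(unit_ids, cycles, horizon=1)
print(rul)    # [3 2 1 0 2 1 0]
print(label)  # [0 0 1 1 0 1 1]
```

The regression target is the RUL itself, while the classification target is obtained by thresholding it at the chosen horizon.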

Data

The data set consists of multiple multivariate time series. Each time series comes from a different engine, i.e., the data can be considered to come from a fleet of engines of the same type. You can find the data here.

The engine is operating normally at the start of each time series, and develops a fault at some point during the series. In the training set, the fault grows in magnitude until system failure. In the test set, the time series ends some time prior to system failure.

Imbalanced class distribution is a common issue in many classification tasks. Training data often contains highly imbalanced positive and negative examples, which nevertheless reflects the true class distribution in the general population. Predictive maintenance has the same issue: failure events are heavily outnumbered by normal operation events. There are two main reasons. First, failures occur rarely compared to normal operation for an in-service asset. Second, few failure events are ever recorded, because the business cannot afford to let an asset run to failure at the cost of equipment damage and downtime.

The SMOTE module is based on the algorithm "SMOTE: synthetic minority over-sampling technique" [5]. It increases the number of minority examples in a data set by synthesizing new examples of the minority class. The SMOTE module has two parameters: "SMOTE percentage" and "Number of nearest neighbors". The parameter "SMOTE percentage" should be a multiple of 100 (100, 200, 300, 400, …). It is the percentage of new minority examples that gets added, relative to the original size of the minority class. For example, a value of 100 doubles the minority class and a value of 200 triples it. The parameter "Number of nearest neighbors" controls how new minority examples are generated. Each generated example combines an original example with its nearest neighbors from the same class; in the original algorithm [5], it is a random interpolation between the example and one of those neighbors.
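
The following is a minimal NumPy sketch of the interpolation step described in [5], not the SMOTE module itself. The function name `smote`, its signature, and the toy points are assumptions for illustration, and the sketch expects at least two minority examples.

```python
import numpy as np

def smote(minority, percentage=100, k=5, seed=0):
    """Synthesize percentage/100 new points per minority example."""
    minority = np.asarray(minority, dtype=float)
    n, n_per_sample = len(minority), percentage // 100
    k = min(k, n - 1)  # cannot have more neighbors than other points
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances; a point is not its own neighbor.
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbors = np.argsort(dists, axis=1)[:, :k]
    synthetic = []
    for i in range(n):
        for _ in range(n_per_sample):
            j = rng.choice(neighbors[i])  # pick one of the k nearest neighbors
            gap = rng.random()            # random position on the segment i -> j
            synthetic.append(minority[i] + gap * (minority[j] - minority[i]))
    return np.array(synthetic)

minority = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 2]], dtype=float)
synthetic = smote(minority, percentage=200, k=5)
print(len(minority) + len(synthetic))  # 18, i.e. three times the original 6
```

With a percentage of 200, two synthetic points are generated per original example, so the minority class triples in size, matching the description above.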

EDA

Time based feature engineering: mean, max, min, standard deviation, etc.
Frequency based feature engineering: absolute, relative and cumulative frequency; skewness and peakedness (kurtosis).
Combined time and frequency based: bilinear time-frequency distributions; spectral entropy.
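
As a rough illustration of the feature types listed above, the sketch below computes rolling time-domain statistics and a simple spectral entropy with NumPy. The function names, the window length, and the toy signals are assumptions for this example and are not taken from the repository's EDA code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_time_features(signal, window):
    """Columns: mean, max, min, standard deviation of each sliding window."""
    windows = sliding_window_view(np.asarray(signal, dtype=float), window)
    return np.column_stack([windows.mean(axis=1), windows.max(axis=1),
                            windows.min(axis=1), windows.std(axis=1)])

def spectral_entropy(signal):
    """Shannon entropy (in bits) of the normalized power spectrum."""
    power = np.abs(np.fft.rfft(np.asarray(signal, dtype=float))) ** 2
    p = power / power.sum()
    p = p[p > 0]  # skip empty bins to avoid log(0)
    return float(-(p * np.log2(p)).sum())

signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
features = rolling_time_features(signal, window=3)
print(features[:, 0])  # rolling means: [2. 3. 4.]

t = np.arange(64)
sine = np.sin(2 * np.pi * 4 * t / 64)               # energy in a single bin
noise = np.random.default_rng(0).normal(size=64)    # energy spread over bins
sine_entropy, noise_entropy = spectral_entropy(sine), spectral_entropy(noise)
```

A pure tone concentrates its power in one frequency bin and therefore has a spectral entropy close to zero, whereas noise spreads power across bins and scores higher.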

Models for Predictive Maintenance

Linear, Piecewise-Linear, Exponential Degradation, Weibull and ARIMA model for RUL Prediction
Time series forecasting plus anomaly detection
Pattern similarity based forecasting
Similarity-based model for RUL Prediction
LSTM model for RUL Prediction and binary and multiclass classification
RNN(GRU) model for binary and multiclass classification
1D CNN for binary and multiclass classification
1D CNN-RNN for binary and multiclass classification
1D CNN-GRU for binary and multiclass classification
1D CNN-LSTM for binary and multiclass classification
1D CNN-SVM for binary classification
Attention in GRU
Logistic regression classification
K nearest neighbors classification
Support vector machines (linear and RBF kernels) classifiers and regressors
Naive Bayes and Bayesian Network Models
Decision trees classification
Ensembles of DT (random forest, xgboost, light gradient boosting machines) classifiers and regressors
Autokeras failure prediction
Tsfresh, sktime, aeon, scikit-survival, py-survival
DTW and Time series clustering
Genetic Algorithm
Hidden Markov Models
Survival Analysis - gradient boosting survival, random survival forest, survival support vector machines, Cox proportional hazards
Survival Analysis - RNN and Weibull probability distribution deep learning
Autoencoder - Anomaly Detection
Multivariate Gaussian Unsupervised Learning
Principal Component Analysis for anomaly detection
Bayesian Structural Time Series (BSTS)
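
To make one of the unsupervised entries above concrete, here is a minimal sketch of multivariate Gaussian anomaly detection: fit a mean and covariance on healthy data, then flag points whose squared Mahalanobis distance exceeds a threshold. The function names, the synthetic data, and the 99th-percentile threshold are assumptions for illustration and may differ from the notebooks in this repository.

```python
import numpy as np

def fit_gaussian(X):
    """Estimate the mean and inverse covariance of healthy data."""
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    return mean, np.linalg.inv(cov)

def mahalanobis_sq(X, mean, cov_inv):
    """Squared Mahalanobis distance of each row of X to the fitted Gaussian."""
    diff = X - mean
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

rng = np.random.default_rng(0)
healthy = rng.normal(size=(500, 3))       # stand-in for healthy sensor readings
mean, cov_inv = fit_gaussian(healthy)

# Assumption: treat the top 1% of training scores as the anomaly threshold.
threshold = np.percentile(mahalanobis_sq(healthy, mean, cov_inv), 99)

outlier = np.array([[10.0, 10.0, 10.0]])  # far from the healthy cloud
outlier_score = mahalanobis_sq(outlier, mean, cov_inv)[0]
print(outlier_score > threshold)  # True
```

A larger squared Mahalanobis distance corresponds to a lower density under the fitted Gaussian, so thresholding the distance is equivalent to thresholding the likelihood.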

N-CMAPSS_DL

DL evaluation on N-CMAPSS

Turbofan engine; CMAPSS [1]

Sample creator

Following the instructions below, you can create training/test sample arrays for a machine learning model (especially for DL architectures that take time-windowed data as input) from NASA's N-CMAPSS data file.
Please download the Turbofan Engine Degradation Simulation Data Set-2, the so-called N-CMAPSS dataset [2], from NASA's prognostic data repository.
In the downloaded dataset, DS01 has been used for model-based diagnostics and DS02 has been used for data-driven prognostics. Therefore, we need only DS02.
Please place the "N-CMAPSS_DS02-006.h5" file in the /N-CMAPSS folder.
Then, you can create the npz files for each of the 9 engines by running the Python commands below.

python3 sample_creator_unit_auto.py -w 50 -s 1 --test 0 --sampling 10

After that, you should run

python3 sample_creator_unit_auto.py -w 50 -s 1 --test 1 --sampling 10

-w : window length
-s : stride of the window
--test : selects train or test. If it is zero, the code extracts samples from the engines used for training; otherwise, it creates samples from the test engines.
--sampling : subsampling factor applied to the data before creating the output array, so that a lower sampling rate can be used to mitigate memory issues.
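
The effect of the window, stride and sampling options can be sketched as follows. This is not the code of sample_creator_unit_auto.py: the function `make_windows`, the output layout (samples, window, variables), and the choice of labeling each window with the RUL at its last time step are assumptions for illustration.

```python
import numpy as np

def make_windows(series, rul, window, stride, sampling):
    """Subsample a trajectory, then slice it into overlapping windows."""
    series = np.asarray(series)[::sampling]   # keep 1 row out of every `sampling`
    rul = np.asarray(rul)[::sampling]
    # Assumes the subsampled trajectory is at least `window` rows long.
    starts = range(0, len(series) - window + 1, stride)
    samples = np.stack([series[s:s + window] for s in starts])
    # Assumption: each window is labeled with the RUL at its last time step.
    labels = np.array([rul[s + window - 1] for s in starts])
    return samples, labels

# Toy trajectory: 1000 time steps, 4 variables, RUL counting down to 0.
series = np.random.default_rng(0).normal(size=(1000, 4))
rul = np.arange(999, -1, -1)
samples, labels = make_windows(series, rul, window=50, stride=1, sampling=10)
print(samples.shape)  # (51, 50, 4)
```

Subsampling by 10 reduces the 1000 steps to 100, and a window of 50 with stride 1 then yields 100 - 50 + 1 = 51 samples.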

Please note that we used N = 6 units (u = 2, 5, 10, 16, 18 & 20) for training and M = 3 units (u = 11, 14 & 15) for testing, the same setting as in [3].

The dataset is large and can cause memory issues. To limit memory use when loading the data and creating the samples, we set the data type to 'np.float32', whereas the original data type is 'np.float64'. Based on our experiments, this does not noticeably affect performance when the data is used to train a DL network. If you want to change the type, please check the 'data_preparation_unit.py' file in the /utils folder.
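
The memory saving from the type change can be checked directly with NumPy. The array shape below is an arbitrary stand-in, not the actual size of the N-CMAPSS samples.

```python
import numpy as np

# Arbitrary stand-in for a batch of windowed samples.
x64 = np.zeros((1000, 50, 20), dtype=np.float64)
x32 = x64.astype(np.float32)

print(x64.nbytes / x32.nbytes)  # 2.0
```

Each float32 value takes 4 bytes instead of 8, so the same array needs half the memory.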

In addition, we offer data subsampling to handle out-of-memory issues, since the given dataset uses a sampling rate of 1Hz. Setting the subsampling input to 10 means that only 1 sample out of every 10 is kept, so the sampling rate becomes 0.1Hz.

Finally, you should have 9 npz files in the /N-CMAPSS/Samples_whole folder.

Each compressed file contains two arrays with different labels: 'sample' and 'label'. In the case of the test units, 'label' indicates the ground truth RUL of the test units for evaluation.

For instance, the filename of one of the created files, Unit2_win50_str1_smp10.npz, indicates that the file contains the time series of engine (unit) 2 sliced with a time window of size 50 and stride 1, at a sampling rate of 0.1Hz.
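
A quick way to inspect such a file is sketched below. Because the real files require the dataset, the example first writes a small stand-in file with the same two keys into a temporary folder. The array shapes, in particular the (window, variables, samples) layout and the 20 variables, are assumptions inferred from the loader in the next section, not confirmed output of the sample creator.

```python
import os
import tempfile
import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Unit2_win50_str1_smp10.npz")
    # Stand-in content: 8 windows of length 50 over 20 hypothetical variables.
    np.savez_compressed(path,
                        sample=np.zeros((50, 20, 8), dtype=np.float32),
                        label=np.arange(8, dtype=np.float32))
    # np.load returns a lazy archive; close it before the folder is removed.
    with np.load(path) as data:
        keys = sorted(data.files)
        sample_shape = data["sample"].shape
        label_shape = data["label"].shape

print(keys)  # ['label', 'sample']
```

For a real file, replace `path` with the location of one of the npz files in the Samples_whole folder and skip the `np.savez_compressed` call.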

Load created samples

First, load each of the npz files created in the /Samples_whole folder. Then, aggregate the samples from the different engines.

import numpy as np

def load_part_array_merge(npz_units):
    """Load per-unit npz files and merge them into one sample/label array."""
    sample_array_lst = []
    label_array_lst = []
    for npz_unit in npz_units:
        loaded = np.load(npz_unit)
        sample_array_lst.append(loaded['sample'])
        label_array_lst.append(loaded['label'])
    # Stack along the third (sample) axis, then move that axis to the front.
    sample_array = np.dstack(sample_array_lst)
    label_array = np.concatenate(label_array_lst)
    sample_array = sample_array.transpose(2, 0, 1)
    return sample_array, label_array

The shape of your sample array should be (# of samples from all the units, window size, # of variables)

References

[1] Frederick, Dean & DeCastro, Jonathan & Litt, Jonathan. (2007). User's Guide for the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS). NASA Technical Memorandum TM-2007-215026.

[2] Chao, Manuel Arias, Chetan Kulkarni, Kai Goebel, and Olga Fink. "Aircraft Engine Run-to-Failure Dataset under Real Flight Conditions for Prognostics and Diagnostics." Data. 2021; 6(1):5. https://doi.org/10.3390/data6010005

[3] Chao, Manuel Arias, Chetan Kulkarni, Kai Goebel, and Olga Fink. "Fusing physics-based and deep learning models for prognostics." Reliability Engineering & System Safety 217 (2022): 107961.

[4] Mo, Hyunho, and Giovanni Iacca. "Multi-Objective Optimization of Extreme Learning Machine for Remaining Useful Life Prediction." EvoApplications, part of EvoStar 2022 (2022), to appear.

https://www.mathworks.com/help/predmaint/ug/rul-estimation-using-rul-estimator-models.html

[5] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer (2002). SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research, 16(1), 321-357.

xxl4tomxu98/NASA-Jet-Engine-Maintenance
