
The objective of the Project is to predict ‘Full Load Electrical Power Output’ of a Base load operated combined cycle power plant using Polynomial Multiple Regression. Concepts : 1) Clustering, 2) Polynomial Regression, 3) LASSO, 4) Cross-Validation, 5) Bootstrapping


S-B-Iqbal/Predicting-Power-Output-of-a-combined-cycle-power-plant.


Predicting Full Load Electrical Output of a Combined Cycle Power Plant using Polynomial Linear Regression.

Objective :

The objective of the project is to predict the ‘Full Load Electrical Power Output’ of a base load operated combined cycle power plant using Polynomial Multiple Regression. I use Clustering to explore the relationships between the variables at play, and Cross-Validation to find the best hyper-parameters for the model. I also demonstrate the use of ‘LASSO’ for dimension reduction. Finally, I apply Bootstrapping to assess the accuracy of the model on the Test dataset.

Motivation:

Predicting the full load electrical power output of a base load power plant is important in order to maximize the profit from the available megawatt hours. The base load operation of a power plant is influenced by four main parameters, which are used as the input variables in the dataset: Ambient Temperature, Atmospheric Pressure, Relative Humidity and Exhaust Steam Pressure.

Data Collection:

URL : [https://archive.ics.uci.edu/ml/datasets/combined+cycle+power+plant]

The dataset contains 9568 data points collected from a Combined Cycle Power Plant over 6 years (2006-2011), when the power plant was set to work with full load. Features consist of the hourly average ambient variables Temperature (T), Ambient Pressure (AP), Relative Humidity (RH) and Exhaust Vacuum (V), which are used to predict the net hourly electrical energy output (EP) of the plant.

A combined cycle power plant (CCPP) is composed of gas turbines (GT), steam turbines (ST) and heat recovery steam generators. In a CCPP, the electricity is generated by gas and steam turbines, which are combined in one cycle, and is transferred from one turbine to another. While the Vacuum is collected from and has an effect on the steam turbine, the other three ambient variables affect the GT performance.

For comparability with the baseline studies, and to allow 5x2 fold statistical tests to be carried out, the data is provided shuffled five times. For each shuffling, 2-fold CV is carried out and the resulting 10 measurements are used for statistical testing. The data is provided in both .ods and .xlsx formats.

Relevant papers to cite:

- Pınar Tüfekci, Prediction of full load electrical power output of a base load operated combined cycle power plant using machine learning methods, International Journal of Electrical Power & Energy Systems, Volume 60, September 2014, Pages 126-140, ISSN 0142-0615, http://dx.doi.org/10.1016/j.ijepes.2014.02.027 (http://www.sciencedirect.com/science/article/pii/S0142061514000908).
- Heysem Kaya, Pınar Tüfekci, Sadık Fikret Gürgen, Local and Global Learning Methods for Predicting Power of a Combined Gas & Steam Turbine, Proceedings of the International Conference on Emerging Trends in Computer and Electronics Engineering ICETCEE 2012, pp. 13-18 (Mar. 2012, Dubai).

Workflow :

1. Exploratory Data Analysis.
2. Clustering:
    - K-Means Clustering.
        - Identification of optimal K.
        - Silhouette Analysis.
3. Polynomial Multiple Regression:
    - a. Data Modelling.
    - b. Division of the dataset into Train, Validation and Test sets.
    - c. Cross-Validation to find the optimum degree ‘n’ for the polynomial regression.
    - d. LASSO for dimension reduction:
        - Cross-Validation on the Training and Validation sets to find the best ‘alpha’ for LASSO.
    - e. Model evaluation on the Test data using the metrics R^2 and adjusted R^2.
4. Bootstrapping: Confidence Interval of R^2 for the Test data.
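
Step 2 of the workflow (choosing K for K-Means through silhouette analysis) could be sketched roughly as below. This is only an illustration: it assumes scikit-learn as the tooling, uses synthetic data in place of the CCPP dataset, and the candidate range of K (2 to 6) is an arbitrary choice rather than the one used in this project.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the plant data (assumption): three well-separated
# groups in a 5-column space, mimicking T, V, AP, RH and EP.
rng = np.random.default_rng(0)
centers = np.array(
    [[0, 0, 0, 0, 0], [6, 6, 6, 6, 6], [-6, 6, -6, 6, -6]], dtype=float
)
X = np.vstack([c + rng.normal(scale=1.0, size=(100, 5)) for c in centers])

# K-Means is distance based, so put all columns on a comparable scale first.
X_scaled = StandardScaler().fit_transform(X)

# Fit K-Means for each candidate K and record the mean silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

# The K with the highest mean silhouette score is taken as the best candidate.
best_k = max(scores, key=scores.get)
print("silhouette by K:", scores)
print("best K:", best_k)
```

On the real data, the cluster centres for the chosen K can then be compared column by column to see which input levels go together with high or low power output.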

Results:

1. EDA gives a preliminary view of how the various factors affect the Power output.
2. Clustering showcases which factors are associated with a higher Power output:
    - It suggests that higher Power output goes together with higher Humidity and Pressure.
    - It shows that, for high levels of Power to be generated, the Temperature and Vacuum levels should be as low as possible.
3. 10-fold Cross-Validation helps in finding the optimal degree for the Polynomial Regression and the value of 'alpha' for the LASSO model.
4. LASSO shows which parameters are important in the final model.
5. Bootstrapping gives a Confidence Interval for the accuracy of the model on unseen data.
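
The regression and bootstrapping steps summarised above could look roughly like the sketch below. It is not the project's actual code: scikit-learn is an assumed choice of library, the data is synthetic, a single train/test split with 10-fold CV on the training part stands in for the Train/Validation/Test division, and counting the non-zero LASSO coefficients as the number of predictors in adjusted R^2 is a simplification.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in (assumption): 4 inputs and a quadratic target plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))
y = 3.0 - 2.0 * X[:, 0] + 0.5 * X[:, 0] ** 2 - X[:, 1] + rng.normal(scale=0.3, size=600)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)


def poly_model(degree, regressor):
    """Polynomial expansion -> scaling -> regressor (hypothetical helper)."""
    return make_pipeline(
        PolynomialFeatures(degree, include_bias=False), StandardScaler(), regressor
    )


# 10-fold CV on the training data to pick the polynomial degree.
cv_r2 = {
    d: cross_val_score(
        poly_model(d, LinearRegression()), X_train, y_train, cv=10, scoring="r2"
    ).mean()
    for d in range(1, 5)
}
best_degree = max(cv_r2, key=cv_r2.get)

# LASSO on the expanded features; LassoCV picks alpha by 10-fold CV.
lasso = poly_model(best_degree, LassoCV(cv=10, random_state=0, max_iter=10000))
lasso.fit(X_train, y_train)
lasso_step = lasso.named_steps["lassocv"]
n_kept = int(np.sum(lasso_step.coef_ != 0))  # terms LASSO did not shrink to zero

# Test-set R^2 and adjusted R^2 (p = number of retained terms, a simplification).
y_pred = lasso.predict(X_test)
r2 = r2_score(y_test, y_pred)
n, p = len(y_test), n_kept
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Bootstrap: resample test rows with replacement and recompute R^2 each time.
boot = []
for _ in range(1000):
    idx = rng.integers(0, n, size=n)
    boot.append(r2_score(y_test[idx], y_pred[idx]))
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])

print("degree:", best_degree, "alpha:", lasso_step.alpha_, "terms kept:", n_kept)
print("R^2:", r2, "adjusted R^2:", adj_r2)
print("95% bootstrap CI for R^2:", (ci_low, ci_high))
```

The percentile interval reported at the end is one simple way to build a bootstrap confidence interval; it only reflects sampling variability in the test set, since the model itself is not refitted on each resample.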

Conclusion:

By tuning the hyper-parameters with cross-validation and applying LASSO to retain the most important dimensions, the model achieves an accuracy of 93% on the Test data. Thus, this model can be used to predict the Power output of a Combined Cycle Power Plant with high accuracy. This could help bring down the cost of production and increase efficiency by informing how the input parameters of the plant are controlled.
