LeliaPlusPlus/WashingtonClimateChangeIndexCaseStudy


WashingtonClimateChangeIndexCaseStudy

This repository contains a case study with an exploratory analysis of the climate change index for Washington state from 2000-2010.

Statement of Research Problem/Rationale

Research Questions

  1. What is the association, if any, between year and aggregate greenhouse gas emission?
  2. If there is an association between year and aggregate greenhouse gas emission, what are the projected greenhouse gas emissions for the next five years?
  3. What is the association, if any, between year and greenhouse gas emission per person?
  4. What is the association, if any, between aggregate greenhouse gas emission and the per $ spent measure?

Rationale/Justification

This topic is worthy of study because climate change is a serious, ongoing issue that is causing, and is projected to cause, many harmful environmental effects. These include a continual rise in global temperatures, changes in precipitation patterns, more droughts and heat waves, stronger and more intense hurricanes, sea level rise, and melting of Arctic ice sheets (National Aeronautics and Space Administration).

Using a machine learning model, in this case linear least squares regression, is justified because it allows the author to answer the research questions about association with the given dataset and also to predict aggregate greenhouse gas emissions for the next five years. In addition, machine learning on environmental data has already proven successful in the technology industry; for example, Google reduced the energy used to cool its data centers by 40% using deep learning (Medium).

Project Objectives

The objective of this case study is to perform an exploratory data analysis to determine whether greenhouse gas emissions in relation to product life cycle in Washington state are increasing or decreasing.

Background

After recognizing the urgent threat that anthropogenic climate change poses to Washington’s economic well-being, public health, natural resources, and environment, the Washington Legislature established limits on the state’s greenhouse gas emissions in state law in 2008. Washington has already experienced long-term impacts of climate change, including sea level rise, increasing ocean acidification, long-term warming trends, and declining snowpack and glaciers. The current limits require Washington to reduce overall greenhouse gas emissions in the state to 1990 levels by 2020; to 25 percent below 1990 levels by 2035; and, to reach global climate stabilization levels, to 50 percent below 1990 levels (or 70 percent below the state’s expected emissions that year) by 2050. Given current scientific knowledge that sharp reductions in greenhouse gas emissions are required to stabilize carbon in the atmosphere, the Washington Department of Ecology has recommended changing the limits to reduce overall greenhouse gas emissions in the state to 1990 levels by 2020, to 40 percent below 1990 levels by 2035, and to 80 percent below 1990 levels by 2050 (Washington Department of Ecology).

Notably, Washington’s emissions profile differs from that of the United States as a whole. The U.S. profile is dominated by the power sector, which accounted for 30 percent of total U.S. greenhouse gas emissions in 2014, with transportation the second largest source. Washington’s emissions, by contrast, are dominated by transportation, which accounted for 43 percent of total statewide greenhouse gas emissions in 2013, because Washington has a relatively clean power sector with substantial hydropower assets (Washington Department of Ecology).

Literature Review

Much of the literature focuses on how accurately machine learning methods predict climate change, whereas other publications may use statistical methods for prediction and simply assume that those methods are accurate for the given dataset. This literature review focuses specifically on publications that use machine learning methods to predict climate change, both globally and locally.

Detecting extreme events in large datasets is a major challenge in climate change research, and different machine learning methods often produce vastly different results on the same dataset (Liu et al., 2016). By developing their own deep convolutional neural network (CNN) classification system, Liu et al. demonstrated the usefulness of deep learning for climate pattern detection problems, achieving 89-99% accuracy in detecting extreme events (2016).

Wiley and colleagues explored how biological communities are changing due to climate change (2003). The authors applied the Genetic Algorithm for Rule-set Prediction (GARP), an artificial-intelligence-based modeling approach that includes several inferential tools, to data on the geographic distribution of species, with the aim of providing a severe test of GARP’s predictive ability (Wiley et al., 2003). Overall, the results indicated high predictivity for all species, with values ranging from 0.613 to 0.991 (Wiley et al., 2003).

Pearson and colleagues estimated the influence of projected climate change on the distribution of Arctic vegetation types using machine learning classification and ecological niche models (Pearson et al., 2013). The researchers classified Arctic vegetation into four classes of graminoids, four classes of shrubs, and two classes of tree cover (Pearson et al., 2013). For the 2050s, 48-69% of the study area was predicted to shift to a different class under scenarios of climate change with restricted tree dispersal, and 57-84% under an equilibrium scenario with unrestricted dispersal (Pearson et al., 2013).

Benito Garzón et al. aimed to predict how climate change affects species distributions in the Iberian Peninsula (2008). The researchers modeled current and future tree distributions as a function of climate using the random forest algorithm (Benito Garzón et al., 2008). The results showed a reduction in the distribution of all studied species (Benito Garzón et al., 2008), suggesting that climate change could severely reduce the ranges of tree species in the Iberian Peninsula.

Beer and colleagues estimated terrestrial gross primary production (GPP) with both diagnostic and process-oriented models, including a model tree ensemble (MTE) and an artificial neural network (ANN) (2010). The results showed how much each biome contributes to global GPP: tropical forests account for 34% of global terrestrial GPP and savannahs for 26% (Beer et al., 2010).

Rogan and colleagues compared the performance of three machine learning algorithms, two classification tree algorithms and an artificial neural network, “in the context of mapping land-cover modifications in northern and southern California study sites between 1990/91 and 1996” (2008). Comparisons were based on overall accuracy, sensitivity to dataset size and variation, and noise (Rogan et al., 2008). The artificial neural network produced the highest overall accuracy in both study areas (approximately 84%) and was more resistant to deficiencies in the training data (Rogan et al., 2008). The authors therefore concluded that artificial neural networks were the most accurate and robust option for automated, large-area change monitoring (Rogan et al., 2008).

Peterson and colleagues projected the effects of global climate change on the distributions of 1,870 species of Mexican fauna under one liberal and one conservative climate change scenario (2002). The researchers used GARP to delineate ecological niches and predict geographic distributions (Peterson et al., 2002). Under both scenarios, the coastal portions of the species’ distributions remained similar, the interior portions became less habitable, and, interestingly, a narrow band in the foothills of the coastal mountain ranges became more habitable (Peterson et al., 2002).

Studies that use machine learning models to predict climate change rely almost exclusively on geographic spatial data. Much of this work also focuses on how well machine learning models predict the data rather than on using those models to make predictions directly. Very few studies have used general linear models to predict climate change (Austin et al., 1996). In contrast, this research uses a linear machine learning model to directly predict greenhouse gas emissions in a region of the Pacific Northwestern United States.

Methods

Tools

The software tools used for this data analysis were Python, Jupyter Notebook, and the Python packages pandas, matplotlib, SciPy, and NumPy. pandas provides data manipulation and analysis, matplotlib provides data visualization, SciPy contains statistical functions, and NumPy adds support for multidimensional arrays and functions that operate on them.

Procedures

First, descriptive statistics (count, mean, median, standard deviation, quartiles, minimum, and maximum) were computed for the aggregate greenhouse emission, greenhouse emission per person, and per $ spent columns to gain an initial understanding of the data. Next, bar graphs were used to visualize aggregate greenhouse emission by year and greenhouse emission per person by year, giving initial insight into whether aggregate emission was increasing, decreasing, or staying roughly the same from year to year. Scatterplots were then used to assess visually whether a linear relationship, and therefore linear least squares regression, was plausible for each of three pairs of variables: year and aggregate greenhouse emission, year and greenhouse emission per person, and per $ spent and aggregate greenhouse emission. The least squares regression line was plotted over the data for each of the observed years, and the r-value, r squared, and standard error were used to assess how well a linear model fits the data. Finally, the least squares regression line for year and aggregate greenhouse emission was used to project aggregate greenhouse gas emissions for the next five years.
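The procedure above can be sketched in Python with the packages listed under Tools. This is a minimal sketch, not the author’s notebook: the column names `year`, `aggregate`, `per_person`, and `per_dollar` and the CSV file name are hypothetical placeholders for whatever headers the downloaded dataset actually uses. Note that `scipy.stats.linregress` reports `stderr` as the standard error of the estimated slope, which matters when interpreting the standard errors quoted in the results.

```python
import pandas as pd
from scipy import stats

# Hypothetical column names; rename to match the headers in the downloaded CSV.
VALUE_COLUMNS = ["aggregate", "per_person", "per_dollar"]


def describe_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Count, mean, std, min, quartiles, max, plus an explicit IQR row."""
    summary = df[VALUE_COLUMNS].describe()
    # The interquartile range is the distance between the first and third quartiles.
    summary.loc["IQR"] = summary.loc["75%"] - summary.loc["25%"]
    return summary


def fit_and_project(x, y, future_x):
    """Fit y = intercept + slope * x by least squares and project future values."""
    result = stats.linregress(x, y)
    projections = [result.intercept + result.slope * xi for xi in future_x]
    return result, projections


# Example usage (hypothetical file name):
# df = pd.read_csv("climate_change_index.csv")
# print(describe_columns(df))
# fit, future = fit_and_project(df["year"], df["aggregate"], range(2019, 2024))
# print(fit.slope, fit.rvalue, fit.rvalue ** 2, fit.stderr, future)
```

Returning the full `linregress` result keeps the slope, r-value, p-value, and slope standard error together, so the same object can feed both the figures and the goodness-of-fit discussion.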

Results and Analysis

According to the descriptive statistics for aggregate greenhouse emissions, the count is 11, the mean is 111.12, the standard deviation is 5.67, the minimum is 100, the first quartile is 108.835, the median is 112.38, the third quartile is 114.585, and the maximum is 119.6, giving an interquartile range of 5.75. Because the mean and median are close, the data do not appear strongly skewed by outliers, so the mean is a reasonable measure of central tendency alongside the median. The standard deviation is about 5% of the mean, more than twice the relative spread of the per-person column, so the aggregate values are comparatively dispersed. The minimum and maximum are also relatively far apart (a range of 19.6).

According to the descriptive statistics for greenhouse emissions per person, the count is 11, the mean is 103.647, the standard deviation is 2.309, the minimum is 99.45, the first quartile is 102.66, the median is 104.41, the third quartile is 105.28, and the maximum is 106.69, giving an interquartile range of 2.62. Again, the mean and median are close, so the mean is a good measure of central tendency alongside the median. The standard deviation is small (about 2% of the mean), and the minimum and maximum are close together (a range of 7.24), so the values are tightly clustered.
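The interquartile ranges and relative spreads discussed in the two paragraphs above follow directly from the reported summary statistics. This sketch simply re-derives them from the rounded values in the text; the coefficient of variation (standard deviation divided by mean) is a standard unitless measure of relative dispersion, used here only to compare the two columns.

```python
# Rounded summary statistics as reported for the two columns.
reported = {
    "aggregate": {"mean": 111.12, "std": 5.67, "q1": 108.835, "q3": 114.585},
    "per_person": {"mean": 103.647, "std": 2.309, "q1": 102.66, "q3": 105.28},
}


def spread(stats: dict) -> dict:
    """Interquartile range (Q3 - Q1) and coefficient of variation (std / mean)."""
    return {
        "iqr": stats["q3"] - stats["q1"],
        "cv": stats["std"] / stats["mean"],
    }


results = {name: spread(values) for name, values in reported.items()}
for name, r in results.items():
    print(f"{name}: IQR = {r['iqr']:.3f}, CV = {r['cv']:.3f}")
# aggregate: IQR = 5.750, CV = 0.051
# per_person: IQR = 2.620, CV = 0.022
```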

Interestingly, aggregate greenhouse emission decreased in 2010 relative to previous years. This value affects the regression, but it was retained rather than discarded as an outlier so that it could inform the predictions for the next five years. If the 2010 data for aggregate greenhouse emission and greenhouse emission per person were discarded, the data would give no indication of the decrease in either measure. It would be useful to examine data for 2011-2017 to determine whether aggregate greenhouse emission and greenhouse emission per person continued to decrease.

The bar graph of aggregate greenhouse emission by year (Figure 2) shows aggregate emission increasing every year until a decrease in 2010. The bar graph of greenhouse emission per person by year (Figure 3) shows no clear trend.
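Bar graphs like Figures 2 and 3 can be drawn with matplotlib. This is only a sketch: the column names are hypothetical, and the styling of the original figures is not known.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_by_year(df: pd.DataFrame, column: str, label: str):
    """Draw one bar per year for the given column and return the figure."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(df["year"], df[column])
    ax.set_xlabel("Year")
    ax.set_ylabel(label)
    ax.set_title(f"{label} by year")
    return fig


# Example usage (hypothetical column names):
# plot_by_year(df, "aggregate", "Aggregate greenhouse emission")
# plot_by_year(df, "per_person", "Greenhouse emission per person")
# plt.show()
```

Because the bars start at zero and the yearly values differ by only a few percent, narrowing the y-axis with `ax.set_ylim` can make the year-to-year changes easier to see.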

According to the linear regression with x = year and y = aggregate greenhouse emission (Figure 4), aggregate greenhouse emission increases over time, as indicated by the positive slope of approximately 1.53 per year. The r-value of 0.896 is close to 1, indicating a strong positive linear association between year and aggregate greenhouse emission; equivalently, r squared is about 0.80, so the line accounts for roughly 80% of the variation in aggregate emission. The standard error of 0.25 is the standard error of the slope estimate. It is small relative to the slope itself (the slope is about six standard errors from zero), so the positive trend is estimated with reasonable precision. As stated, aggregate greenhouse emission decreased in 2010, so data from 2011 onward would be needed to tell whether the decrease was the start of a trend or specific to 2010. Predictions were made for the five years 2019-2023 (Table 4); because the data end in 2010, these are extrapolations 9 to 13 years beyond the observed range and should be treated with caution. The projected greenhouse gas emissions (Table 4) are 132.591 (2019), 134.125 (2020), 135.658 (2021), 137.192 (2022), and 138.725 (2023). Because the fitted slope is positive, the projections necessarily increase each year.
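That 0.25 is the standard error of the slope, rather than a measure of how close the points lie to the line, can be checked against the reported summary statistics. For simple linear regression, the slope is `b = r * s_y / s_x` and its standard error is `sqrt((1 - r^2) / (n - 2)) * s_y / s_x`, where `s_y` and `s_x` are sample standard deviations. This sketch plugs in r = 0.896, n = 11, and s_y = 5.67 from the text, with s_x computed from the years 2000-2010.

```python
import math

years = list(range(2000, 2011))  # the 11 observed years
n = len(years)
mean_year = sum(years) / n
# Sample standard deviation of the years (same n - 1 convention as pandas' describe()).
sd_year = math.sqrt(sum((y - mean_year) ** 2 for y in years) / (n - 1))


def slope_and_stderr(r: float, n: int, sd_y: float, sd_x: float) -> tuple[float, float]:
    """Least-squares slope and its standard error from r and the two standard deviations."""
    ratio = sd_y / sd_x
    slope = r * ratio
    stderr = math.sqrt((1 - r ** 2) / (n - 2)) * ratio
    return slope, stderr


slope, stderr = slope_and_stderr(r=0.896, n=n, sd_y=5.67, sd_x=sd_year)
print(f"slope = {slope:.2f}, stderr = {stderr:.2f}")  # slope = 1.53, stderr = 0.25
```

Both values match the reported 1.53 and 0.25, which is consistent with the 0.25 being the slope’s standard error as returned by `scipy.stats.linregress`.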

According to the linear regression with x = year and y = greenhouse emission per person (Figure 5), there is no meaningful association between year and greenhouse emission per person. The r-value of 0.023 is close to 0 (r squared is about 0.0005), so there is essentially no linear relationship, and thus no linear correlation, between year and greenhouse emission per person. The standard error was not considered since there is no linear relationship between x and y in this case.
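Whether an r-value counts as “close to 0” or “close to 1” can be made concrete with the critical correlation for a two-sided test of zero correlation at the 5% level. With n = 11 observations there are n - 2 = 9 degrees of freedom, the tabulated two-sided 5% critical t-value is 2.262, and the critical |r| is `t / sqrt(t^2 + df)`. This is a standard significance check added for illustration, not a step from the original analysis.

```python
import math

T_CRIT_DF9 = 2.262  # two-sided alpha = 0.05, 9 degrees of freedom (standard t table)


def critical_r(t_crit: float, n: int) -> float:
    """Smallest |r| that is significant at the level implied by t_crit."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


r_crit = critical_r(T_CRIT_DF9, 11)  # about 0.602
for label, r in [("aggregate ~ year", 0.896), ("per person ~ year", 0.023),
                 ("aggregate ~ per $ spent", -0.907)]:
    print(f"{label}: |r| = {abs(r):.3f}, significant = {abs(r) > r_crit}")
```

The per-person r of 0.023 falls far below the roughly 0.60 threshold, while both aggregate regressions clear it comfortably.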

According to the linear regression with x = per $ spent and y = aggregate greenhouse emission (Figure 6), there is a negative association between per $ spent and aggregate greenhouse emission, as indicated by the negative slope of -0.972. The r-value of -0.907 indicates a strong negative linear correlation (r squared is about 0.82), so we can deduce that there is a strong association between per $ spent and aggregate greenhouse emission. The standard error of the slope, 0.15, is small relative to the slope (which is about six and a half standard errors from zero), further supporting the linear model and the strong correlation.
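The slope standard errors reported for the two strong regressions can be turned into t-statistics (slope divided by its standard error) and compared with the tabulated critical value of 2.262 for 9 degrees of freedom; r squared is simply the square of r. A sketch using the rounded values from the text, so the results are approximate.

```python
T_CRIT_DF9 = 2.262  # two-sided alpha = 0.05, 9 degrees of freedom

# Rounded regression output quoted in the text.
fits = {
    "aggregate ~ year": {"slope": 1.53, "stderr": 0.25, "r": 0.896},
    "aggregate ~ per $ spent": {"slope": -0.972, "stderr": 0.15, "r": -0.907},
}

summary = {}
for name, fit in fits.items():
    t_stat = fit["slope"] / fit["stderr"]  # how many standard errors the slope is from 0
    summary[name] = {
        "t": t_stat,
        "r_squared": fit["r"] ** 2,  # share of variance explained by the line
        "significant": abs(t_stat) > T_CRIT_DF9,
    }
    print(f"{name}: t = {t_stat:.2f}, r^2 = {fit['r'] ** 2:.3f}")
```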

Conclusions/Discussion

RQ1: What is the association, if any, between year and aggregate greenhouse emission?

The high r-value (0.896) and the small standard error of the slope (0.25) for the regression of aggregate greenhouse emission on year show a strong association between year and aggregate greenhouse emission. Given the slope of 1.53, the correlation is positive: over 2000-2010, aggregate greenhouse emission increased by about 1.53 per year on average.

RQ2: If there is an association between year and aggregate greenhouse gas emission, what are the projected greenhouse gas emissions for the next five years?

The projected greenhouse gas emissions for the next five years (Table 4) are 132.591 (2019), 134.125 (2020), 135.658 (2021), 137.192 (2022), and 138.725 (2023). The predictions show the potential increase in aggregate greenhouse emissions if the state of Washington does not address them. Although each annual increase is small, the increases accumulate: over the projected period, aggregate greenhouse emission is predicted to rise by 6.134, from 132.591 in 2019 to 138.725 in 2023.
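The cumulative increase can be reproduced from Table 4. Because a least-squares line passes through the point (mean year, mean value), the projections can also be approximately reconstructed from the mean year of 2005, the mean aggregate emission of 111.12, and the average yearly increase implied by Table 4; small differences come from rounding.

```python
# Projected aggregate greenhouse emission from Table 4.
projections = {2019: 132.591, 2020: 134.125, 2021: 135.658, 2022: 137.192, 2023: 138.725}

years = sorted(projections)
increments = [round(projections[b] - projections[a], 3) for a, b in zip(years, years[1:])]
total_increase = round(projections[years[-1]] - projections[years[0]], 3)
implied_slope = total_increase / (len(years) - 1)  # average yearly increase

MEAN_YEAR = 2005  # mean of the observed years 2000-2010
MEAN_AGGREGATE = 111.12  # mean from the descriptive statistics


def project(year: int, slope: float = implied_slope) -> float:
    """Point on the least-squares line, which passes through (MEAN_YEAR, MEAN_AGGREGATE)."""
    return MEAN_AGGREGATE + slope * (year - MEAN_YEAR)


print(increments, total_increase)  # [1.534, 1.533, 1.534, 1.533] 6.134
```

The yearly increments all match the fitted slope of about 1.53, and the total of 6.134 spans the four year-to-year steps from 2019 to 2023.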

RQ3: What is the association, if any, between year and greenhouse gas emission per person?

The extremely low r-value for the regression of greenhouse gas emission per person on year indicates that there is no linear correlation between the two. However, a nonlinear relationship that linear regression cannot detect may still exist, and identifying one would require further analysis of the dataset. Moreover, the lack of association between year and greenhouse emission per person suggests that greenhouse emission per person is not a good indicator of aggregate greenhouse emission: aggregate emission increased every year except 2010, regardless of whether emission per person rose or fell.

RQ4: What is the association, if any, between aggregate greenhouse gas emission and the per $ spent measure?

The large magnitude of the r-value (-0.907) and the small standard error of the slope (0.15) for the regression of aggregate greenhouse emission on per $ spent show a strong association between aggregate greenhouse gas emission and per $ spent. Given the slope of -0.972, the correlation is negative: higher per $ spent values are associated with lower aggregate greenhouse emission.

As stated, aggregate greenhouse gas emissions increased every year from 2000 to 2009. Over this period, therefore, Washington was not lowering its aggregate greenhouse emissions toward the limits set by the state legislature in 2008. However, Washington moved closer to that goal in 2010, as evidenced by the decrease in aggregate greenhouse gas emissions from the prior year.

The finding that aggregate greenhouse emissions increased nearly every year is therefore important. If greenhouse gas emissions are not addressed, they will affect Washington’s economic well-being, public health, natural resources, and environment. In particular, sea level rise, ocean acidification, long-term warming trends, and the decline of snowpack and glaciers in Washington are expected to continue, resulting in financial and other costs related to shoreline and coastal management, ocean chemistry in the Pacific Northwest, toxic cleanup, water supply, water quality, and human health.

References

Beer, C., Reichstein, M., Tomelleri, E., Ciais, P., Jung, M., Carvalhais, N., Rödenbeck, C., Arain, M. A., Baldocchi, D., Bonan, G. B., & Bondeau, A. (2010). Terrestrial gross carbon dioxide uptake: Global distribution and covariation with climate. Science, 329(5993), 834-838.

Benito Garzón, M., Sánchez de Dios, R., & Sainz Ollero, H. (2008). Effects of climate change on the distribution of Iberian tree species. Applied Vegetation Science, 11(2), 169-178.

Department of Ecology of State of Washington. (2016). Washington greenhouse gas emissions reduction limits. Retrieved from https://fortress.wa.gov/ecy/publications/documents/1601010.pdf.

Liu, Y., Racah, E., Correa, J., Khosrowshahi, A., Lavers, D., Kunkel, K., Wehner, M., & Collins, W. (2016). Application of deep convolutional neural networks for detecting extreme weather in climate datasets. arXiv preprint arXiv:1605.01156.

National Aeronautics and Space Administration. (n.d.). Global climate change: Vital signs of the planet. Retrieved from https://climate.nasa.gov/effects/.

Neuromation. (n.d.). AI and deep learning are the keys to unlocking the future of environmental sustainability. Retrieved from https://medium.com/neuromation-io-blog/ai-and-deep-learning-are-the-keys-to-unlocking-the-future-of-environmental-sustainability-345f93a57ec5.

Pearson, R. G., Phillips, S. J., Loranty, M. M., Beck, P. S., Damoulas, T., Knight, S. J., & Goetz, S. J. (2013). Shifts in Arctic vegetation and associated feedbacks under climate change. Nature Climate Change, 3(7), 673.

Peterson, A. T., Ortega-Huerta, M. A., Bartley, J., Sánchez-Cordero, V., Soberón, J., Buddemeier, R. H., & Stockwell, D. R. (2002). Future projections for Mexican faunas under global climate change scenarios. Nature, 416(6881), 626.

Rogan, J., Franklin, J., Stow, D., Miller, J., Woodcock, C., & Roberts, D. (2008). Mapping land-cover modifications over large areas: A comparison of machine learning algorithms. Remote Sensing of Environment, 112(5), 2272-2283.

State of Washington. (2018). Climate change index [Data file]. Retrieved from https://catalog.data.gov/dataset/climate-change-index.

Wiley, E. O., McNyset, K. M., Peterson, A. T., Robins, C. R., & Stewart, A. M. (2003). Niche modeling perspective on geographic range predictions in the marine environment using a machine-learning algorithm.
