Evaluation of Preprocessing Methods of Sentinel-2 Data and their Impact on Traditional Empirical and Modern Machine Learning Based Satellite Derived Bathymetry Methods

The goal of this project is to evaluate how different preprocessing methods for Sentinel-2 data products affect traditional algorithms like the Stumpf Log-Ratio Method (Stumpf et al., 2003) compared with modern approaches like LightGBM (Ke et al., 2017).

⚠ WIP Warning ⚠

As of now the conference paper still needs to be published and linked to this repository.

Bit Rot Disclaimer

While I tried to make sure that everything in this repository can be inspected and executed by interested readers, I am aware that after a while newer Python and package versions will break the project's code. I provide an environment.yml file that documents the exact versions of all dependencies I used on my system. I also documented each notebook in a way that should make the intention of each step very clear, so that even if a reader wants to migrate the complete analysis or just parts to another language or environment, this can be done even without using the provided code.

Analysis Areas of Interest

The analysis looks at three different areas:

Section of shallow ocean water near the north-west corner of the Bahamas BBox: (25.23467352,-78.43272685,25.31877266,-78.23940804)
Section of shallow ocean water off the west coast of Puerto Rico BBox: (18.14442526,-67.24112119,18.17335221,-67.18944271)
Mille Lacs Lake in Minnesota, USA BBox: (46.099296265601545,-93.83878319721899,46.377612102131366,-93.44756526336063)
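
The bounding boxes above can be kept in a small helper for reuse across notebooks. The following is a minimal sketch, not code from this repository: the `BBox` class and the `AOIS` dictionary are hypothetical names, and the axis order (min_lat, min_lon, max_lat, max_lon) is an assumption inferred from the coordinate values. Many GIS tools expect longitude first, so the helper also returns the reordered tuple.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Bounding box in degrees, stored as (min_lat, min_lon, max_lat, max_lon)."""

    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float

    def to_lon_lat_order(self) -> tuple[float, float, float, float]:
        """Return (min_lon, min_lat, max_lon, max_lat), i.e. x/y order."""
        return (self.min_lon, self.min_lat, self.max_lon, self.max_lat)

    def size_degrees(self) -> tuple[float, float]:
        """Return (height, width) of the box in degrees of latitude and longitude."""
        return (self.max_lat - self.min_lat, self.max_lon - self.min_lon)


# Values copied from the list above; the axis order is an assumption.
AOIS = {
    "bahamas": BBox(25.23467352, -78.43272685, 25.31877266, -78.23940804),
    "puerto_rico": BBox(18.14442526, -67.24112119, 18.17335221, -67.18944271),
    "mille_lacs_lake": BBox(
        46.099296265601545, -93.83878319721899, 46.377612102131366, -93.44756526336063
    ),
}

for name, bbox in AOIS.items():
    height, width = bbox.size_degrees()
    print(f"{name}: {bbox.to_lon_lat_order()} ({height:.4f} x {width:.4f} degrees)")
```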

Analysis Data

The data used for this analysis consists of:

Shapefiles for certain AOIs created in QGIS
Bathymetry maps from various sources
Sentinel-2 L1C scenes and derived L2A and Acolite products

The data needed to reproduce this analysis will be shared with the accompanying paper on Zenodo.

The Bathymetry Sources are:

Mille Lacs Lake: Lakes Data for Minnesota Bathybase Entry
Puerto Rico: Grid Export NOAA NCEI Data Viewer
Bahamas: handed down from a previous project. The source reference is unfortunately lost.

Computing Environment

This project was mainly executed on a laptop PC (Lenovo ThinkPad E14 Gen 2, Intel Core(TM) i7-1165G7, 32 GB RAM, Windows 10 21H2). While the modelling notebooks in particular can make good use of additional CPU resources, a machine with lower specs should still be sufficient to repeat all processing steps. Windows users should be able to recreate the conda environment directly from the environment.yml file in this repository. Linux and macOS users will need to adapt the environment, as some transitive dependencies are currently locked at Windows-specific versions.

Interpretation of Notebook Order

In the notebooks directory of this repository you will find numbered Jupyter Notebooks which can be subdivided into the following process steps:

Bathymetry Map Preprocessing ( 00 - Puerto Rico, 01 - Bahamas, 02 - Mille Lacs Lake)
Sentinel-2 Data Preprocessing and Dataset Merge ( 03 - Puerto Rico, 04 - Bahamas, 05 - Mille Lacs Lake)
Stumpf Log-Regression Fitting and Evaluation ( 06 - Puerto Rico, 07 - Bahamas, 08 - Mille Lacs Lake)
LightGBM Fitting and Evaluation ( 09 - Puerto Rico, 10 - Bahamas, 11 - Mille Lacs Lake)

Each notebook includes a detailed description of the current context and of each step taken. I tried to document each notebook so that it can also be read in isolation. In some instances (especially when comparing results) I add references to other notebooks. If you wish to read a more condensed writeup of the project please feel free to follow the link to my conference paper.

Python Sources

While working on this project I produced a rather generic eolearn_extras module which contains some eo-learn tasks which could be useful to others and a less generic collection of helper code in the notebooks/sdb_utils directory. All the code is available freely under the MIT license. If you find any bugs or need further assistance please don't hesitate to open an issue.

Approach

The general analysis approach can be seen in Figure 1. Because both the traditional and the modern model are supervised learning algorithms, we need to provide ground truth values for training. Those values can be extracted from bathymetry maps, which represent the depth profile (or underwater topography) of areas of inland or ocean water. Two possible repositories are Bathybase and the National Oceanic and Atmospheric Administration's (NOAA) National Centers for Environmental Information (NCEI) bathymetry portal.
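
To make the supervised setup concrete, here is a minimal sketch of fitting the Stumpf log-ratio model, depth = m1 * ln(n * R_blue) / ln(n * R_green) - m0, against known depths. It is not the code used in the notebooks: the reflectance pairs and coefficients are synthetic, the function names are hypothetical, and n = 1000 is an assumed value for the constant that keeps both logarithms positive.

```python
import math


def log_ratio(blue, green, n=1000.0):
    """Stumpf et al. (2003) predictor: ln(n * R_blue) / ln(n * R_green)."""
    return math.log(n * blue) / math.log(n * green)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic (blue, green) reflectance pairs, invented for illustration only.
pixels = [(0.060, 0.050), (0.055, 0.040), (0.050, 0.030), (0.045, 0.022), (0.040, 0.015)]
ratios = [log_ratio(b, g) for b, g in pixels]

# Synthetic "ground truth" depths generated from known coefficients,
# so that the fit below can be checked against them.
TRUE_M1, TRUE_M0 = 40.0, 38.0
depths = [TRUE_M1 * r - TRUE_M0 for r in ratios]

m1, intercept = fit_line(ratios, depths)
m0 = -intercept  # the model is written as depth = m1 * ratio - m0
print(f"m1={m1:.3f}, m0={m0:.3f}")
```

With real data, the depths would come from the bathymetry map pixels that overlap the satellite scene, and the fit would not be exact.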

For a given area of interest (AOI), which includes either the extent of the whole bathymetry map or a particular subsection, we search for Sentinel-2 scenes that contain the AOI at a time with no cloud obstruction and, in the case of regions which experience low temperatures, no ice formation. Once a suitable scene is found we download the complete Standard Archive Format for Europe (SAFE) archive and store it for further preprocessing. It is essential not to use partial downloads (e.g. with the sentinelsat Python package) because further preprocessing methods assume that the SAFE archives are complete.
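
A quick sanity check can catch obviously partial downloads before preprocessing. The sketch below is an illustration only and not part of this repository: `missing_safe_entries` is a hypothetical helper, and the list of expected top-level entries is an assumption based on the usual layout of an L1C SAFE product. It only checks that these entries exist, not that every band file is present.

```python
import tempfile
from pathlib import Path

# Top-level entries assumed to be present in a complete Sentinel-2 L1C SAFE archive.
EXPECTED_L1C_ENTRIES = ("manifest.safe", "MTD_MSIL1C.xml", "GRANULE", "DATASTRIP")


def missing_safe_entries(safe_dir, expected=EXPECTED_L1C_ENTRIES):
    """Return the expected top-level entries that are absent from safe_dir."""
    root = Path(safe_dir)
    return [name for name in expected if not (root / name).exists()]


# Demonstration on a deliberately incomplete, temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    partial = Path(tmp) / "S2_EXAMPLE.SAFE"  # hypothetical archive name
    (partial / "GRANULE").mkdir(parents=True)
    (partial / "manifest.safe").touch()
    demo_missing = missing_safe_entries(partial)
    print(demo_missing)
```

An empty list means all assumed entries were found; any returned names point to a download that should be repeated.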

In this project two preprocessing methods for atmospheric correction are evaluated against the top of atmosphere (TOA) L1C product. One is the L2A product generated by the Sen2Cor processor (Main-Knorn et al., 2017), while the other is the data product produced by the Acolite processor (Vanhellemont and Ruddick, 2016). Table 1 shows the exact versions of the operating system (OS) and the processors used.

Software	Version
Windows OS	21H2 Build 19044.1706
Sen2Cor	2.10.01-win64
Acolite	Generic Git - Hash dafc2d4bced4864f0bc111b9e0d3348ff16a5336

Table 1: Software used to run the preprocessors

All further processing of the acquired raster images to create analysis ready data (ARD) is done using the eo-learn framework. You can find a detailed description of all steps for data preprocessing, modelling and model evaluation in the notebooks folder of this repository.

Fig 1: General analysis approach
