Cost-Focused Cloud Tracking

(Cloud Categorization & Tracking) By: Tadj Cazaubon (tc222gf)

TLDR;

Quick yet accurate weather prediction is imperative to the very survival of certain industries. An important part of such prediction is the ability to track, categorize and predict the movement of clouds within a given area. Currently available data is not intended for real-time use at a local level. We propose building a number of 'weather stations' that take atmospheric readings and images of the sky above them to accurately track cloud cover.

Longer Story

More location-accurate, real-time weather tracking and prediction has wide-reaching applications. These include better preparation for local weather conditions, more refined descriptions of weather events (such as their duration and area of effect) for storage units and warehouses, and more accurate power-output estimates for solar-panel owners based on knowledge of cloud cover.

These sorts of forecasts are usually made using satellite data. Sources include NASA's MISR Level 2 Cloud product, which provides cloud-motion vectors on a 17.6 km grid [2], and EUMETSAT's Meteosat Third Generation (MTG) satellites, which have a purported resolution of approximately 1 km [10]. This data cannot be used for local weather forecasting, however. Cloud cover obscures the view of the land below, and cloud heights and environmental readings beneath overcast areas cannot be determined.

Cloud height, visibility and humidity are usually measured on the ground with devices such as ceilometers. A ceilometer, however, costs approximately USD 30,000 on average [3] and covers approximately 8 km² [12]. Ground-based techniques with a visual component usually rely on calibrated camera arrays performing triangulation (B. Lyu, Y. Chen et al. 2021) [13]. Some go further and separate cloud fields from the sky background, describing cloud cover in terms of both horizontal size and velocity vectors (P. Crispel, G. Roberts 2018) [14]. Techniques without a visual component instead use environmental readings such as dew point and relative humidity to calculate the Lifted Condensation Level (LCL). This is "the height at which an air parcel would saturate if lifted adiabatically" [9], and it can be used as an approximate stand-in for the cloud-base height in a given area. Depending on sensor accuracy, this approach may serve as a substitute in areas unable to install a ceilometer [9]. The LCL is linearly related to cloud-base height, as shown later. It may nonetheless differ greatly from the actual cloud-base height, depending on factors such as the time of day, the time of year and the micro-climate of the area.
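A quick calculation makes the link between dew point and LCL concrete. This sketch is not the Romps (2017) method used later in the project. It swaps in two common approximations instead. The first is the Magnus formula, with assumed coefficients b = 17.62 and c = 243.12 °C, which gives the dew point from temperature and relative humidity. The second is the rule of thumb of roughly 125 m of LCL height per °C of dew-point depression. The function names are hypothetical.

```python
import math

# Magnus-formula coefficients over water (assumed values; other sets exist).
MAGNUS_B = 17.62
MAGNUS_C = 243.12  # °C

# Rule of thumb (assumed): ~125 m of LCL height per °C of dew-point depression.
METRES_PER_DEGREE = 125.0


def dew_point_c(temp_c: float, rh_percent: float) -> float:
    """Approximate dew point (°C) from air temperature (°C) and relative humidity (%)."""
    if not 0.0 < rh_percent <= 100.0:
        raise ValueError("relative humidity must be in (0, 100]")
    gamma = math.log(rh_percent / 100.0) + MAGNUS_B * temp_c / (MAGNUS_C + temp_c)
    return MAGNUS_C * gamma / (MAGNUS_B - gamma)


def lcl_height_m(temp_c: float, rh_percent: float) -> float:
    """Approximate LCL height above the sensor (m) from the dew-point depression."""
    depression = temp_c - dew_point_c(temp_c, rh_percent)
    return METRES_PER_DEGREE * depression
```

For example, 20 °C at 50 % relative humidity gives a dew point of roughly 9.3 °C and an LCL of roughly 1.3 km above the sensor. The rule of thumb is handy for sanity-checking readings on a microcontroller, but it is coarser than the exact expression in Romps (2017).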

Proposal

Existing techniques for describing cloud features must be both miniaturized and hybridized. There now exist ceilometer weather stations with reasonable accuracy, such as the MWS-M625 from Intellisense. It measures 19 x 14 x 14 cm and fits many high-precision instruments, including a 360° high-resolution sky imager [20]. Inexpensive solutions have also been demonstrated. Dev et al. [23] (2016) built whole-sky imagers costing US$2,500 per unit, and Jain et al. [5][24] (2021 and 2022, respectively) brought costs close to US$300. We believe it is possible to reduce this cost further while using less data than either.

Because related works do not hybridize these approaches, the information extracted per image is sparser than it could be if environmental and visual methods were combined. We propose to:

- Create weather station(s) able to collect and send weather data within usable sensor accuracy.
    - Set up a weather station at or near the Växjö Kronoberg Airport.
    - Compare the accuracy of the readings against the Växjö Airport's data.
- Create and host a server able to accept multiple connections from these stations, and to process and store the incoming data.
- Undistort the sky images. This is done by obtaining the intrinsic and extrinsic matrices of each station's camera before deployment.
- Identify the clouds in the scene via either statistical analysis or simple object detection.
- Calculate the LCL (Lifted Condensation Level) from the environmental readings, according to the method outlined in Romps (2017).
- Compare the LCL-approximated cloud heights against the Växjö Airport's data.
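The undistortion step can be illustrated with a standard pinhole camera model plus two radial distortion coefficients (k1, k2), the same leading terms that common calibration tools such as OpenCV estimate. This is a minimal sketch. It assumes such a model suits the OV5640 lens; a very wide-angle or fisheye lens would need a fisheye model instead. In practice, OpenCV's `cv2.calibrateCamera` and `cv2.undistort` would handle whole images. The `Intrinsics` class and the function names are hypothetical. Only the intrinsic part is shown; the extrinsic matrix, which describes the camera's orientation relative to the sky, would be applied afterwards.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Hypothetical container for pinhole intrinsics (pixels) and radial distortion."""
    fx: float
    fy: float
    cx: float
    cy: float
    k1: float = 0.0
    k2: float = 0.0


def _radial_factor(cam: Intrinsics, x: float, y: float) -> float:
    """Radial scaling 1 + k1*r^2 + k2*r^4 at normalized coordinates (x, y)."""
    r2 = x * x + y * y
    return 1.0 + cam.k1 * r2 + cam.k2 * r2 * r2


def distort_pixel(cam: Intrinsics, u: float, v: float) -> tuple[float, float]:
    """Map an ideal (undistorted) pixel to where the lens actually images it."""
    x, y = (u - cam.cx) / cam.fx, (v - cam.cy) / cam.fy
    f = _radial_factor(cam, x, y)
    return cam.cx + cam.fx * x * f, cam.cy + cam.fy * y * f


def undistort_pixel(cam: Intrinsics, u: float, v: float, iterations: int = 20) -> tuple[float, float]:
    """Invert the radial model by fixed-point iteration on normalized coordinates."""
    xd, yd = (u - cam.cx) / cam.fx, (v - cam.cy) / cam.fy
    x, y = xd, yd  # initial guess: no distortion
    for _ in range(iterations):
        f = _radial_factor(cam, x, y)
        x, y = xd / f, yd / f
    return cam.cx + cam.fx * x, cam.cy + cam.fy * y
```

The radial model has no closed-form inverse, so `undistort_pixel` refines its estimate iteratively. For moderate distortion this converges quickly. Round-tripping a pixel through `distort_pixel` and back should therefore recover it to well under a pixel.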

Contents

1.0. Cost Per Unit

2.0. Sensor Accuracy

3.0. Cloud-Sky Separation
    3.1. Object Segmentation
    3.2. Channel Distribution Similarity
    3.3. ROC Curve
        3.3.1. Bootstrapping
        3.3.2. Best Curve Determination

4.0. LCL (Lifted Condensation Level) Accuracy

1.0. Cost Per Unit

Component	Price (kr)	Link
Freenove ESP32-S3-WROOM CAM	249,00	Link
Adafruit Sensirion SHT31-D	229,00	Link
Adafruit BMP390	179,00	Link
OV5640 Camera Module	162,76	Link
Domain hosting for 1 year	210,03	Link
TOTAL COST	1029,79
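The calculation below reproduces the total above and splits it into one-off hardware and recurring hosting. The split is an assumption: the table lists domain hosting alongside the hardware, but a single hosted server would presumably be shared by every station, so the per-station cost falls as stations are added. The names are illustrative.

```python
from decimal import Decimal

# One-off hardware prices per station, in SEK, as listed in the table above.
COMPONENTS = {
    "Freenove ESP32-S3-WROOM CAM": Decimal("249.00"),
    "Adafruit Sensirion SHT31-D": Decimal("229.00"),
    "Adafruit BMP390": Decimal("179.00"),
    "OV5640 Camera Module": Decimal("162.76"),
}
# Assumption: hosting is a yearly cost shared by all stations, not paid per unit.
HOSTING_PER_YEAR = Decimal("210.03")


def first_year_cost(n_stations: int) -> Decimal:
    """Total first-year cost for n stations sharing one hosted server."""
    if n_stations < 1:
        raise ValueError("need at least one station")
    hardware = sum(COMPONENTS.values(), Decimal("0"))
    return hardware * n_stations + HOSTING_PER_YEAR


def cost_per_station(n_stations: int) -> Decimal:
    """First-year cost per station, rounded to öre."""
    return (first_year_cost(n_stations) / n_stations).quantize(Decimal("0.01"))
```

With one station, `first_year_cost(1)` gives 1029,79 kr, matching the table. With five stations, `cost_per_station(5)` gives 861,77 kr in the first year.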

2.0. Sensor Accuracy

We have not yet installed a weather station at or near the airport, so we do not yet have direct readings to compare against the airport's data.

3.0. Cloud-Sky Separation

Image samples have been taken with a variety of cameras, including the OV2640 and OV5640. These are compared with multiple shots from various DSLR cameras, taken as frames from timelapses.

Camera Model	Image Sample
OV2640
OV5640
DSLR

While colour-space operations are fairly straightforward on high-quality images, the OV2640 is not high quality. Contrast is low, over- and under-exposure are almost guaranteed, and ISO changes are not only drastic but also cause unwanted light filtering and other strange behaviour. The OV5640 seems better suited to this application because of its 5 MP resolution and higher dynamic range. Contrast, colour accuracy and exposure can be handled dynamically and are stepped up or down smoothly. Our data appears to bear this out.

3.1. Object Segmentation

To initially distinguish between sky and cloud regions, a segmented ("blocked") image is made for each reference image. In it, sky regions are coloured black and cloud regions red. The boundary between cloud and sky is left unpainted so as not to muddy the results.

Reference Image	Blocked Image

This is used to create two binary masks.

Cloud Mask Bitmap	Sky Mask bitmap

These masks are then applied to the reference image to produce two masked images: one containing only the clouds in the scene and one containing only the sky.

Cloud Masked Image	Sky Masked Image
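The sketch below shows one way to derive the two masks from a blocked image and pull out the pixels each mask keeps. It assumes images are held as nested lists of (R, G, B) tuples. It also assumes painted pixels may drift slightly from pure red or black, for example after JPEG compression, which is what the hypothetical `tol` parameter absorbs. Pixels that are neither red nor black, such as the unpainted boundary, fall into neither mask.

```python
Pixel = tuple[int, int, int]
Mask = list[list[bool]]


def masks_from_blocked(blocked: list[list[Pixel]], tol: int = 30) -> tuple[Mask, Mask]:
    """Derive (cloud_mask, sky_mask) from a hand-painted blocked image.

    Red pixels are labelled cloud and black pixels sky; anything else,
    such as the unpainted boundary, belongs to neither mask.
    """
    def is_red(p: Pixel) -> bool:
        r, g, b = p
        return r >= 255 - tol and g <= tol and b <= tol

    def is_black(p: Pixel) -> bool:
        return all(c <= tol for c in p)

    cloud = [[is_red(p) for p in row] for row in blocked]
    sky = [[is_black(p) for p in row] for row in blocked]
    return cloud, sky


def masked_pixels(image: list[list[Pixel]], mask: Mask) -> list[Pixel]:
    """Flatten the reference image, keeping only the pixels where the mask is True."""
    return [p for row, mrow in zip(image, mask) for p, keep in zip(row, mrow) if keep]
```

`masked_pixels` returns the kept pixels as a flat list rather than painting masked-out pixels black. This keeps filler values out of the frequency distributions.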

Each masked image is then split into its colour channels, and the values of each channel are recorded as a frequency distribution.

The frequency graphs below show the colour-channel distributions across the 60 sky images, separated into sky and cloud regions.
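The per-channel frequency distributions can be built by converting each kept pixel into all three colour spaces and counting values into 256 bins. This sketch makes two assumptions. YCbCr uses full-range (JPEG/JFIF-style) BT.601 coefficients. H, S and V, which Python's `colorsys` returns in the 0 to 1 range, are rescaled onto 0 to 255. The project's own code may scale channels differently.

```python
import colorsys


def _to_byte(x: float) -> int:
    """Round and clamp a channel value onto the 0-255 range."""
    return max(0, min(255, round(x)))


def channel_values(pixel: tuple[int, int, int]) -> dict[str, int]:
    """Return the pixel's R, G, B, H, S, V, Y, Cb and Cr values, each on 0-255."""
    r, g, b = pixel
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    # Full-range BT.601 (JFIF) conversion; the exact variant is an assumption.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return {
        "R": r, "G": g, "B": b,
        "H": _to_byte(h * 255), "S": _to_byte(s * 255), "V": _to_byte(v * 255),
        "Y": _to_byte(y), "Cb": _to_byte(cb), "Cr": _to_byte(cr),
    }


def channel_histograms(pixels: list[tuple[int, int, int]]) -> dict[str, list[int]]:
    """Count how often each value 0-255 occurs in each channel across `pixels`."""
    hists: dict[str, list[int]] = {}
    for pixel in pixels:
        for name, value in channel_values(pixel).items():
            hists.setdefault(name, [0] * 256)[value] += 1
    return hists
```

Running `channel_histograms` once on the cloud pixels and once on the sky pixels gives the two sets of distributions compared in the histograms.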

Desc.	Histogram
RGB Distribution
HSV Distribution
YCbCr Distribution

The histograms above suggest that the Saturation channel (HSV), as well as the Chroma Red and Chroma Blue channels (YCbCr), would be good for discriminating between sky and cloud areas.

Intuitively, however, the usefulness of a channel should depend on factors such as the spectral response of the particular camera model. Our results bear this out, as shown below: camera models differ in which channels are quantifiably "useful".

3.2. Channel Distribution Similarity

To quantify the similarity between the cloud and sky distributions within the dataset, we used the Jaccard index (Jaccard 1901) [25]. This is a widely applicable measure of the similarity between two sets. It is normally defined as the size of their intersection divided by the size of their union: J(A, B) = |A ∩ B| / |A ∪ B|. The Jaccard index was calculated on the three colour spaces (RGB, HSV and YCbCr) for both our DSLR group and the OV5640 group. For each group, the channels with a similarity below 0.5 are kept. The results are as follows:

Camera	Jaccard Dictionary
DSLR
OV5640
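The Jaccard index is defined on sets, so comparing two frequency distributions needs a generalization. One common choice is the weighted Jaccard (Ruzicka) similarity; it is an assumption here, not taken from the project. Each histogram is first normalized, so that cloud and sky regions of different sizes are comparable. The sum of bin-wise minima is then divided by the sum of bin-wise maxima. The 0.5 cut-off mirrors the text, and the function names are hypothetical.

```python
def _normalize(hist: list[float]) -> list[float]:
    """Scale a histogram so its bins sum to 1."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must contain at least one count")
    return [count / total for count in hist]


def weighted_jaccard(hist_a: list[float], hist_b: list[float]) -> float:
    """Weighted Jaccard (Ruzicka) similarity of two histograms after normalization."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    a, b = _normalize(hist_a), _normalize(hist_b)
    intersection = sum(min(x, y) for x, y in zip(a, b))
    union = sum(max(x, y) for x, y in zip(a, b))
    return intersection / union


def dissimilar_channels(
    cloud_hists: dict[str, list[float]],
    sky_hists: dict[str, list[float]],
    threshold: float = 0.5,
) -> dict[str, float]:
    """Channels whose cloud/sky similarity is below `threshold`, most dissimilar first."""
    scores = {name: weighted_jaccard(cloud_hists[name], sky_hists[name]) for name in cloud_hists}
    kept = [(name, score) for name, score in scores.items() if score < threshold]
    return dict(sorted(kept, key=lambda item: item[1]))
```

A score of 1 means the cloud and sky distributions are identical, so the channel cannot separate them. A score of 0 means they never overlap.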

Both the scores and their rankings within the sorted dictionaries differ between the two groups. Channels such as Green and YCbCr luma (Y) appear in the OV5640 results but not in the DSLR results.

3.3. ROC Curve

3.3.1. Bootstrapping

TODO

To further refine the choice of colour channels, we construct ROC curves over the possible lower and upper bounds used for masking in a given channel. These quantify how well simple range masking on that channel classifies pixels as either "cloud" or "sky".

ROC curves illustrate the performance of a binary classifier at varying threshold values. A single curve therefore varies one threshold, not two simultaneously. To work around this, we fix the lower bound at a given value, test all feasible upper bounds, and plot the result as an independent curve. Each channel therefore yields multiple curves on the same plot. Each curve shows the performance of simple masking for one fixed lower bound across the possible upper bounds. Examples for the Saturation (HSV) and Chroma Blue (YCbCr) channels, for both the DSLR and OV5640 groups, are shown below:

Camera	Saturation ROC Curve	Chroma Blue ROC Curve
DSLR
OV5640
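The fixed-lower-bound sweep described above can be sketched from the cloud and sky histograms of a single channel. Cumulative sums make each (lower, upper) range a constant-time lookup. Treating "cloud" as the positive class is an assumption; swapping the histograms would score "sky" instead. Each curve point is stored as (upper, FPR, TPR) so the chosen range can be recovered later. The function name is hypothetical.

```python
from itertools import accumulate


def roc_curves(cloud_hist: list[int], sky_hist: list[int]) -> dict[int, list[tuple[int, float, float]]]:
    """One ROC curve per fixed lower bound, sweeping every upper bound >= lower.

    A pixel is predicted "cloud" when lower <= value <= upper. Cloud is the
    positive class (assumption). Each point is (upper, FPR, TPR).
    """
    if len(cloud_hist) != len(sky_hist):
        raise ValueError("histograms must have the same number of bins")
    cloud_cum = [0, *accumulate(cloud_hist)]  # cloud_cum[i] = cloud pixels with value < i
    sky_cum = [0, *accumulate(sky_hist)]
    total_cloud, total_sky = cloud_cum[-1], sky_cum[-1]
    curves = {}
    for lower in range(len(cloud_hist)):
        points = []
        for upper in range(lower, len(cloud_hist)):
            tpr = (cloud_cum[upper + 1] - cloud_cum[lower]) / total_cloud
            fpr = (sky_cum[upper + 1] - sky_cum[lower]) / total_sky
            points.append((upper, fpr, tpr))
        curves[lower] = points
    return curves
```

Higher lower bounds produce shorter curves because fewer upper bounds remain.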

3.3.2. Best Curve Determination

The difficulty now lies in determining the best lower bound from these curves. Normally, an easy measure would be the AUC (Area Under the Curve). However, as the lower bound increases, the number of corresponding upper bounds decreases. Smaller lower-bound values may therefore have an "advantage": they have more datapoints and so a larger curve. That said, most valid masking ranges appear to cover most of the distribution, so this does not seem to affect the result greatly.

We therefore select the curve with the highest AUC. If a channel does not contain a curve with an AUC of at least 0.5, it is discarded. The optimal point on the selected curve can then be chosen in many ways, and the criterion often comes down to business priorities rather than mathematics. In our case, we select the point that maximizes TPR − FPR (True Positive Rate minus False Positive Rate), also known as Youden's J statistic.
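The two-stage selection can be sketched as follows: first pick the best curve, then the best point on it. Curves are taken as a mapping from lower bound to (upper, FPR, TPR) points. The trapezoidal AUC anchors each curve at (0, 0) but, as an assumption, does not extend it to (1, 1). As a result, curves from higher lower bounds that stop short of FPR = 1 score less area, which is the bias discussed above. The chosen point maximizes TPR − FPR. The function names are hypothetical.

```python
Point = tuple[int, float, float]  # (upper bound, FPR, TPR)


def auc(points: list[Point]) -> float:
    """Trapezoidal area under one fixed-lower-bound curve, anchored at (0, 0)."""
    xy = sorted([(0.0, 0.0)] + [(fpr, tpr) for _, fpr, tpr in points])
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(xy, xy[1:]))


def best_range(curves: dict[int, list[Point]], min_auc: float = 0.5):
    """Return (lower, upper, tpr, fpr) for the chosen masking range, or None to discard the channel."""
    lower, points = max(curves.items(), key=lambda item: auc(item[1]))
    if auc(points) < min_auc:
        return None
    # Youden's J statistic: the point maximizing TPR - FPR.
    upper, fpr, tpr = max(points, key=lambda p: p[2] - p[1])
    return lower, upper, tpr, fpr
```

The returned (lower, upper) pair is the masking range applied to that channel. A `None` result corresponds to discarding the channel.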

Filtering for the best channels from the OV5640 and DSLR datasets leaves us with the following:

Camera	Optimal Channel Characteristics
DSLR
OV5640

4.0. LCL (Lifted Condensation Level) Accuracy

The Lifted Condensation Level can be used to estimate the cloud-base height when only environmental readings are available. We estimate it according to the method outlined in Romps (2017), using the code made available with that publication within our application stack. To visualize the difference between the measured cloud base and the LCL, we retrospectively fetch METAR data for a set of airports. We then plot both the fractional delta and a simple 1-to-1 comparison of the two. We previously compared this fractional delta against pressure, relative humidity and temperature to investigate their relationships. Of these, only relative humidity appeared to be somewhat directly correlated with it.

By web-scraping Ogimet, we have made METAR data available from 01/01/2010 to 30/12/2023 for both the Växjö (ESMX) and Hartsfield-Jackson Atlanta (KATL) airports. These are viewable for Växjö here and ATL here.

The Växjö airport was investigated first because of its proximity. The graph below covers the entire period:

Airport	LCL Graph
ESMX
KATL

References

[1] The National Oceanic and Atmospheric Administration, 16 November 2012, p. 60.

[2] K. Mueller, M. Garay, C. Moroney, V. Jovanovic, "MISR 17.6 km Gridded Cloud Motion Vectors: Overview and Assessment," Jet Propulsion Laboratory, 4800 Oak Grove, Pasadena, California, 2012.

[3] F. Rocadenbosch, R. Barragán, S. J. Frasier, J. Waldinger, D. D. Turner, R. L. Tanamachi, D. T. Dawson, "Ceilometer-Based Rain-Rate Estimation: A Case-Study Comparison With S-Band Radar and Disdrometer Retrievals in the Context of VORTEX-SE," 2020. Available: here (Accessed May 19, 2023).

[4] NASA, "MISR: Spatial Resolution." Available: here (Accessed May 19, 2023).

[7] WMO, "Cumulonimbus," International Cloud Atlas. Available: here (Accessed May 21, 2023).

[10] EUMETSAT, "Meteosat Third Generation," Jan. 2021. Available: here.

[11] J. Schmid, "The SEVIRI Instrument," Jan. 2000. Available: here.

[12] Vaisala, "CL31 Ceilometer for Cloud Height Detection," 2009. Available: here.

Repository: sudoDeVinci/DevinciCloud
