chance_of_showers

Matthew Epland, PhD

This project provides live water pressure measurements via a web dashboard running on a Raspberry Pi, logs the data, and creates time series forecasts of future water pressure.

Built with: Prophet, Darts, PyTorch, Plotly, Matplotlib, Polars, Pandas, Flask, Bootstrap, Socket.io, Raspberry Pi, KiCad

Tooling: Poetry, black (code style), pylint, flake8, mypy, isort, bandit, markdownlint, html5validator, StandardJS, yamllint, Prettier (code style), checkmake, shellcheck, shfmt, pre-commit

Status: tests, healthchecks.io

License: MIT

Introduction

Living in a 5th-floor walk-up in NYC can save you on rent and gym memberships, but runs the risk of leaving you high and dry when your water pressure gives out! The pressure delivered from the city's water mains is typically sufficient to reach the 6th floor, with higher buildings needing a booster pump and one of NYC's iconic rooftop water towers. My building lacks a pump and water tower, leaving my top-floor apartment with just barely satisfactory pressure, as long as no other units are using water! As you can see in the data below, my daytime water pressure is all over the place. After being stranded soapy and cold halfway through a shower one too many times, I decided to use my data science and electronics skills to record the time series of my apartment's hot water pressure with the goal of forecasting future availability, and hence chance_of_showers was born!

Demo video: chance_of_showers_dashboard_demo.mp4

Data Analysis Results

WIP

Time Series Plots

Below is a sample of the pressure data collected in November 2023. Clicking the links will open interactive Plotly plots; please explore!

The data acquisition (DAQ) system saves the raw pressure data from the analog-to-digital converter (ADC) as an integer between 0 and 65472. Note that occasionally a water hammer will increase the pressure above its steady-state value, marked by the orange 100% reference line, with a subsequent decay on the order of 10 minutes. When water is flowing at the pressure sensor, the data is shown with an open purple marker. Using water reduces the pressure slightly under normal conditions, and abruptly ends overpressure events.

To clean the data before fitting any models, I rescale the values between the steady-state extrema to the range 0 to 1. Any values outside the normalization range are clipped.
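
As an illustration, here is a minimal sketch of that rescaling step; the steady-state bounds shown are hypothetical placeholders, not the values used in the actual analysis code.

```python
import numpy as np

# Hypothetical steady-state extrema of the raw ADC counts (full range is 0 to 65472)
RAW_STEADY_STATE_MIN = 20_000
RAW_STEADY_STATE_MAX = 60_000


def normalize_pressure(raw_counts: np.ndarray) -> np.ndarray:
    """Rescale raw ADC counts to [0, 1] between the steady-state extrema,
    clipping any values that fall outside the normalization range."""
    scaled = (raw_counts - RAW_STEADY_STATE_MIN) / (RAW_STEADY_STATE_MAX - RAW_STEADY_STATE_MIN)
    return np.clip(scaled, 0.0, 1.0)
```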

Overall Pressure Distributions

  • Mean Pressure Value Density
  • Mean Pressure Value Normalized vs Time of Week
  • Mean Pressure Value Normalized vs Time of Day

Prophet Results

  • Prophet Predict
  • Prophet Components
  • Prophet Components Weekly
  • Prophet Components Daily

Hardware

Bill of Materials

Here is a list of the components I used in my build. With suitable alterations, the project could definitely be carried out with a wide array of other sensors, single-board computers or microcontrollers, plumbing supplies, etc.

Electronics

Plumbing

Optional Components

Circuit Diagram

The circuit diagram for this implementation is provided as a KiCad schematic here.


Photos

Photo captions: Bottom, Left, Top, Right, Overhead, Overhead Bottom OLED, Overhead Middle, Overhead Top GPIO, Left Bottom, Left Top, Right Top, In Situ, In Situ OLED, In Situ OLED (Flash), Plumbing Front, Plumbing Back

Data Acquisition (DAQ)

The DAQ system recorded 95.4% of possible data points overall, and 99.870% since implementing the cron job heartbeat monitoring.

Launching the DAQ Script

The provided start_daq bash script will start the daq.py and fan_control.py scripts in new tmux windows. You will need to update the pkg_path variable in start_daq per your installation location.

source daq/start_daq

Opening the Web Dashboard

If daq: {display_web: true} is set in config.yaml, the local IP address and port of the dashboard will be logged on DAQ startup. Open this link in your browser to see the live dashboard, as shown in the introduction.

Setting up cron Jobs

Jobs to restart the DAQ on boot and every 30 minutes, as well as to send heartbeat API calls (see below), are provided in the cron_jobs.txt file. Note that loading this file with crontab will overwrite any current cron jobs, so check your existing settings first with crontab -l!

crontab -l

crontab daq/cron_jobs.txt

You can verify the cron jobs are running as expected with:

grep CRON /var/log/syslog | grep $LOGNAME

Heartbeat Monitoring

You can use the provided heartbeat bash script to send heartbeat API calls for the DAQ script to healthchecks.io for monitoring and alerting. Configure your alert online at healthchecks.io, and then run the below commands to set up a secrets.json file with your alert's uuid. You will need to update the pkg_path variable in heartbeat per your installation location. The provided cron_jobs.txt will set up a cron job to send the heartbeat at 15 and 45 minutes past each hour.

sudo apt install -y jq
echo -e "{\n\t\"chance_of_showers_heartbeat_uuid\": \"YOUR_UUID_HERE\"\n}" > secrets.json
source daq/heartbeat
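
For reference, the heartbeat amounts to reading the uuid from secrets.json and pinging healthchecks.io. Below is a rough Python sketch of that idea; the actual heartbeat script is a bash wrapper, and the ping URL shown assumes the hosted hc-ping.com endpoint.

```python
import json
import urllib.request

# Read the alert uuid saved to secrets.json above
with open("secrets.json") as f:
    uuid = json.load(f)["chance_of_showers_heartbeat_uuid"]

# Ping healthchecks.io; if pings stop arriving, the alert fires
urllib.request.urlopen(f"https://hc-ping.com/{uuid}", timeout=10)
```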

The heartbeat script has also been set up to back up the daq/raw_data and daq/saved_data directories to backup_path="/media/usb_drive/daq_backup". Please configure backup_path to fit your path, or comment out the rsync lines to turn them off. Regular backups of the data to a separate drive are helpful, as Raspberry Pis have been known to corrupt their SD cards due to power loss or excessive writes.

Combining Raw DAQ Files

Raw CSV files can be combined into convenient Parquet files prior to analysis with the etl.py script. If the script crashes, you may need to manually repair any lines in the CSV files corrupted due to power losses. Polars should generate error messages indicating the corrupt datetime to help you locate the problematic file and line.

python daq/etl.py
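
Conceptually, the combination step looks something like the following sketch; the paths and column name are illustrative only, not the actual etl.py implementation.

```python
from pathlib import Path

import polars as pl

# Illustrative locations - etl.py defines its own paths and schema
raw_dir = Path("daq/raw_data")
out_path = Path("daq/saved_data/combined.parquet")

# Read every raw CSV, concatenate, sort by timestamp, and write one Parquet file
frames = [pl.read_csv(csv_path) for csv_path in sorted(raw_dir.glob("*.csv"))]
pl.concat(frames).sort("datetime").write_parquet(out_path)
```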

Bayesian Optimization

To optimize the many hyperparameters present in this project, both for the individual forecasting models themselves and for how the data is prepared, Bayesian optimization was used to efficiently sample the parameter space. The functions needed to run Bayesian optimization are located in bayesian_opt.py.

Unfortunately, actually running the optimization over GPU-accelerated models is not as simple as calling the run_bayesian_opt() function. I have been unable to successfully detach the training of one GPU-accelerated model from the next when training multiple models in a loop. The second training session will still have access to the tensors of the first, leading to out-of-GPU-memory errors, even when using commands like gc.collect() and torch.cuda.empty_cache(). The torch models created by darts are very convenient, but do not provide as much configurability as building your own torch model from scratch, leaving me unable to fix this issue in a clean way.

To work around the GPU memory issues, a shell script, start_bayesian_opt, is used to repeatedly call run_bayesian_opt() via the bayesian_opt_runner.py script. In this way each model is trained in its own Python session, totally clearing memory between training iterations. A signed pickle file is used to quickly load the necessary data and settings on each iteration. Instructions for running the whole Bayesian optimization workflow are provided below.
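
The core of the workaround is simply that each training run happens in a fresh process. A minimal Python sketch of that pattern follows, with a hypothetical iteration count and invocation; the real logic lives in start_bayesian_opt and bayesian_opt_runner.py.

```python
import subprocess

N_ITERATIONS = 50  # hypothetical number of optimization iterations

for _ in range(N_ITERATIONS):
    # Each call runs one optimization iteration in its own Python process,
    # so all GPU memory is released when that process exits.
    subprocess.run(["python", "ana/bayesian_opt_runner.py"], check=True)
```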

Running Bayesian Optimization

  1. Create the input parent_wrapper.pickle file for bayesian_opt_runner.py via the exploratory_ana.py notebook.
  2. Configure the run in start_bayesian_opt and bayesian_opt_runner.py.
  3. Run the shell script, logging outputs to disk via:
./ana/start_bayesian_opt 2>&1 | tee ana/models/bayesian_optimization/bayesian_opt.log

Dev Notes

Data Analysis Setup - Installing CUDA and PyTorch

  1. Find the supported CUDA version (11.8.0) for the current release of PyTorch (2.0.1) here.
  2. Install CUDA following the steps for the proper version and target platform here.
  3. Update the poetry pytorch-gpu-src source to point to the correct PyTorch version in pyproject.toml.
  4. Install the poetry ana group with make setupANA.
    • This will install pytorch, along with the other necessary packages.
  5. Check that PyTorch and CUDA are correctly configured with the following Python commands:
import torch

if torch.cuda.is_available():
    print("CUDA is available")
    print(f"Device name: {torch.cuda.get_device_name(torch.cuda.current_device())}")
else:
    print("CUDA IS NOT AVAILABLE!")

DAQ Setup - Installing Python 3.11 on Raspbian

If Python 3.11 is not available in your release of Raspbian, you can compile it from source following the instructions here, but you will also need to install the SQLite extensions:

cd /usr/src/
sudo wget https://www.python.org/ftp/python/3.11.4/Python-3.11.4.tgz
sudo tar -xzvf Python-3.11.4.tgz
cd Python-3.11.4/
sudo apt update && sudo apt full-upgrade -y
sudo apt install -y build-essential libbz2-dev libc6-dev libexpat1-dev libffi-dev libgdbm-dev liblzma-dev libncurses5-dev libnss3-dev libsqlite3-dev libssl-dev lzma pkg-config zlib1g-dev
sudo apt autoremove -y
sudo apt update && sudo apt full-upgrade -y
./configure --enable-optimizations --enable-loadable-sqlite-extensions
sudo make altinstall

# Should be Python 3.11.4 with your compile info
/usr/local/bin/python3.11 -VV

# Link binary
sudo rm /usr/bin/python
sudo rm /usr/bin/python3
sudo ln -s /usr/local/bin/python3.11 /usr/bin/python
sudo ln -s /usr/local/bin/python3.11 /usr/bin/python3

# Should match /usr/local/bin/python3.11 -VV
python -VV

Additional DAQ Dependencies

To finish setting up the DAQ system you must also:

  • Install tmux, which is not included in Raspbian by default. tmux is used to control multiple terminal sessions in start_daq.
  • Install pigpio, which is not included in Raspbian Lite (headless) installations. pigpio is necessary to interface with the GPIO ports and must also be enabled via a daemon.
  • Enable SPI, I2C, and Remote GPIO via raspi-config.
  • Prevent the WiFi from powering off.
# Install tmux and pigpio
sudo apt-get install -y tmux pigpio

# Enable SPI, I2C, and Remote GPIO
sudo raspi-config

# Setup pigpio daemon
sudo systemctl enable pigpiod

# Prevent the WiFi from powering off
# Above the line that says exit 0 insert `/sbin/iw wlan0 set power_save off` and save the file
sudo vi /etc/rc.local

Installing Dependencies with Poetry

Install poetry following the instructions here.

curl -sSL https://install.python-poetry.org | python3 -

Then install the Python packages needed for this installation. Groups include:

  • daq for packages needed to run the DAQ script on a Raspberry Pi, optional
  • web for packages needed to run the live dashboard from the DAQ script, optional
  • ana for analysis tools, optional
  • dev for continuous integration (CI) and linting tools
poetry install --with daq,web

or

poetry install --with ana

Setting up pre-commit

It is recommended to use the pre-commit tool to automatically check your commits locally as they are created. You should just need to install the git hook scripts (see below) after installing the dev dependencies. This will run the checks in .pre-commit-config.yaml when you create a new commit.

pre-commit install

Installing Non-Python Based Linters

Markdown is linted using markdownlint-cli, JavaScript by standard, and HTML, SCSS, CSS, and TOML by prettier. You can install these JavaScript-based linters globally with:

sudo npm install --global markdownlint-cli standard prettier
sudo npm install --global --save-dev --save-exact prettier-plugin-toml

Shell files are linted using shellcheck and shfmt. Follow the linked installation instructions for your system. On Fedora they are:

sudo dnf install ShellCheck shfmt

Using the Makefile

A Makefile is provided for convenience, with commands to set up the DAQ and analysis environments, make setupDAQ and make setupANA, as well as to run CI and linting tools, e.g. make black, make pylint, make pre-commit.