Web Server Setup | Six Degrees of Wikipedia

Initial Setup

  1. Create a new Google Compute Engine instance from the sdow-web-server instance template, which is configured with the following specs:

    1. Name: sdow-web-server-1
    2. Zone: us-central1-c
    3. Machine Type: f1-micro (1 vCPU, 0.6 GB RAM)
    4. Boot disk: 32 GB SSD, Debian GNU/Linux 10 (buster)
    5. Notes: Click "Set access for each API" and use default values for all APIs except set Storage to "Read Write".
  2. Install, initialize, and authenticate to the gcloud CLI.

  3. Set the default region and zone for the gcloud CLI:

    $ gcloud config set compute/region us-central1
    $ gcloud config set compute/zone us-central1-c
    
  4. SSH into the machine:

    $ gcloud compute ssh sdow-web-server-# --project=sdow-prod
  5. Install required operating system dependencies to run the Flask app:

    $ sudo apt-get -q update
    $ sudo apt-get -yq install git pigz sqlite3 python-pip
    $ sudo pip install --upgrade pip setuptools virtualenv
    # OR for Python 3
    #$ sudo apt-get -q update
    #$ sudo apt-get -yq install git pigz sqlite3 python3-pip
    #$ sudo pip3 install --upgrade pip setuptools virtualenv
  6. Clone this repository via HTTPS and navigate into it:

    $ git clone https://github.com/jwngr/sdow.git
    $ cd sdow/
  7. Create and activate a new virtualenv environment:

    $ virtualenv -p python2 env  # OR virtualenv -p python3 env
    $ source env/bin/activate
  8. Install the required Python libraries:

    $ pip install -r requirements.txt
  9. Copy the latest compressed SQLite file from the sdow-prod GCS bucket:

    $ gsutil -u sdow-prod cp gs://sdow-prod/dumps/<YYYYMMDD>/sdow.sqlite.gz sdow/
  10. Decompress the SQLite file:

    $ pigz -d sdow/sdow.sqlite.gz
  11. Create the searches.sqlite file:

    $ sqlite3 sdow/searches.sqlite ".read sql/createSearchesTable.sql"

    Note: Alternatively, restore searches.sqlite from a backed-up SQL dump:

    $ gsutil -u sdow-prod cp gs://sdow-prod/backups/<YYYYMMDD>/searches.sql.gz sdow/searches.sql.gz
    $ pigz -d sdow/searches.sql.gz
    $ sqlite3 sdow/searches.sqlite ".read sdow/searches.sql"
    $ rm sdow/searches.sql
  12. Install required operating system dependencies to generate an SSL certificate (this and the following instructions are based on these blog posts):

    $ sudo apt-get -q update
    $ sudo apt-get -yq install nginx certbot python-certbot-nginx
  13. Add this location block inside the server block in /etc/nginx/sites-available/default:

    location ~ /.well-known {
        allow all;
    }
    
  14. Start NGINX:

    $ sudo systemctl restart nginx
  15. Ensure the VM has been assigned the proper static IP address (sdow-web-server-static-ip); if it has not, assign it by editing the VM in the GCP console.

  16. Create an SSL certificate using Let's Encrypt's certbot:

    $ sudo certbot certonly -a webroot --webroot-path=/var/www/html -d api.sixdegreesofwikipedia.com --email wenger.jacob@gmail.com
  17. Ensure auto-renewal of the SSL certificate is configured properly:

    $ sudo certbot renew --dry-run
  18. Run crontab -e and add the following cron jobs to that file to auto-renew the SSL certificate, regularly restart the web server (to ensure it stays responsive), and back up the searches database weekly:

    # Renew the cert daily.
    0 4 * * * sudo /usr/bin/certbot renew --noninteractive --renew-hook "sudo /bin/systemctl reload nginx"
    
    # Restart the server every ten minutes.
    */10 * * * * /home/jwngr/sdow/env/bin/supervisorctl -c /home/jwngr/sdow/config/supervisord.conf restart gunicorn
    
    # Back up the searches database weekly.
    0 6 * * 0 /home/jwngr/sdow/scripts/backupSearchesDatabase.sh
    

    Note: Let's Encrypt debug logs can be found at /var/log/letsencrypt/letsencrypt.log.

    Note: Supervisor debug logs can be found at /tmp/supervisord.log.

  19. Replace the ExecStart line in /lib/systemd/system/certbot.service with the following to ensure NGINX reloads every time a new certificate is generated:

    ExecStart=/usr/bin/certbot -q renew --noninteractive --renew-hook "sudo /bin/systemctl reload nginx"
    
  20. Run the following commands to reload systemd's unit files and restart the certbot service and timer so the change takes effect:

    $ sudo systemctl daemon-reload
    $ sudo systemctl restart certbot.service
    $ sudo systemctl restart certbot.timer
    
  21. Install a local mail server so that cron can deliver the output of its jobs (this is how cron job logs are read):

    $ sudo apt-get -yq install postfix
    # Choose "Local only" and use the default email address.

    Note: Cron job logs will be written to /var/mail/jwngr.

  22. Generate a strong Diffie-Hellman group to further increase security (note that this can take a couple of minutes):

    $ sudo openssl dhparam -out /etc/ssl/certs/dhparam.pem 2048
  23. Copy over the NGINX configuration, making sure to back up the original configuration:

    $ sudo cp /etc/nginx/nginx.conf /etc/nginx/nginx.conf.backup
    $ sudo cp ./config/nginx.conf /etc/nginx/nginx.conf
  24. Restart NGINX:

    $ sudo systemctl restart nginx
  25. Install the Stackdriver monitoring agent:

    $ curl -sSO https://dl.google.com/cloudagents/add-monitoring-agent-repo.sh
    $ sudo bash add-monitoring-agent-repo.sh
    $ sudo apt-get update
    $ rm add-monitoring-agent-repo.sh
    $ sudo apt-get -yq install stackdriver-agent
    $ sudo service stackdriver-agent start
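Step 18 edits the crontab interactively. When rebuilding a server, the same entries could instead be added from a script. The following is a minimal sketch, assuming bash and the standard crontab -l / crontab - interface; the function names merge_cron_line and add_cron_line are hypothetical. A line is appended only if an identical line is not already present, so re-running the script does not schedule a job twice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a crontab on stdin and print it with LINE appended,
# unless an identical line is already present (so re-running is harmless).
merge_cron_line() {
  local line="$1" existing
  existing="$(cat)"
  if [ -n "$existing" ]; then
    printf '%s\n' "$existing"
  fi
  # -F: fixed string, -x: whole-line match, -q: exit status only.
  if ! printf '%s\n' "$existing" | grep -Fxq -- "$line"; then
    printf '%s\n' "$line"
  fi
}

# Hypothetical helper: install LINE into the current user's crontab.
# crontab -l fails when no crontab exists yet; that case is treated as empty.
add_cron_line() {
  { crontab -l 2>/dev/null || true; } | merge_cron_line "$1" | crontab -
}

# Example, using the weekly backup job from step 18:
# add_cron_line '0 6 * * 0 /home/jwngr/sdow/scripts/backupSearchesDatabase.sh'
```

Splitting the pure text merge from the crontab call keeps the idempotency logic easy to check without touching a real crontab.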

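Steps 16 to 20 depend on certbot renewing the certificate before it expires. To confirm that independently of certbot, openssl can report whether a certificate expires within a given window. This is a minimal sketch, assuming certbot's default layout places the certificate at /etc/letsencrypt/live/api.sixdegreesofwikipedia.com/fullchain.pem; the function name cert_valid_for_days is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed if the PEM certificate CERT is still valid
# DAYS from now (default 30). openssl's -checkend takes a number of seconds
# and exits non-zero if the certificate expires within that window.
cert_valid_for_days() {
  local cert="$1" days="${2:-30}"
  openssl x509 -in "$cert" -noout -checkend "$(( days * 86400 ))" >/dev/null
}

# Example (reading /etc/letsencrypt normally requires root):
# cert_valid_for_days /etc/letsencrypt/live/api.sixdegreesofwikipedia.com/fullchain.pem 30 \
#   || echo "certificate expires within 30 days"
```

To just print the expiry date, sudo openssl x509 -in <path to fullchain.pem> -noout -enddate shows the notAfter field.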
Recurring Setup

  1. Activate the virtualenv environment:

    $ cd sdow/
    $ source env/bin/activate
  2. Start the Flask web server via Supervisor, which runs Gunicorn:

    $ cd config/
    $ supervisord
  3. Use supervisorctl to manage the running web server:

    $ supervisorctl status             # Get status of running processes
    $ supervisorctl stop gunicorn      # Stop web server
    $ supervisorctl start gunicorn     # Start web server
    $ supervisorctl restart gunicorn   # Restart web server

    Note: supervisord and supervisorctl must either be run from the config/ directory or be given the configuration file via the -c argument; otherwise they return an obscure "http://localhost:9001 refused connection" error message.

    Note: Log output from supervisord is written to /tmp/supervisord.log and log output from gunicorn is written to /tmp/gunicorn-stdout---supervisor-<HASH>.log. Logs are also written to Stackdriver Logging.
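The -c requirement for supervisorctl can be handled once by always passing the configuration file explicitly, as the restart cron job in Initial Setup step 18 already does. A minimal sketch, assuming the repository lives at ~/sdow with the virtualenv at env/ and the configuration at config/supervisord.conf; the function name sdowctl and the SDOW_DIR variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run the virtualenv's supervisorctl with an explicit
# config file, so it works from any directory. SDOW_DIR is an assumed
# override that defaults to ~/sdow.
sdowctl() {
  local dir="${SDOW_DIR:-$HOME/sdow}"
  "$dir/env/bin/supervisorctl" -c "$dir/config/supervisord.conf" "$@"
}

# Examples:
# sdowctl status
# sdowctl restart gunicorn
```

The wrapper only covers supervisorctl; supervisord itself is still started from config/ as described above.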

Updating Data Source

To update the web server to a more recent sdow.sqlite file with minimal downtime, run the following commands after SSHing into the web server:

$ cd sdow/
$ source env/bin/activate
$ gsutil -u sdow-prod cp gs://sdow-prod/dumps/<YYYYMMDD>/sdow.sqlite.gz sdow/sdow_new.sqlite.gz
$ pigz -d sdow/sdow_new.sqlite.gz  # This takes ~5 minutes and causes search to be non-responsive.
$ mv sdow/sdow_new.sqlite sdow/sdow.sqlite
$ cd config/
$ supervisorctl restart gunicorn
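The swap above keeps downtime short because mv within one filesystem is a rename, which replaces sdow.sqlite in a single step; processes that already have the old file open should keep reading it until Gunicorn is restarted. The sketch below wraps the decompress-and-swap in a function that also refuses to install an empty file. It assumes bash, falls back to gzip when pigz is unavailable, and the function name swap_in_dump is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decompress GZ next to LIVE, then rename it over LIVE.
# The rename is atomic only when both paths are on the same filesystem.
swap_in_dump() {
  local gz="$1" live="$2"
  local tmp="${live}.new"
  # Prefer pigz (parallel); gzip reads the same format if pigz is missing.
  local decompressor=gzip
  command -v pigz >/dev/null 2>&1 && decompressor=pigz
  if ! "$decompressor" -dc "$gz" > "$tmp"; then
    echo "decompression failed: $gz" >&2
    rm -f -- "$tmp"
    return 1
  fi
  # Refuse to replace the live database with an empty file.
  if [ ! -s "$tmp" ]; then
    echo "decompressed file is empty: $gz" >&2
    rm -f -- "$tmp"
    return 1
  fi
  mv -f -- "$tmp" "$live"
  rm -f -- "$gz"
}

# Example, after the gsutil download above:
# swap_in_dump sdow/sdow_new.sqlite.gz sdow/sdow.sqlite
```

Gunicorn still needs the supervisorctl restart afterwards to open the new file.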

Updating Server Code

To update the Python server code that powers the SDOW backend, run the following commands after SSHing into the web server:

$ cd sdow/
$ source env/bin/activate
$ git pull
$ pip install -r requirements.txt
$ cd config/
$ supervisorctl restart gunicorn
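If the server's checkout ever diverges from GitHub, a plain git pull may try to merge on the server. A slightly more defensive sketch, assuming the checkout lives at ~/sdow: it refuses to run when tracked files have local changes and uses --ff-only, so the pull either fast-forwards or fails. The function name update_checkout is hypothetical; the pip install and Gunicorn restart from the steps above would still follow it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fast-forward the checkout at REPO (default ~/sdow),
# refusing to run if tracked files have uncommitted modifications.
update_checkout() {
  local repo="${1:-$HOME/sdow}"
  # -uno ignores untracked files, e.g. downloaded .sqlite databases.
  if [ -n "$(git -C "$repo" status --porcelain -uno)" ]; then
    echo "refusing to pull: uncommitted changes in $repo" >&2
    return 1
  fi
  # --ff-only: never create a merge commit on the server.
  git -C "$repo" pull --ff-only
}

# Example:
# update_checkout && pip install -r ~/sdow/requirements.txt
```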