
philippschmalen/SAI-SoMe-Analysis


Sustainable aviation initiative - Data Science

At the Sustainable Aviation Initiative (SAI), we use data science to push for more sustainable air travel through data-driven insights. We are a diverse group of volunteers of all ages, interests and backgrounds. Aviation enthusiasts from all areas join forces to make air travel ready for a sustainable future.

This repository collects useful scripts, projects and analyses.

Join us

We would love to hear from you. Leave us a message at https://www.linkedin.com/company/sustainableaviationinitiative.

Social media analysis

Goal

Have one dashboard that summarizes SAI's reach across platforms. The KPI is total unique visitors per week.
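A weekly KPI can be derived from daily records by grouping them by ISO calendar week. The following is a minimal sketch, not the dashboard's actual code: the function name `weekly_totals` and the `(date, platform, value)` row layout are assumptions chosen to match the long format described under Implementation. Note that summing daily counts only approximates unique visitors per week, because a visitor who returns on several days is counted more than once.

```python
from collections import defaultdict
from datetime import date


def weekly_totals(rows):
    """Sum daily values across all platforms per ISO (year, week).

    rows: iterable of (date, platform, value) tuples (assumed layout).
    """
    totals = defaultdict(float)
    for day, _platform, value in rows:
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        totals[(iso[0], iso[1])] += value
    return dict(sorted(totals.items()))


# Illustrative values only, not real SAI data
sample = [
    (date(2020, 10, 8), "anchor", 2.0),
    (date(2020, 10, 9), "linkedin", 3.0),
    (date(2020, 10, 12), "anchor", 5.0),
]
print(weekly_totals(sample))
```

ISO weeks start on Monday, so the first two sample rows fall into the same week and the third into the next one.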

Streamlit dashboard

The data analysis is live on https://share.streamlit.io/philippschmalen/sai-data-science .

Structure

./ Project root
├───data
│   ├───interim
│   ├───processed
│   └───raw
├───env             # contains conda environment.yml to create conda env
├───notebooks       
├───references
│   └───images      # used for readme
└───src
    ├───app         # contains streamlit app dashboard.py
    │   └───util    # utility functions
    └───data        # data processing script

Get started (developers)

The following assumes the Python package manager Anaconda/Miniconda.

  1. Set up your conda virtual environment to be able to run the code:

    1. Open the Anaconda prompt in your project folder.
    2. Navigate to ./env and run conda env create -f environment.yml
    3. Ensure that your Python builder points to the python.exe of the environment you just created. For example, I use the project settings in Sublime Text to point to the python.exe of the sai virtual environment. My example project settings:
{
    "build_systems":
    [
        {
            "name": "Anaconda Python Builder",
            "selector": "source.python",
            "shell_cmd": "C:/Users/phili/Miniconda3/envs/sai/python.exe -u \"$file\""
        }
    ],

    "folders":
    [
        {
            "name": "SAI social media analysis",
            "path": "[MY PROJECT PATH]"
        }
    ]
}
  2. Ensure that data files exist for Anchor and LinkedIn in ./data/raw/.
  3. You find the data processing script in ./src/data/process_data.py. Change the data directories according to your OS and needs.
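Before running the processing script, it can help to confirm that the raw data directory contains a file for each platform. The sketch below is an assumption-laden example, not part of process_data.py: the helper name `missing_platforms` is hypothetical, and it assumes that each raw file name contains the platform name (for example "anchor" or "linkedin"), which may not match your exports. Building the path with pathlib keeps it independent of the OS path separator.

```python
from pathlib import Path


def missing_platforms(raw_dir, platforms=("anchor", "linkedin")):
    """Return the platforms for which no file is found in raw_dir.

    Assumption: a raw file belongs to a platform if its (lowercased)
    file name contains the platform name.
    """
    raw_dir = Path(raw_dir)
    if not raw_dir.is_dir():
        return list(platforms)
    file_names = [p.name.lower() for p in raw_dir.iterdir() if p.is_file()]
    return [
        platform
        for platform in platforms
        if not any(platform in file_name for file_name in file_names)
    ]


if __name__ == "__main__":
    # Path relative to the project root; works on Windows, macOS and Linux
    missing = missing_platforms(Path("data") / "raw")
    if missing:
        print("No raw data found for:", ", ".join(missing))
```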

Develop Streamlit app locally

Refer to Streamlit's official get started guide.

Follow the steps above to install the environment, or run

pip install streamlit pandas plotly

Go to ./src/app, open an Anaconda prompt, activate the environment with conda activate sai, and run

streamlit run streamlit_app.py

A browser should open and show the dashboard.

Deploy with Streamlit Share

A checklist for deployment on Streamlit Sharing:

  1. Ensure a public repository
  2. A repository with submodules or subdirectories will not work
  3. Put requirements.txt into the repository root folder ./ with a minimum amount of packages, for example
    pandas==1.2.3
    pyyaml==5.4.1
    streamlit==0.79.0
    plotly==4.14.1

Common mistakes are incorrect filenames, such as requirement.txt or requirements.txt.txt instead of requirements.txt.

  4. Any files opened from within ./src/app/streamlit_app.py have to be referenced relative to the root folder of the repo, ./. Relative paths that navigate one level up, like ../, will not work within streamlit_app.py. Reference files from the repository root ./, for example:

    # DOESN'T WORK: open settings.yml with a relative path one level up
    with open('../settings.yml') as file:
        [...]

    # WORKS: open settings.yml from the repo root dir
    with open('./src/settings.yml') as file:
        [...]
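Two points of the checklist can be checked locally before deploying: the exact name and location of requirements.txt, and that file paths stay inside the repository root. The following is a minimal sketch using only the standard library. The helper names `check_requirements` and `resolve_from_root` are hypothetical and not part of Streamlit or of this repository.

```python
from pathlib import Path

# Misspellings named in the checklist above
MISNAMED = ("requirement.txt", "requirements.txt.txt")


def check_requirements(root):
    """Return a list of problems with requirements.txt in the repo root."""
    root = Path(root)
    problems = []
    if not (root / "requirements.txt").is_file():
        problems.append("requirements.txt is missing from the repository root")
    for name in MISNAMED:
        if (root / name).exists():
            problems.append(f"found misnamed file {name}")
    return problems


def resolve_from_root(relative_path, root):
    """Resolve a path against the repo root and reject paths that leave it."""
    root = Path(root).resolve()
    target = (root / relative_path).resolve()
    if target != root and root not in target.parents:
        raise ValueError(f"{relative_path!r} points outside the repository root")
    return target


if __name__ == "__main__":
    # Run from the repository root
    for problem in check_requirements("."):
        print("Problem:", problem)
    print(resolve_from_root("src/settings.yml", "."))
```

A path such as ../settings.yml raises a ValueError here, which mirrors the rule that paths must be given from the repository root.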

How to add new data

To add new data points after a certain time, proceed as follows. Important: Store the data in the directory where ./src/data/process_data.py expects it to be. This is ./data/raw by default.

AnchorFM

LinkedIn

Twitter

You find the analytics data as follows:

Go to More > Analytics > Tweets. Select the last 28 days and export the data by day to the project data folder ./data/raw.

Purpose of the Jupyter ./notebooks

Explore ways to merge social media data across platforms. Make the daily follower statistics that are exported manually from LinkedIn and AnchorFM accessible. Focus on totals, such as Total page views or Total plays.

Implementation

  1. load files from the directory where the raw data lives (e.g. ../../data/raw)

  2. create a common date column in UTC datetime format

  3. drop date duplicates

  4. transform into long format, which yields:

                            date platform  value
    0  2020-10-08 00:00:00+00:00   anchor    0.0
    1  2020-10-09 00:00:00+00:00   anchor    0.0
    2  2020-10-10 00:00:00+00:00   anchor    0.0

  5. export as csv, e.g. to ../../data/processed
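The five steps above can be sketched with pandas as follows. This is an illustration under stated assumptions, not the code from the notebooks or from process_data.py: the raw column names ("Time (UTC)", "Plays", "Date", "Total page views"), the sample values, and the helper names `load_platform` and `to_long` are hypothetical, and the data is read from in-memory strings instead of ../../data/raw so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical raw exports: real file layouts and column names will differ
ANCHOR_RAW = """Time (UTC),Plays
10/08/2020 00:00:00,0
10/09/2020 00:00:00,3
10/09/2020 00:00:00,3
"""
LINKEDIN_RAW = """Date,Total page views
2020-10-08,5
2020-10-09,7
"""


def load_platform(source, date_col, value_col, platform):
    """Steps 1-3: load one export, build a UTC date column, drop duplicate dates."""
    raw = pd.read_csv(source)
    tidy = pd.DataFrame({
        "date": pd.to_datetime(raw[date_col], utc=True),
        platform: raw[value_col].astype(float),
    })
    return tidy.drop_duplicates(subset="date").set_index("date")


def to_long(frames):
    """Step 4: align platforms on date, then melt into date/platform/value rows."""
    wide = pd.concat(frames, axis=1).sort_index().rename_axis("date")
    long = wide.reset_index().melt(
        id_vars="date", var_name="platform", value_name="value"
    )
    return long.dropna(subset=["value"]).reset_index(drop=True)


anchor = load_platform(io.StringIO(ANCHOR_RAW), "Time (UTC)", "Plays", "anchor")
linkedin = load_platform(
    io.StringIO(LINKEDIN_RAW), "Date", "Total page views", "linkedin"
)
long_df = to_long([anchor, linkedin])

# Step 5: export as csv (to an in-memory buffer here; pass a file path
# such as one under ../../data/processed to write to disk instead)
buffer = io.StringIO()
long_df.to_csv(buffer, index=False)
print(long_df)
```

Aligning the platforms on a shared date index before melting keeps one row per date and platform, and dropping missing values afterwards removes dates on which a platform has no data.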
