At the Sustainable Aviation Initiative (SAI), we push for more sustainable air travel through data-driven insights and data science. We are a diverse group of volunteers of all ages, interests, and backgrounds: aviation enthusiasts from all areas joining forces to make air travel ready for a sustainable future.
This repository collects useful scripts, projects, and analyses.
We would love to hear from you. Leave us a message at https://www.linkedin.com/company/sustainableaviationinitiative.
One dashboard summarizes SAI's reach across platforms. The KPI is total unique visitors per week.
The data analysis is live at https://share.streamlit.io/philippschmalen/sai-data-science.
```
./                      # project root
├───data
│   ├───interim
│   ├───processed
│   └───raw
├───env                 # contains conda environment.yml to create conda env
├───notebooks
├───references
│   └───images          # used for readme
└───src
    ├───app             # contains streamlit app dashboard.py
    │   └───util        # utility functions
    └───data            # data processing script
```
The following assumes the Python package manager Anaconda/Miniconda. First, set up your conda virtual environment to be able to run the code:

- open the Anaconda prompt in your project folder
- navigate to `./env` and run `conda env create -f environment.yml`
- ensure that your Python builder points to the `python.exe` from the environment we just created. For example, I use the project settings in Sublime to point to the `python.exe` of the `sai` virtual environment. See below for my exemplary project settings.
```json
{
    "build_systems":
    [
        {
            "name": "Anaconda Python Builder",
            "selector": "source.python",
            "shell_cmd": "C:/Users/phili/Miniconda3/envs/sai/python.exe -u \"$file\""
        }
    ],
    "folders":
    [
        {
            "name": "SAI social media analysis",
            "path": "[MY PROJECT PATH]"
        }
    ]
}
```
- ensure that data files exist for Anchor and LinkedIn in `./data/raw/`
- you find the data processing script in `./src/data/process_data.py`; change the data directories according to your OS and needs
Refer to Streamlit's official getting started guide. Follow the steps above to install the environment, or run `pip install streamlit pandas plotly`.

Go to `./src/app`, open an Anaconda prompt, and run:

```shell
conda activate sai
streamlit run streamlit_app.py
```

A browser window should open and show the dashboard.
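To illustrate the KPI above, here is a minimal sketch of how total unique visitors per week could be computed with pandas. The column names (`date`, `visitor_id`) and the ISO-week aggregation are assumptions for illustration, not the actual implementation in `dashboard.py`:

```python
import pandas as pd

def weekly_unique_visitors(df: pd.DataFrame) -> pd.Series:
    """KPI: total unique visitors per week, from rows of (date, visitor_id)."""
    out = df.copy()
    # parse dates as UTC so weeks are computed consistently across platforms
    out["date"] = pd.to_datetime(out["date"], utc=True)
    # group by ISO calendar week and count distinct visitors
    return out.groupby(out["date"].dt.isocalendar().week)["visitor_id"].nunique()

# Illustrative data standing in for the per-platform exports
data = pd.DataFrame({
    "date": ["2021-03-01", "2021-03-02", "2021-03-08"],
    "visitor_id": ["a", "b", "a"],
})
kpi = weekly_unique_visitors(data)

# In streamlit_app.py, this series could feed a chart, e.g.:
#   import streamlit as st
#   st.line_chart(kpi)
```

Keeping the KPI logic in a plain function like this keeps it testable without a running Streamlit server.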
A checklist for deployment on Streamlit Sharing:

1. Ensure a public repository.
2. A repository with submodules or subdirectories will not work.
3. Put `requirements.txt` into the repository root folder `./` with a minimal set of packages, for example:

   ```
   pandas==1.2.3
   pyyaml==5.4.1
   streamlit==0.79.0
   plotly==4.14.1
   ```

   Common mistakes are incorrect filenames, such as `requirement.txt` or `requirements.txt.txt` instead of `requirements.txt`.
4. Any files called from within `./src/app/streamlit_app.py` have to be referenced relative to the root folder of the repo, `./`. Relative paths that navigate one level up, like `../`, will not work within `streamlit_app.py`. Reference files from the repository root `./`, for example:

   ```python
   # DOESN'T WORK: open settings.yml via a relative path one level up
   with open('../settings.yml') as file:
       [...]

   # WORKS: open settings.yml from the repo root dir
   with open('./src/settings.yml') as file:
       [...]
   ```
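One way to avoid hardcoding such paths everywhere is a small helper that builds them from the working directory. This is a hypothetical sketch, not part of the repo; it assumes the app is launched with the repository root as the working directory, which is why root-relative paths work on Streamlit Sharing:

```python
from pathlib import Path

# Assumption: the process starts in the repository root (as on Streamlit
# Sharing), so the current working directory is './'.
REPO_ROOT = Path.cwd()

def from_root(relative_path: str) -> Path:
    """Turn a repo-root-relative path like 'src/settings.yml' into an absolute one."""
    return REPO_ROOT / relative_path

settings_path = from_root("src/settings.yml")
```

Centralizing path construction in one helper means a future change to the layout only touches one place.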
To add new data points after a certain time, proceed as follows. Important: store the data in the directory where `./src/data/process_data.py` expects it to be. This is `./data/raw` by default.

You find the analytics data under `More > analytics > Tweets`. Select the last 28 days and export the data by day to the project data folder `./data/raw`.
Explore ways to merge social media data across platforms. Make daily follower statistics from LinkedIn and Anchor FM (manually) accessible. Focus on the total sum, such as total page views or total plays.
- load files from the directory where the raw data lives (e.g. `../../data/raw`)
- create a common `date` column with datetime format UTC
- drop date duplicates
- transform into long format, which yields:

  ```
                         date platform  value
  0 2020-10-08 00:00:00+00:00   anchor    0.0
  1 2020-10-09 00:00:00+00:00   anchor    0.0
  2 2020-10-10 00:00:00+00:00   anchor    0.0
  ```

- export as csv, e.g. to `../../data/processed`
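The steps above can be sketched with pandas. The column names and example values are assumptions for illustration; they are not necessarily the exact ones `process_data.py` uses:

```python
import pandas as pd

def to_long_format(df: pd.DataFrame, platform_columns: list) -> pd.DataFrame:
    """Normalize a wide per-platform table into long format (date, platform, value)."""
    out = df.copy()
    # common date column in UTC datetime format
    out["date"] = pd.to_datetime(out["date"], utc=True)
    # drop date duplicates, keeping the first observation
    out = out.drop_duplicates(subset="date")
    # wide -> long: one row per (date, platform) pair
    return out.melt(id_vars="date", value_vars=platform_columns,
                    var_name="platform", value_name="value")

# Illustrative stand-in for the files loaded from ../../data/raw
wide = pd.DataFrame({
    "date": ["2020-10-08", "2020-10-09", "2020-10-09", "2020-10-10"],
    "anchor": [0.0, 0.0, 0.0, 0.0],
    "linkedin": [3.0, 5.0, 5.0, 7.0],
})
long_df = to_long_format(wide, ["anchor", "linkedin"])
# Export step, as in the pipeline above:
# long_df.to_csv("../../data/processed/followers_long.csv", index=False)
```

The long format makes it easy to plot all platforms with one grouped chart instead of one trace per column.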