Data Archeology & Pipeline Ingestion of Company Data for KPMG Signal Repository

A data ingestion pipeline that loads company-centric data from multiple sources into the KPMG Signal Repository.

Base information is available for 6,482 companies.

Company base data includes general information about each company, such as market cap and description (see SR Ingestion Metadata).
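As an illustration only, one base-data record could be modelled in Python as below. The class name `CompanyBase` and its fields (`name`, `market_cap`, `description`, `extra`) are hypothetical; the authoritative field list is in SR Ingestion Metadata.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class CompanyBase:
    """Hypothetical shape of one company's base-data record.

    Field names are illustrative assumptions; consult SR Ingestion
    Metadata for the real schema.
    """
    name: str
    market_cap: Optional[float] = None  # assumed to be a single numeric value
    description: str = ""
    # Catch-all for the remaining general attributes the README does not list.
    extra: Dict[str, Any] = field(default_factory=dict)


# Illustrative values only, not real company data.
record = CompanyBase(name="Example Corp", market_cap=1.5e9,
                     description="Illustrative record.")
print(record)
```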

Granular data is available for S&P 500 companies.

Clone the repository
```
git clone <repo>
```
Change into the project directory: `cd projectname/`
Create a virtual environment
```
python3.8 -m venv env
```
Activate the virtual environment
```
source env/bin/activate
```
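To confirm the activation worked before installing anything, a quick check such as the following can be run with `python`. It relies on the standard `venv` behaviour of pointing `sys.prefix` at the environment while `sys.base_prefix` keeps pointing at the base interpreter; it is a generic Python idiom, not a script shipped with this repository.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv-created environment."""
    # venv sets sys.prefix to the environment directory; sys.base_prefix
    # still refers to the interpreter the environment was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("python version:", sys.version.split()[0])
```

After `source env/bin/activate`, this should report `True` and the Python version used to create the environment.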
Install the requirements
```
pip install -r requirements.txt
```
Run the base data scraper
```
python base_data.py
```
NOTE: The base data scraper saves its output to tmp/international_company_data. Make a copy of any data already in that directory before re-running the scraper, to avoid appending duplicates.
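A minimal sketch of that copy step, assuming it is run from the project root. Only the directory path comes from this README; the function name `backup_existing_data`, the timestamped `_backup_` naming, and the optional `clear` flag (which empties the original directory so the next run starts fresh) are hypothetical additions.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional

# Output location of the base data scraper, as stated in the README.
DATA_DIR = Path("tmp/international_company_data")


def backup_existing_data(data_dir: Path = DATA_DIR, clear: bool = False) -> Optional[Path]:
    """Copy data_dir to a timestamped sibling directory and return the copy's path.

    Returns None when the directory is missing or empty. If `clear` is True,
    the original directory is emptied after copying (assumption: the scraper
    will recreate its output files there on the next run).
    """
    if not data_dir.is_dir() or not any(data_dir.iterdir()):
        return None
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    backup_dir = data_dir.with_name(f"{data_dir.name}_backup_{stamp}")
    shutil.copytree(data_dir, backup_dir)  # raises if backup_dir already exists
    if clear:
        shutil.rmtree(data_dir)
        data_dir.mkdir()
    return backup_dir
```

Call `backup_existing_data()` (or `backup_existing_data(clear=True)`) before `python base_data.py`; the copy lands next to the original, for example `tmp/international_company_data_backup_<timestamp>`.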
Run the recurring data scraper
```
python recurring_data.py
```
NOTE: The recurring data scraper can be run daily to fetch new information such as news and trading data.
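The README does not say how the daily run is scheduled. A cron job is the usual choice on Linux; as a portable alternative, the sketch below keeps a Python process alive and launches `recurring_data.py` once a day with the current interpreter. The 02:00 run time and the names `seconds_until_next_run` and `run_daily` are hypothetical.

```python
import subprocess
import sys
import time
from datetime import datetime, timedelta

RUN_AT_HOUR = 2    # hypothetical run time: 02:00 local time
RUN_AT_MINUTE = 0


def seconds_until_next_run(now: datetime, hour: int = RUN_AT_HOUR,
                           minute: int = RUN_AT_MINUTE) -> float:
    """Seconds from `now` until the next occurrence of hour:minute (naive local time)."""
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return (target - now).total_seconds()


def run_daily() -> None:
    """Run recurring_data.py once a day, indefinitely."""
    while True:
        time.sleep(seconds_until_next_run(datetime.now()))
        # check=False so one failed run does not stop the following days' runs.
        result = subprocess.run([sys.executable, "recurring_data.py"], check=False)
        print(f"{datetime.now().isoformat()} recurring_data.py exited with {result.returncode}")
```

Start it with `run_daily()` from the activated environment in the project root. Because it uses naive local times, a daylight-saving change can shift a single run by an hour.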

Author
