This project covers data crawling, data processing, and loading to S3. Apartment listings are crawled from a public apartment-trading website, then cleaned and formatted, and features are extracted from the data. Finally, the data from different cities is saved in Parquet format both locally and in S3.
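The clean → format → extract-features flow described above can be sketched as follows. The column names, units, and the final Parquet/S3 step in the comments are illustrative assumptions, not the project's actual schema:

```python
# Minimal sketch of the per-record clean -> feature-extraction stage.
# Field names ("city", "price", "size_m2") are assumed for illustration.

def clean(row: dict) -> dict:
    """Normalize raw crawled fields (strip whitespace, cast numerics)."""
    return {
        "city": row["city"].strip().title(),
        "price": int(row["price"].replace(",", "")),
        "size_m2": float(row["size_m2"]),
    }

def extract_features(row: dict) -> dict:
    """Derive extra features from a cleaned record."""
    out = dict(row)
    out["price_per_m2"] = round(out["price"] / out["size_m2"], 2)
    return out

raw = {"city": " berlin ", "price": "350,000", "size_m2": "72.5"}
record = extract_features(clean(raw))
# From here, the per-city records would be collected into a DataFrame and
# written locally with to_parquet(...) and uploaded to S3 (e.g. via boto3).
print(record["city"], record["price_per_m2"])
```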
.
├── data
│   ├── raw
│   ├── clean
│   └── fea
├── config.ini          # settings file
├── Dockerfile
├── prepare_data.py     # main script
├── run.sh
├── requirements.txt
└── README.md
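`config.ini` can be read with the standard-library `configparser`; the section and key names below are hypothetical and only illustrate the pattern, not the project's actual settings:

```python
# Sketch of how prepare_data.py might load config.ini.
# Sections/keys ("s3", "bucket", "crawler", "cities") are assumptions.
import configparser

EXAMPLE_INI = """\
[s3]
bucket = apartment-data
region = eu-west-1

[crawler]
cities = berlin,hamburg
"""

config = configparser.ConfigParser()
config.read_string(EXAMPLE_INI)  # in the project: config.read("config.ini")

bucket = config["s3"]["bucket"]
cities = config["crawler"]["cities"].split(",")
print(bucket, cities)
```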
- AWS account
- Docker
$ docker build -t <image_name> .
$ docker run -it <image_name> /bin/bash
$ docker run \
    -e AWS_ACCESS_KEY_ID=<access_key> \
    -e AWS_SECRET_ACCESS_KEY=<access_secret> \
    -e mfa_serial=<mfa_serial> \
    -e region=eu-west-1 \
    -it data_clean \
    /bin/bash
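Inside the container, the values passed with `-e` above arrive as plain environment variables. A sketch of picking them up in Python (the commented boto3 line is illustrative and assumes boto3 is installed; the dummy default only stands in for `docker run -e` here):

```python
# Read the credentials/region injected via `docker run -e`.
import os

# In the real container these are set by the -e flags; the defaults below
# are placeholders so the snippet runs standalone.
os.environ.setdefault("AWS_ACCESS_KEY_ID", "dummy-key")
os.environ.setdefault("region", "eu-west-1")

access_key = os.environ["AWS_ACCESS_KEY_ID"]
region = os.environ.get("region", "eu-west-1")
# session = boto3.Session(aws_access_key_id=access_key, region_name=region)
print(region)
```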