
micgonzalez/Luigi-Data-Pipeline-with-Python


Luigi Data Pipeline with Python

Introduction

This repository scrapes data from a static website with Luigi. I created it to demonstrate that I can use a Luigi pipeline to automate data collection and related tasks.

Abstract

Have you ever asked yourself whether it is possible to automate tasks? Could word counts help in finding insights? What could you do with the extra time gained from automation? These were a few of the questions that came to mind while I worked on this project.

Summary of Skills

I used the Python environment in PyCharm and the Luigi Scheduler to complete this repository. I also used the Luigi, BeautifulSoup4, and Requests packages, together with Counter (from Python's standard collections module) and the standard pickle module.
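To illustrate how Counter and pickle fit together for a word-count summary, here is a minimal sketch that uses only the standard library. The sample text is a hypothetical stand-in for scraped page content, not data from this project.

```python
import pickle
from collections import Counter

# Hypothetical sample text standing in for the scraped page content.
sample_text = "luigi runs tasks and luigi tracks tasks"

# Count each whitespace-separated word.
word_counts = Counter(sample_text.split())

# Serialize the counts to bytes and restore them, as one would when
# saving the summary to disk and loading it again later.
restored = pickle.loads(pickle.dumps(word_counts))

# The two most frequent words with their counts.
top_two = restored.most_common(2)
print(top_two)
```

In a real run, the text would come from the scraped page and the pickled bytes would be written to a file rather than kept in memory.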

Preview

Preview of the Luigi Scheduler from this project.

This screenshot was taken during this project to show what the Luigi Scheduler's visual view looks like.

Preview of the PyCharm terminal from this project.

This screenshot was taken during this project to show what was done in PyCharm's terminal window.

Findings

I was tasked with automating the scraping of data from a static webpage and creating a summary of the resulting word counts. I used the Luigi Scheduler and PyCharm's terminal to perform these tasks. Using the Luigi Scheduler can reduce the time spent on these tasks. Luigi is a powerful application, but when it is run only from the terminal it does not show a visual display of the tasks being performed. The Luigi Scheduler fills that gap with a useful visual view of the tasks.

Challenges

I ran into one challenge on this project that I did not foresee: navigating the Luigi Scheduler. I had previously interacted with Luigi through PyCharm's terminal and the Terminal app on a Mac. In my previous class, we never worked with a visual display of Luigi. The Luigi Scheduler has an interesting layout, and it took some time to get used to it.

Conclusion

Thinking back to my previous experience with Luigi, I did not know that there was a visual way to see how the tasks are performed. Using Luigi together with the Luigi Scheduler helps automate tasks like web scraping and summarizing word counts. This not only cuts down the time spent on these tasks, but also lets you focus on more complex work. Luigi is great for straightforward projects.
