[Personal Project] Repository that collects more than a thousand news articles from Colombian newspapers and outputs the data as a .csv, .xml, or .json file.

Alfareiza/scrapping-colombian-newspapers


Scrapping Colombian Newspapers

:shipit: Scraping news from Colombian newspapers.

Getting Started

This repository collects news from Colombian newspapers (see the list of newspapers) and writes everything it gathers to a CSV, JSON, or XML file.

Prerequisites

Python and pip must be installed and available on your PATH. Then, install pipenv.

Create a folder and run the following commands inside it:

To isolate the environment:

python -m venv .venv

To activate the environment on Windows:

.venv\Scripts\activate.bat

To install the dependencies:

pip install -r requirements.txt

To run the code:

python scrap/main.py

This generates a file named all_news.csv, all_news.json, or all_news.xml containing all the information captured at the time of the run.
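Once the run finishes, the output can be inspected with a short script. The following is only a sketch and not part of the repository: the `load_news` helper is hypothetical, and it assumes the CSV is UTF-8 encoded, comma-delimited, and has a header row, and that the JSON file holds a list of records. No column names are assumed, since this README does not list them.

```python
import csv
import json
from pathlib import Path


def load_news(path):
    """Load records from an all_news.csv or all_news.json file.

    Hypothetical helper: the file layout is assumed, not documented.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        # Assumption: UTF-8, comma-delimited, first row is the header.
        # newline="" lets the csv module handle line breaks inside fields.
        with path.open(newline="", encoding="utf-8") as fh:
            return list(csv.DictReader(fh))
    if suffix == ".json":
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        # Assumption: the file holds a list of records; wrap anything else.
        return data if isinstance(data, list) else [data]
    raise ValueError(f"Unsupported format: {suffix}")


if __name__ == "__main__":
    # Report the record count for whichever output file is present.
    for candidate in ("all_news.csv", "all_news.json"):
        if Path(candidate).exists():
            news = load_news(candidate)
            print(f"{candidate}: {len(news)} records")
            break
```

Each record comes back as a dictionary keyed by whatever field names the file contains, so the same loop works regardless of which columns the scraper writes.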

The scraper currently covers 20 sites:

  • elheraldo
  • zonacero
  • elpilon
  • eluniversal
  • diariodelcesar
  • hoydiariodelmagdalena
  • diariodelnorte
  • laopinion
  • eltiempo
  • elcolombiano
  • elespectador
  • lapatria
  • elpais
  • elmundo
  • elnuevodia
  • elmanduco
  • semana
  • publimetro
  • pulzo
  • larepublica
