Razwand/scraping_data_criminalia

🕵 Criminalia Scraping with BeautifulSoup 🥣

About

This tool scrapes data from the website https://criminalia.es/ to obtain:

  • Image data of the scraped profiles
  • Text data of the scraped profiles
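As a rough illustration of how both kinds of data could be pulled from a single profile page with BeautifulSoup, here is a minimal sketch. The HTML snippet, the tag layout, the image path and the `parse_profile` helper are all hypothetical; the real markup on criminalia.es and the actual code in `scrap_web.py` may look quite different.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://criminalia.es/"

# Hypothetical markup used only for this sketch; the real site may differ.
SAMPLE_HTML = """
<article class="profile">
  <h1>Example Name</h1>
  <img src="/images/example.jpg" alt="Example Name">
  <p>First paragraph of the profile.</p>
  <p>Second paragraph.</p>
</article>
"""


def parse_profile(html, base_url=BASE_URL):
    """Return the name, absolute image URL and text of one profile page."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h1")
    image = soup.find("img")
    paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
    has_src = image is not None and image.has_attr("src")
    return {
        "name": heading.get_text(strip=True) if heading else None,
        # Relative image paths are resolved against the site root.
        "image_url": urljoin(base_url, image["src"]) if has_src else None,
        "text": "\n".join(paragraphs),
    }


profile = parse_profile(SAMPLE_HTML)
print(profile["name"], profile["image_url"])
```

In a real run the HTML would come from an HTTP request to each profile URL instead of an inline string.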

Flow

Requirements

  • A suitable conda environment named scrap can be created and activated with:
conda env create -f environment_scrap.yml
conda activate scrap

👤 Scraping Images

This module scrapes the images of all listed profiles. Profile images and data can be downloaded by choosing between man/woman profiles and setting the number of profiles to process.

How to

In the following scenario, the user requests 37 male profiles, and the results are stored in a folder named ./output_image/. If the requested number exceeds the total number of available profiles, a message is shown and the maximum number of available profiles is returned instead.

scraping_criminalia>python scrap_web.py
>>Gender (M/W): M
>>Number of profiles to scrap: 37
>>MODE (IMG/TEXT): IMG

Result_1
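The three prompts and the "more profiles requested than available" rule could be handled roughly as follows. This is only a sketch: `ScrapeRequest`, `build_request` and the `available` argument are names invented for illustration, and the wording of the message is an assumption, not the script's actual output.

```python
from dataclasses import dataclass

VALID_GENDERS = {"M", "W"}
VALID_MODES = {"IMG", "TEXT"}


@dataclass
class ScrapeRequest:
    gender: str
    count: int
    mode: str


def build_request(gender, count, mode, available):
    """Validate the three prompt answers and cap count at `available`."""
    gender = gender.strip().upper()
    mode = mode.strip().upper()
    if gender not in VALID_GENDERS:
        raise ValueError(f"Gender must be one of {sorted(VALID_GENDERS)}")
    if mode not in VALID_MODES:
        raise ValueError(f"MODE must be one of {sorted(VALID_MODES)}")
    count = int(count)
    if count < 1:
        raise ValueError("Number of profiles must be at least 1")
    if count > available:
        # Hypothetical message text; the real script may word it differently.
        print(f"Only {available} profiles are available; returning all of them.")
        count = available
    return ScrapeRequest(gender=gender, count=count, mode=mode)


# The scenario above, assuming (for illustration) that 500 profiles exist.
request = build_request("M", "37", "IMG", available=500)
print(request)
```

Capping the count instead of failing matches the behaviour described above, where the maximum number of available profiles is returned.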

🖋 Scraping Text

This module scrapes text data from the profiles. The search can be filtered by man/woman and limited to a given number of profiles. The data is returned as a .csv file in the ./output_text/ folder with the following variables:

| Feature | Values |
| --- | --- |
| Class | Murder, Serial Killer, Homicide, etc. |
| Subclass | Parricide, etc. |
| Sentence | Death penalty, years of prison, etc. * |
| Location | State/Country |
| Victims | Number of victims |
| Date | Date of the crime |
| Detention | Date of the detention |
| Victim Profile | Male/Female, age and other details * |

* To be processed (more fields could be obtained)
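A minimal sketch of how rows with these variables could be written to a .csv file in ./output_text/ using Python's standard `csv` module. The `write_profiles` helper and the `profiles.csv` file name are assumptions made for this example; the file name used by the actual script is not stated here.

```python
import csv
from pathlib import Path

# Column names taken from the table above.
FIELDNAMES = [
    "Class", "Subclass", "Sentence", "Location",
    "Victims", "Date", "Detention", "Victim Profile",
]


def write_profiles(rows, output_dir="./output_text", filename="profiles.csv"):
    """Write a list of dicts to <output_dir>/<filename> and return the path."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / filename
    with path.open("w", newline="", encoding="utf-8") as handle:
        # Fields missing from a row are written as empty cells.
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return path


if __name__ == "__main__":
    # Placeholder row, not real scraped data.
    example = [{"Class": "Homicide", "Location": "State/Country", "Victims": 1}]
    print(write_profiles(example))
```

Leaving missing fields empty fits the note that some fields are still to be processed.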

How to

In the following scenario, the user requests 5 male profiles. If the requested number exceeds the total number of available profiles, a message is shown and the maximum number of available profiles is returned instead.

scraping_criminalia>python scrap_web.py
>>Gender (M/W): M
>>Number of profiles to scrap: 5
>>MODE (IMG/TEXT): TEXT

Result_2
