A utility that scrapes lists of unscrupulous entities (barred from doing financial business) published by various legal institutions

themousepotato/unscrapulous

Unscrapulous - scrape unscrupulous entities

Artwork by @xypnox

Motivation

Various regulatory bodies publish lists of people and entities who have violated laws or regulations. The primary identifier for each of these records is the PAN. Banks and brokers are not supposed to provide services to these entities, which they identify by PAN. Unscrapulous is a Python utility whose scrapers collect these lists into a single large database of such people and entities.

The PostgreSQL database unscrupulous_entities contains the following fields:

1. ID
2. PAN
3. Name
4. AddedDate (day of blacklisting according to the source)
5. Source
6. Meta (a JSON encoded field of whatever fields each source provides)
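
As a rough illustration of the fields listed above, one record could be modelled in Python as follows. This is only a sketch: the `EntityRecord` class, its attribute names, and the sample values (including the PAN) are hypothetical and are not taken from the unscrapulous code base. It assumes the ID is assigned by the database and that Meta is stored as a JSON string.

```python
import json
from dataclasses import dataclass, field
from datetime import date


@dataclass
class EntityRecord:
    """Hypothetical model of one row; ID is assumed to be assigned by the database."""
    pan: str
    name: str
    added_date: date   # day of blacklisting according to the source
    source: str
    meta: dict = field(default_factory=dict)  # whatever extra fields the source provides

    def as_row(self) -> tuple:
        # Meta is serialised to a JSON string; the date becomes an ISO string.
        return (
            self.pan,
            self.name,
            self.added_date.isoformat(),
            self.source,
            json.dumps(self.meta, sort_keys=True),
        )


record = EntityRecord(
    pan="ABCDE1234F",            # made-up PAN, for illustration only
    name="Example Trader",
    added_date=date(2020, 1, 15),
    source="sebi_debarred_nse",
    meta={"order_no": "X-1"},    # made-up source-specific field
)
row = record.as_row()
print(row)
```

Keeping the source-specific fields inside Meta means every scraper can emit the same five core columns regardless of what its source publishes.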

Installation

$ sudo apt-get install postgresql postgresql-contrib libpq-dev
$ pip install unscrapulous

Usage

$ unscrapulous --config=config.toml

Development

$ git clone git@github.com:themousepotato/unscrapulous.git
$ cd unscrapulous
$ curl -sSL https://install.python-poetry.org | python3 -
$ poetry install
$ poetry build
$ pip install dist/unscrapulous-*.whl

Config

The config.toml file has the following format:

[postgresql_conn]
host = 'localhost'
dbname = 'postgres'
user = 'postgres'
password = 'password'

[scrapers]
arbitration_awards_bse = false
arbitration_awards_nse = false
bse_defaulter_and_expelled_members = false
icex_defaulter_members = false
icex_expelled_members = false
income_tax_defaulters = false
irda_blacklisted = false
mca_company_defaulter_list = false
mca_director_defaulter_list = false
mca_director_disqualified_list = false
mca_proclaimed_offenders_ind = false
mcx_action_ap = false
mcx_defaulter_members = false
mcx_secretaries_defaulter_list = false
mse_arbitral_awards = false
ncdex_suspended_defaulted_expelled_debarred_members = false
nse_defaulted_members = false
nse_expelled_members = false
nse_regulatory_defaulting_clients = false
sebi_debarred_bse = true
sebi_debarred_nse = true
sfio_convicted = false
sfio_proclaimed_offenders = false
unsc_1988 = false
unsc_consolidated_list = false
wildlife_crime_convicts = false

Popular users

Organization | Description
Zerodha | Online platform to invest in stocks, derivatives, mutual funds, etc.

Roadmap

Source | Category | Status
ACE Suspended Members | A | Couldn't find source
Arbitration Awards - BSE | A | Scrape from source; Generate fields for global csv
Arbitration Awards - NSE | A | Scrape from source; Generate fields for global csv
BSE Defaulter and Expelled Members | A | Scrape from source; Generate fields for global csv
BSE Regulatory Defaulting Clients | A | Couldn't find source
ICEX Defaulter Members | A | Complete
ICEX Expelled Members | A | Complete
MCX Action AP | A | Complete
MCX Defaulter Members | A | Complete
MSE Arbitral Awards | A | Scrape from source; Generate fields for global csv
MSE Trading Clearing Members | A | Couldn't find source
NCDEX Suspended Defaulted Expelled Debarred Members | A | Scrape from source; Generate fields for global csv
NMCE Defaulted Members | A | Couldn't find source
NMCE Expelled Members | A | Couldn't find source
NMCE Suspended Members | A | Couldn't find source
NSE Defaulted Members | A | Complete
NSE Expelled Members | A | Complete
NSE Regulatory Defaulting Clients | A | Complete
UAPA | A | Scrape from source; Generate fields for global csv
UNSC_1267/UNSC Consolidated List | A | Complete
UNSC_1988 | A | Complete
UNSC_2140 | A | Couldn't find source
UNSC_2270 | A | Couldn't find source
SEBI Debarred - BSE | A | Complete
SEBI Debarred - NSE | A | Complete
MCA Proclaimed Offenders (Ind) | B | Scrape from source; Generate fields for global csv
SFIO Convicted | B | Complete
SFIO Proclaimed Offenders | B | Complete
RBI Suit File | B | Scrape from source; Generate fields for global csv
IRDA Blacklisted | B | Complete
Income Tax Defaulters | B | Complete
Wildlife Crime Convicts | B | Complete
MCA Director Defaulter List | B | Complete
MCA Director Disqualified List | B | Scrape from source; Generate fields for global csv
MCX Secretaries Defaulter List | B | Complete
NCLT (IBBI) | B | Scrape from source; Generate fields for global csv
MCA Company Defaulters List | B | Complete
European Union Sanctions | B | Couldn't find source
MCA Companies Struck Off list | B | Scrape from source; Generate fields for global csv
Interpol | B | Couldn't find source
United Kingdom Sanction List | B | ods/odt -> csv; Generate fields for global csv
OFAC | B | Couldn't find source
Local PEP - Only India PEP | B | Couldn't find source
