ardian/ASPxtraktor

Dependencies that must be installed to run the software:
==== Ubuntu ====
sudo apt-get install libtry-tiny-perl
sudo apt-get install libwww-mechanize-perl
sudo apt-get install libyaml-perl
sudo apt-get install libhtml-treebuilder-xpath-perl
sudo apt-get install libdbix-class-schema-loader-perl
sudo apt-get install libcompress-bzip2-perl

Test it like this:
perl -I lib/ bin/aspxtraktor.pl --term "software"
This processes and saves only the index pages; it does not download the detail pages.

Detail pages are processed like this:
perl -I lib/ bin/aspxtraktor.pl --term "softwa" --recurse


Read a file into the database like this:
perl -I lib/ bin/aspxtraktor.pl --file=output_test/DataExtractor_IPKO_P1_Data4.htm.bz2

To also load the business activity types, add the --loadtype argument:
perl -I lib/ bin/aspxtraktor.pl --loadtype --file=output_test/DataExtractor_IPKO_P1_Data4.htm.bz2
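
To load several saved pages in one go, the --file invocation can be wrapped in a loop. This is only a sketch: load_all and ASPX_CMD are hypothetical names introduced here, and it assumes the saved pages sit in one directory and end in .htm.bz2, as the example path above suggests. It uses only the --file and --loadtype options shown above.

```shell
# Command used to run the scraper; ASPX_CMD is a hypothetical override
# (for example ASPX_CMD=echo gives a dry run that only prints the calls).
ASPX_CMD="${ASPX_CMD:-perl -I lib/ bin/aspxtraktor.pl}"

# Hypothetical helper: run the loader once per *.htm.bz2 file in a directory.
# Usage: load_all DIR [extra options, e.g. --loadtype]
load_all() {
    dir="$1"
    shift
    for f in "$dir"/*.htm.bz2; do
        # When nothing matches, the pattern stays literal; skip it.
        [ -e "$f" ] || continue
        # ASPX_CMD is left unquoted on purpose so it splits into words.
        $ASPX_CMD "$@" --file="$f" || echo "failed: $f" >&2
    done
}

# Example calls (run from the repository root):
#   load_all output_test
#   load_all output_test --loadtype
```

A file that fails to load is reported on standard error and the loop continues with the next one.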

About

Scraper in Perl
