themains/dmoz_csv

DMOZ CSV

DMOZ [1] is a large, communally maintained open directory that categorizes web content. The data are posted in an XML format.

The Python scripts provided here convert the DMOZ dump content.rdf.u8.gz into a CSV file.
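The conversion can be pictured with the minimal sketch below. It is not the code of the scripts in this repository. It assumes that the dump contains ExternalPage elements carrying the URL in an "about" attribute, each with one or more topic child elements, and that the same URL may appear in several ExternalPage elements. The names collect_categories, write_csv and SAMPLE are hypothetical, and SAMPLE is a hand-written fragment, not real dump content.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hand-written fragment for illustration only; the element and attribute
# names are assumptions about the layout of content.rdf.u8.
SAMPLE = """
<RDF xmlns="http://dmoz.org/rdf/">
  <ExternalPage about="http://www.demus.it/">
    <topic>Top/Business/Food_and_Related_Products/Beverages/Coffee</topic>
  </ExternalPage>
  <ExternalPage about="http://www.demus.it/">
    <topic>Top/Regional/Europe/Italy/Friuli-Venezia_Giulia/Localities/Trieste/Business_and_Economy</topic>
  </ExternalPage>
</RDF>
""".encode("utf-8")


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts in front of a tag."""
    return tag.rsplit("}", 1)[-1]


def collect_categories(xml_stream):
    """Return a dict mapping each URL to the list of its category paths."""
    categories = {}
    # iterparse reads the stream incrementally instead of loading it at once.
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if local_name(elem.tag) != "ExternalPage":
            continue
        url = elem.get("about")
        if url is not None:
            for child in elem:
                if local_name(child.tag) == "topic" and child.text:
                    categories.setdefault(url, []).append(child.text.strip())
        elem.clear()  # drop the children and text of the processed element
    return categories


def write_csv(categories, out_stream):
    """Write one fully quoted row per URL: URL first, then its categories."""
    writer = csv.writer(out_stream, quoting=csv.QUOTE_ALL)
    for url, topics in categories.items():
        writer.writerow([url] + topics)


buffer = io.StringIO()
write_csv(collect_categories(io.BytesIO(SAMPLE)), buffer)
print(buffer.getvalue(), end="")
```

For the real dump, the input stream could be opened with gzip.open in binary mode and the output file with newline="" and UTF-8 encoding, so that non-ASCII category names are preserved.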

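Each line of the CSV file holds one URL followed by all the categories it is listed under, so the number of columns varies from line to line:

"URL","Category 1","Category 2",...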

Example:

http://www.demus.it/

is listed in the following categories:

DMOZ Categories (1-4 of 4)
Business: Food and Related Products: Beverages: Coffee (1)
Regional: Europe: Italy: Regions: Friuli-Venezia Giulia: Localities: Trieste: Business and Economy (1)
World: Italiano: Affari: Alimentazione e Prodotti Correlati: Bevande: Caffè (1)
World: Italiano: Regionale: Europa: Italia: Friuli-Venezia Giulia: Provincia di Trieste: Località: Trieste: Affari e Economia (1)

The corresponding line is generated as:

"http://www.demus.it/","Top/Regional/Europe/Italy/Friuli-Venezia_Giulia/Localities/Trieste/Business_and_Economy","Top/World/Italiano/Affari/Alimentazione_e_Prodotti_Correlati/Bevande/Caffè","Top/World/Italiano/Regionale/Europa/Italia/Friuli-Venezia_Giulia/Provincia_di_Trieste/Località/Trieste/Affari_e_Economia","Top/Business/Food_and_Related_Products/Beverages/Coffee"
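Because every row has a different number of columns, it may help to read the file row by row rather than as a fixed-width table. The sketch below is one way to do that with the standard csv module. The names read_rows, path_levels and SAMPLE_LINE are hypothetical, and turning underscores back into spaces is an assumption inferred from the example above, not documented behavior.

```python
import csv
import io

# Shortened version of the example line, for illustration only.
SAMPLE_LINE = (
    '"http://www.demus.it/",'
    '"Top/Business/Food_and_Related_Products/Beverages/Coffee",'
    '"Top/World/Italiano/Affari/Alimentazione_e_Prodotti_Correlati/Bevande/Caffè"\n'
)


def read_rows(stream):
    """Yield (url, categories) pairs from rows of varying length."""
    for row in csv.reader(stream):
        if not row:
            continue  # skip blank lines
        yield row[0], row[1:]


def path_levels(category):
    """Split a category path into levels, showing underscores as spaces."""
    return [part.replace("_", " ") for part in category.split("/")]


for url, categories in read_rows(io.StringIO(SAMPLE_LINE)):
    print(url)
    for category in categories:
        print("  " + " > ".join(path_levels(category)))
```

When reading the generated file from disk, opening it with newline="" and encoding="utf-8" keeps the csv module's quoting rules and the non-ASCII category names intact.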

1: Dmoz.org was discontinued on March 17, 2017. The content was moved to http://dmoztools.net and is now hosted at https://curlie.org/
