HY-UDBMS/skos_categories_taxonomy_miner

skos_categories_taxonomy_miner

Arrange SKOS Wikipedia Categories Dataset into plain text files.

## Requirements

## Input

## Output

  • wiki_cat.txt: List of category IDs and their text labels, one category per line (tab-separated).
  • wiki_relation.txt: List of parent-child relations between categories, one relation per line (tab-separated). (Note: the relations form a graph, not a tree!)
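
As a rough illustration of how the two files might be consumed, here is a minimal Python sketch. It assumes UTF-8 text with one tab-separated record per line, as in the sample output. The function names `parse_categories` and `parse_relations` are hypothetical and are not part of this repository.

```python
from typing import Dict, Iterable, List, Tuple


def parse_categories(lines: Iterable[str]) -> Dict[int, str]:
    """Map category id -> label from tab-separated lines (assumed format)."""
    categories: Dict[int, str] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only, in case a label itself contains a tab.
        cat_id, label = line.split("\t", 1)
        categories[int(cat_id)] = label
    return categories


def parse_relations(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Read (first column, second column) id pairs from tab-separated lines."""
    relations: List[Tuple[int, int]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        left, right = line.split("\t")
        relations.append((int(left), int(right)))
    return relations
```

Both functions accept any iterable of lines, so an open file works directly, for example `with open("wiki_cat.txt", encoding="utf-8") as f: cats = parse_categories(f)`.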

## Sample Output

  • wiki_cat.txt (1,372,446 lines):
1	Futurama
2	World War II
3	Programming languages
4	Professional wrestling
5	Algebra
6	Anime
7	Abstract algebra
8	Mathematics
9	Linear algebra
10	Calculus
11	British monarchs
12	Monarchs
13	Star Trek
14	People
15	Desserts
...	...
  • wiki_relation.txt (2,836,084 lines):
1	4044
1	25036
1	250051
1	806081
2	2659
2	28202
2	40342
2	91655
2	126070
2	142632
2	215635
2	292083
2	293028
2	293214
2	298882
...	...
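
Because the relations form a graph rather than a tree, a category can be reached along more than one path, so any traversal needs to remember which nodes it has already visited. The sketch below is one way to collect all categories reachable from a given one. It assumes that the first column of wiki_relation.txt is the parent and the second is the child, which this README does not state explicitly. The helper names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def build_children(relations: Iterable[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Group child ids by parent id (assumes pairs are (parent, child))."""
    children: Dict[int, List[int]] = defaultdict(list)
    for parent, child in relations:
        children[parent].append(child)
    return children


def descendants(children: Dict[int, List[int]], root: int) -> Set[int]:
    """Breadth-first search that tolerates shared children and cycles."""
    seen: Set[int] = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            # Skip nodes already visited, and never report the root itself.
            if child not in seen and child != root:
                seen.add(child)
                queue.append(child)
    return seen


if __name__ == "__main__":
    # Toy ids for illustration only; they are not taken from the dataset.
    toy = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 1)]
    print(sorted(descendants(build_children(toy), 1)))
```

The `seen` set is what keeps the search from looping forever when a relation points back to an ancestor, and from visiting a shared child twice.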

## License

The license for the source code in this repository can be found in the file LICENSE.md.

The SKOS Wikipedia Categories Dataset is licensed under the Creative Commons Attribution-ShareAlike 3.0 License and the GNU Free Documentation License. See http://wiki.dbpedia.org/about.

The copyright notices of Wikipedia can be found at https://en.wikipedia.org/wiki/Wikipedia:Copyrights.
