Dataset of over 75 000 Polish Wikipedia pages (assigned to specific science fields) and links between these pages.

kornelro/polish-wikipedia-graph-dataset

polish-wikipedia-graph-dataset

The repository contains a dataset of over 75 000 Polish Wikipedia pages, each assigned to a specific science field, together with the links between these pages. The dataset can be used as a simple NLP classification task, and in particular as a benchmark for graph-based methods.

wiki_pages.csv

Article information file. Columns:

  • title - article title,
  • text - article text,
  • category - one of 7 main Wikipedia categories related to science fields: the one closest to the article's own categories in the scraped category tree.

Article categories:

  • Astronomia - astronomy,
  • Biologia - biology,
  • Matematyka - mathematics,
  • Psychologia - psychology,
  • Fizyka - physics,
  • Informatyka - computer science,
  • Chemia - chemistry.
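
As a rough illustration of working with the article file, the sketch below reads rows shaped like wiki_pages.csv and counts articles per category, using the English names listed above. It assumes a comma-separated, UTF-8 file whose header row names the columns title, text and category; the repository does not confirm these details. The helper names (load_pages, category_counts) and the three sample rows are made up for the example.

```python
import csv
import io
import sys
from collections import Counter

# Polish category label -> English name, as listed in this README.
CATEGORIES = {
    "Astronomia": "astronomy",
    "Biologia": "biology",
    "Matematyka": "mathematics",
    "Psychologia": "psychology",
    "Fizyka": "physics",
    "Informatyka": "computer science",
    "Chemia": "chemistry",
}


def load_pages(csv_file):
    """Read article rows from an open file shaped like wiki_pages.csv.

    Assumption: the first row is a header with title, text, category.
    """
    # Article texts can be long, so raise the default csv field size limit.
    csv.field_size_limit(min(sys.maxsize, 2**31 - 1))
    return list(csv.DictReader(csv_file))


def category_counts(pages):
    """Count articles per category, translated to English where known."""
    return Counter(CATEGORIES.get(p["category"], p["category"]) for p in pages)


# Toy rows standing in for the real file (not taken from the dataset).
sample = io.StringIO(
    "title,text,category\n"
    'Przykład 1,"Tekst, z przecinkiem",Astronomia\n'
    "Przykład 2,Tekst,Astronomia\n"
    "Przykład 3,Tekst,Chemia\n"
)
pages = load_pages(sample)
counts = category_counts(pages)
print(counts.most_common())
```

For the real file, the StringIO object would be replaced by something like open("wiki_pages.csv", newline="", encoding="utf-8"), again assuming that encoding.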

annotations.csv

File with links between pages. The first column is the source article title and the second column is the target article title. Note that the file includes links to pages that are not present in wiki_pages.csv.
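
Because some links point to pages outside wiki_pages.csv, graph-based methods will usually want to drop them before building the graph. The following is a minimal sketch of that filtering step, assuming a comma-separated file read by position (source first, target second). The function name build_graph and the toy titles are hypothetical, not part of the repository.

```python
import csv
import io
from collections import defaultdict


def build_graph(edge_rows, known_titles):
    """Build an adjacency mapping from (source, target) rows.

    Links with an endpoint missing from known_titles are skipped and counted.
    If the real file has a header row, it is dropped the same way unless its
    cells happen to match article titles.
    """
    graph = defaultdict(set)
    dropped = 0
    for row in edge_rows:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        source, target = row[0], row[1]
        if source in known_titles and target in known_titles:
            graph[source].add(target)
        else:
            dropped += 1
    return dict(graph), dropped


# Toy data: titles A, B, C stand in for articles present in wiki_pages.csv;
# Z plays the role of a linked page that is missing from it.
known = {"A", "B", "C"}
links = io.StringIO("A,B\nA,C\nB,A\nA,Z\n")
graph, dropped = build_graph(csv.reader(links), known)
print(graph, dropped)
```

The links are treated as directed here, since the file distinguishes source from target; an undirected graph would also add the reverse entry for each kept link.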
