Data Repository for PyGOD

Statistics of the available datasets are listed below. #Con. is the number of contextual outliers and #Strct. is the number of structural outliers. #Outliers can be slightly smaller than #Con. + #Strct. because some nodes are both contextual and structural outliers and are counted only once. For the organic datasets, no breakdown by outlier type is given ("-").

Dataset	Type	#Nodes	#Edges	#Feat	Avg. Degree	#Con.	#Strct.	#Outliers	Outlier Ratio
'weibo'	organic	8,405	407,963	400	48.5	-	-	868	10.3%
'reddit'	organic	10,984	168,016	64	15.3	-	-	366	3.3%
'disney'	organic	124	335	28	2.7	-	-	6	4.8%
'books'	organic	1,418	3,695	21	2.6	-	-	28	2.0%
'enron'	organic	13,533	176,987	18	13.1	-	-	5	0.04%
'inj_cora'	injected	2,708	11,060	1,433	4.1	70	70	138	5.1%
'inj_amazon'	injected	13,752	515,042	767	37.2	350	350	694	5.0%
'inj_flickr'	injected	89,250	933,804	500	10.5	2,240	2,240	4,414	4.9%
'gen_time'	generated	1,000	5,746	64	5.7	100	100	189	18.9%
'gen_100'	generated	100	618	64	6.2	10	10	18	18.0%
'gen_500'	generated	500	2,662	64	5.3	10	10	20	4.0%
'gen_1000'	generated	1,000	4,936	64	4.9	10	10	20	2.0%
'gen_5000'	generated	5,000	24,938	64	5.0	10	10	20	0.4%
'gen_10000'	generated	10,000	49,614	64	5.0	10	10	20	0.2%
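As a worked check of the table, the overlap between the two outlier types and the outlier ratio can be recomputed from the injected rows. This is a sketch under two assumptions read off the table itself, not taken from PyGOD code: Outlier Ratio = #Outliers / #Nodes, and #Outliers counts each node once (inclusion-exclusion). The helper names are illustrative.

```python
# Worked check of the statistics table for the injected datasets.
# Assumption: Outlier Ratio = #Outliers / #Nodes, and nodes that are both
# contextual and structural outliers are counted once in #Outliers.

# (name, #Nodes, #Con., #Strct., #Outliers), copied from the table
INJECTED = [
    ("inj_cora", 2_708, 70, 70, 138),
    ("inj_amazon", 13_752, 350, 350, 694),
    ("inj_flickr", 89_250, 2_240, 2_240, 4_414),
]


def overlap(n_con: int, n_strct: int, n_outliers: int) -> int:
    """Nodes that are both contextual and structural outliers,
    by inclusion-exclusion: |C or S| = |C| + |S| - |C and S|."""
    return n_con + n_strct - n_outliers


def outlier_ratio_pct(n_outliers: int, n_nodes: int) -> float:
    """Outlier ratio in percent, rounded to one decimal as in the table."""
    return round(100 * n_outliers / n_nodes, 1)


for name, nodes, con, strct, outliers in INJECTED:
    print(f"{name}: both={overlap(con, strct, outliers)}, "
          f"ratio={outlier_ratio_pct(outliers, nodes)}%")
```

By the same arithmetic, 'gen_time' has 100 + 100 - 189 = 11 nodes that carry both outlier types.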

To load a dataset, pass its name from the table to load_data:

from pygod.utils import load_data
data = load_data('weibo') # in PyG format

An alternative download source is available on Baidu Netdisk (Chinese interface): https://pan.baidu.com/s/1afEZaygCRUYWJPtVbzuRYw (access code: bond)

For the injected and generated datasets, the labels have the following meanings:

0: inlier
1: contextual outlier only
2: structural outlier only
3: both contextual outlier and structural outlier
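The four label values behave like a two-bit code: bit 0 flags a contextual outlier and bit 1 a structural outlier, so label = contextual + 2 * structural. This reading is inferred from the list above and is consistent with the bit-shift conversions in this README. The sketch below makes it explicit; encode_label and decode_label are hypothetical helpers, not part of PyGOD.

```python
# Sketch of the two-bit label encoding inferred from the label list:
# bit 0 = contextual outlier, bit 1 = structural outlier.
# encode_label/decode_label are illustrative helpers, not part of PyGOD.

def encode_label(contextual: bool, structural: bool) -> int:
    """Pack the two outlier flags into one label in {0, 1, 2, 3}."""
    return int(contextual) | (int(structural) << 1)


def decode_label(label: int) -> tuple[bool, bool]:
    """Unpack a label into (is_contextual, is_structural)."""
    if label not in (0, 1, 2, 3):
        raise ValueError(f"unexpected label: {label}")
    return bool(label & 1), bool((label >> 1) & 1)


for label in range(4):
    print(label, decode_label(label))
```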

The labels can be converted as follows:

y = data.y.bool()    # binary labels (inlier/outlier)
yc = data.y >> 0 & 1 # contextual outliers
ys = data.y >> 1 & 1 # structural outliers
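For readers without PyTorch at hand, the same conversions can be sketched on a plain Python list. Shift operators bind tighter than &, so `data.y >> 1 & 1` means `(data.y >> 1) & 1`. The label list here is a made-up toy example, not data from any dataset, and summarize is a hypothetical helper; the final assertion shows why #Outliers can be smaller than #Con. + #Strct.

```python
# Dependency-free version of the label conversions, applied to a toy
# label list (values are made up for illustration, not from a dataset).
# Note: shifts bind tighter than &, so `y >> 1 & 1` means `(y >> 1) & 1`.

def summarize(labels: list[int]) -> dict[str, int]:
    """Count outliers of each type in a list of 0-3 labels."""
    is_outlier = [y != 0 for y in labels]          # like data.y.bool()
    contextual = [(y >> 0) & 1 for y in labels]    # like yc
    structural = [(y >> 1) & 1 for y in labels]    # like ys
    both = sum(c & s for c, s in zip(contextual, structural))
    return {
        "outliers": sum(is_outlier),
        "contextual": sum(contextual),
        "structural": sum(structural),
        "both": both,
    }


toy = [0, 0, 1, 2, 3, 0, 1, 2]
stats = summarize(toy)
print(stats)
# The union equals the sum of both types minus the nodes counted twice.
assert stats["outliers"] == stats["contextual"] + stats["structural"] - stats["both"]
```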

pygod-team/data