Other datasets

Vision x NLP

VQA

VQA is a new dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer.
http://www.visualqa.org/index.html

VisDial Dataset

Code for the real-time chat interface used to collect the VisDial dataset on Amazon Mechanical Turk
https://visualdialog.org/data

Website data

nico-opendata

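Niconico Seiga (illustration) dataset, including a resized version / pretrained Chainer model files trained on the tags of illustrations posted to Niconico Seiga / Niconico Douga comment dataset and Niconico Pedia (ニコニコ大百科) data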
https://nico-opendata.jp/ja/index.html

MovieLens

Rating data sets from the MovieLens web site (http://movielens.org).
http://grouplens.org/datasets/movielens/

The YP Corpus (The YP Dataset)

YouPorn dataset. It contains Video IDs, Categories/Tags, Avg. Rating, Ratings Count, Full Video Title, Nicknames and Comments
http://blog.uni-mannheim.de/mschuhma/yp-corpus/
Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License

xHamster

This is an exhaustive dataset of metadata for all videos published on the site from its creation in 2007 until February 2013. It contains almost 800,000 entries.
http://sexualitics.github.io/
Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License

Xnxx

This is an exhaustive dataset of metadata for all videos published on the site from its creation in 2007 until February 2013. It contains almost 800,000 entries.
http://sexualitics.github.io/
Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License

MISC

LIBSVM Data: Classification, Regression, and Multi-label

This page contains many classification, regression, multi-label and string data sets stored in LIBSVM format. Many are from UCI, Statlog, StatLib and other collections.
http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
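
For readers new to the LIBSVM format mentioned above, here is a minimal sketch of a line parser. It assumes the usual text layout, `<label> <index>:<value> ...`, with comma-separated labels for multi-label data. The function name `parse_libsvm_line` and the sample lines are hypothetical and are not taken from any file on that page.

```python
from typing import Dict, List, Tuple


def parse_libsvm_line(line: str) -> Tuple[List[float], Dict[int, float]]:
    """Parse one LIBSVM-format line into (labels, sparse features).

    Assumed layout: "<label>[,<label>...] <index>:<value> <index>:<value> ...".
    Features that are absent from the line are implicitly zero.
    """
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")

    # A multi-label line may carry no labels at all; in that case the
    # first token is already an "index:value" pair.
    if ":" in tokens[0]:
        labels: List[float] = []
        feature_tokens = tokens
    else:
        labels = [float(part) for part in tokens[0].split(",")]
        feature_tokens = tokens[1:]

    features: Dict[int, float] = {}
    for token in feature_tokens:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return labels, features


# Hypothetical sample lines, for illustration only.
binary_example = parse_libsvm_line("+1 1:0.5 3:1.2")
multilabel_example = parse_libsvm_line("2,5 4:1")
```

For real work, an existing loader such as scikit-learn's `load_svmlight_file` is likely a better choice than a hand-written parser, since it returns a sparse matrix directly.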

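Ingredient Weight Dataset (食材の重さデータセット)

Ingredient weights collected through crowdsourcing and published as a dataset. Recipes often give quantities such as "10 grams of sugar", but rarely "100 grams of saury"; fish and similar ingredients are usually given in units such as "one fish" or "one slice". Nutritional values, however, are mostly stated per gram of ingredient. The authors therefore built a table that maps the units used in recipes to their weights.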
http://bigdata.naist.jp/~ysuzuki/data/food/

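Geolonia Address Data (Geolonia 住所データ)

Address data for all of Japan at the town and chome level (189,540 entries), published as open data. It is based on the "Oaza/Town-block Level Position Reference Information" distributed through the Position Reference Information Download Service of the Ministry of Land, Infrastructure, Transport and Tourism. That source is updated once a year, whereas the data distributed in this repository is updated monthly.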
https://github.com/geolonia/japanese-addresses
CC BY 4.0

Project CodeNet

Project CodeNet differs from similar efforts in several ways. Besides the size of the dataset, the code samples are written in over 50 programming languages, though the dominant languages are C++, C, Python, and Java. The code samples are annotated with a rich set of information, such as code size, memory footprint, CPU run time, and status, which indicates acceptance or the type of error. Over 90% of the problems come with a problem description containing a concise problem statement and a specification of the input and output formats. Where available, sample input and output were also extracted from the problem description and are provided as part of the dataset. Users can execute the accepted code samples (over 50% of the submissions are accepted) to extract additional metadata and to verify outputs from generative AI models for correctness.
https://github.com/IBM/Project_CodeNet
Apache License 2.0
