- VQA is a new dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer.
- http://www.visualqa.org/index.html
- Code for the real-time chat interface used to collect the VisDial dataset on Amazon Mechanical Turk
- https://visualdialog.org/data
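- Resized Niconico Seiga (illustration) dataset / Niconico Seiga (illustration) dataset / pretrained Chainer model files trained on the tags of illustrations posted to Niconico Seiga / Niconico Douga comment datasets and Niconico Daihyakka data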
- https://nico-opendata.jp/ja/index.html
- Rating datasets from the MovieLens website (http://movielens.org).
- http://grouplens.org/datasets/movielens/
- YouPorn dataset. It contains Video IDs, Categories/Tags, Avg. Rating, Ratings Count, Full Video Title, Nicknames and Comments
- http://blog.uni-mannheim.de/mschuhma/yp-corpus/
- Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License
- This is an exhaustive dataset of metadata for all videos published on the site from its creation in 2007 until February 2013. It contains almost 800,000 entries.
- http://sexualitics.github.io/
- Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License
- This page contains many classification, regression, multi-label and string data sets stored in LIBSVM format. Many are from UCI, Statlog, StatLib and other collections.
- http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
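- Ingredient weights collected through crowdsourcing and released publicly. Recipes often give amounts such as "10 grams of sugar", but rarely something like "100 grams of saury"; such ingredients are instead written in units like "one fish" or "one slice". Nutritional values, however, are usually stated per gram of ingredient. This dataset therefore provides a table mapping the units used in recipes to their weights.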
- http://bigdata.naist.jp/~ysuzuki/data/food/
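- Address data for all of Japan at the town and chome level (189,540 entries), released as open data. The data is based on the "Oaza / Machi-chome Level Position Reference Information" distributed by the Position Reference Information Download Service of the Ministry of Land, Infrastructure, Transport and Tourism. That source is updated once a year, whereas the data distributed in this repository is updated monthly.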
- https://github.com/geolonia/japanese-addresses
- CC BY 4.0
- There are a few differentiating features of Project CodeNet when compared to other similar efforts. In addition to the size of the dataset, the code samples are written in over 50 programming languages, though the dominant languages are C++, C, Python, and Java. The code samples in Project CodeNet are annotated with a rich set of information, such as the code size, memory footprint, CPU run time, and status, which indicates acceptance or error types. Over 90% of the problems come with the respective problem description, which contains a concise problem statement and a specification of the input and output formats. When available, we also extracted sample input and output from the problem description and provide them as part of the dataset. Users can execute the accepted code samples (over 50% of the submissions are accepted) to extract additional metadata and to verify outputs from generative AI models for correctness.
- https://github.com/IBM/Project_CodeNet
- Apache License 2.0